
Make vector times matrix faster #1937

Merged: fieker merged 1 commit into master from th/dirty on Nov 21, 2024
Conversation

thofma (Member) commented Oct 31, 2024

@fieker It is working well for vector * matrix, but I am having trouble in the other case. I guess it does not like my

```julia
bb.rows = reinterpret(Ptr{Ptr{ZZRingElem}}, pointer([pointer(be) + i for i in 0:length(b) - 1]))
```

but I am not sure what I am doing wrong.

```julia
julia> A = matrix(ZZ, 2, 2, [1, 2, 3, 4]); b = ZZRingElem[10, 0];

julia> A * b
2-element Vector{ZZRingElem}:
 7690
 0
```
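A general hazard with this kind of construction, independent of whatever the actual fix was here: the comprehension building the pointer table is a temporary, so nothing keeps it alive once `pointer(...)` has been taken, and flint may end up reading freed memory. A minimal sketch of the defensive pattern, reusing the names from the snippet above:

```julia
# Sketch only: root both the element vector and the pointer table while
# flint can still dereference them. `be` and `bb` are from the snippet above.
ptrs = [pointer(be) + i for i in 0:length(b) - 1]
GC.@preserve be ptrs begin
    bb.rows = reinterpret(Ptr{Ptr{ZZRingElem}}, pointer(ptrs))
    # ... ccall into flint here, while `be` and `ptrs` stay rooted ...
end
```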

fieker (Contributor) commented Oct 31, 2024 via email

thofma (Member, Author) commented Nov 1, 2024

ah right, thanks

lgoettgens (Collaborator) commented

I am surprised that this is faster than just calling flint in these cases. Can you (once it works) post some benchmarks here?

thofma (Member, Author) commented Nov 1, 2024

It is still calling flint, but it now ends up in fmpz_mat_mul instead of fmpz_mat_mul_fmpz_vec_ptr.
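In other words, the vector gets wrapped as a matrix without copying so that the product can be routed through flint's general matrix multiplication. A rough sketch of the idea, not the literal code from the diff (the `_very_unsafe_convert` internals are elided, and `libflint` is assumed to be Nemo's binding in scope):

```julia
# Rough sketch: treat b as a 1 x n ZZMatrix and let fmpz_mat_mul (with its
# multimodular machinery) do the actual work.
function mul_via_mat(b::Vector{ZZRingElem}, A::ZZMatrix)
    c = zero_matrix(ZZ, 1, ncols(A))
    GC.@preserve b begin
        bb = _very_unsafe_convert(ZZMatrix, b)   # non-copying wrapper from the diff
        ccall((:fmpz_mat_mul, libflint), Nothing,
              (Ref{ZZMatrix}, Ref{ZZMatrix}, Ref{ZZMatrix}), c, bb, A)
    end
    return ZZRingElem[c[1, i] for i in 1:ncols(A)]
end
```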

thofma (Member, Author) commented Nov 2, 2024

Timings for $b \cdot A$, old versus new, with $n \times n$ matrices and entries of size $2^B$. The entries are ratios (old time divided by new time), so the bigger the number, the faster the new method. (The rows correspond to $n$.)

| n \ B | 5 | 10 | 25 | 50 | 100 | 200 | 500 | 1000 |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.843564 | 0.851028 | 0.844431 | 2.0885 | 1.3113 | 0.845667 | 0.979706 | 0.97876 |
| 10 | 1.82766 | 1.7989 | 1.74988 | 4.35319 | 2.11531 | 1.16725 | 1.04208 | 1.01656 |
| 25 | 2.65245 | 2.67014 | 2.70464 | 8.41888 | 3.09119 | 1.35045 | 1.07404 | 1.05941 |
| 50 | 2.96443 | 3.02035 | 2.96937 | 9.97713 | 3.35594 | 1.33684 | 1.10265 | 1.0275 |
| 100 | 2.84638 | 2.90909 | 2.905 | 10.4487 | 3.4957 | 1.23185 | 1.10099 | 1.03541 |
| 200 | 2.17148 | 2.69597 | 2.63228 | 8.77398 | 3.05716 | 1.20955 | 1.10312 | 1.0478 |
| 500 | 1.15165 | 1.07107 | 1.06398 | 4.81586 | 2.81056 | 0.991455 | 1.34582 | 1.02989 |
| 1000 | 1.125 | 0.989656 | 0.993372 | 4.69268 | 3.41699 | 1.18048 | 1.19918 | 1.01886 |

Here is the same for the new version versus fmpz_mat_mul, if one takes $A$ and $b$ to be fmpz_mat (larger means that fmpz_mat_mul is faster):

| n \ B | 5 | 10 | 25 | 50 | 100 | 200 | 500 | 1000 |
|---|---|---|---|---|---|---|---|---|
| 5 | 2.16462 | 2.32268 | 2.23238 | 2.17938 | 1.41913 | 1.19593 | 1.06215 | 1.02352 |
| 10 | 1.48754 | 1.47796 | 1.46382 | 1.39716 | 1.13425 | 1.06692 | 1.01278 | 1.0 |
| 25 | 1.14498 | 1.18499 | 1.13666 | 1.1293 | 1.0352 | 1.01304 | 1.00559 | 1.00327 |
| 50 | 1.04787 | 1.04418 | 1.00511 | 1.04592 | 1.00784 | 1.00937 | 0.995975 | 1.00301 |
| 100 | 1.04294 | 1.0394 | 1.01154 | 1.01961 | 0.972145 | 1.01542 | 1.01409 | 1.00043 |
| 200 | 0.927182 | 0.747128 | 1.01613 | 1.16729 | 1.00899 | 1.00981 | 0.993286 | 1.01096 |
| 500 | 0.985183 | 0.991277 | 1.01906 | 0.99012 | 1.03796 | 1.03933 | 1.0104 | 1.01454 |
| 1000 | 0.999137 | 1.00473 | 1.00524 | 0.995211 | 0.99897 | 1.01552 | 1.02201 | 0.996246 |

Here is the same for $A \cdot b$, old versus new:

| n \ B | 5 | 10 | 25 | 50 | 100 | 200 | 500 | 1000 |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.713331 | 0.709169 | 0.717162 | 1.78398 | 1.14858 | 0.832763 | 0.964895 | 0.970126 |
| 10 | 1.49522 | 1.48134 | 1.41823 | 3.90519 | 1.91063 | 1.11299 | 1.03962 | 0.991704 |
| 25 | 2.24352 | 2.4983 | 2.52025 | 7.38161 | 2.80567 | 1.23921 | 1.07674 | 1.03358 |
| 50 | 2.86203 | 2.75761 | 2.84369 | 9.62656 | 3.05153 | 1.16304 | 1.07525 | 1.02103 |
| 100 | 2.86225 | 2.82905 | 2.72735 | 10.25 | 2.98033 | 1.10712 | 1.07306 | 1.03141 |
| 200 | 2.14486 | 2.16051 | 1.8628 | 9.48113 | 3.21884 | 1.17413 | 1.08709 | 1.04623 |
| 500 | 1.04457 | 0.997934 | 1.00282 | 4.26627 | 1.66195 | 1.04183 | 0.950636 | 0.978479 |
| 1000 | 1.01089 | 0.928815 | 0.929939 | 4.42736 | 3.82359 | 1.30189 | 1.09177 | 0.983552 |

And the corresponding comparison against fmpz_mat_mul (with $A$ and $b$ as fmpz_mat):

| n \ B | 5 | 10 | 25 | 50 | 100 | 200 | 500 | 1000 |
|---|---|---|---|---|---|---|---|---|
| 5 | 2.00944 | 2.06203 | 2.01763 | 2.05396 | 1.39988 | 1.18995 | 1.06161 | 1.02158 |
| 10 | 1.43761 | 1.41539 | 1.40961 | 1.39092 | 1.1386 | 1.05608 | 1.01143 | 1.00837 |
| 25 | 1.14498 | 1.15764 | 1.14169 | 1.14774 | 1.0663 | 1.01935 | 1.01487 | 0.990403 |
| 50 | 1.03928 | 1.072 | 1.05319 | 1.03687 | 1.02635 | 1.00876 | 1.01412 | 1.00174 |
| 100 | 1.019 | 1.01891 | 1.05361 | 1.03803 | 1.01385 | 1.00785 | 1.00676 | 1.00027 |
| 200 | 1.15244 | 1.11367 | 1.30248 | 0.778307 | 1.00798 | 1.00869 | 1.00363 | 1.01093 |
| 500 | 0.986566 | 1.00716 | 1.00155 | 1.01458 | 1.00353 | 0.996473 | 1.08001 | 1.00105 |
| 1000 | 1.00048 | 1.00431 | 1.00267 | 1.00689 | 1.00857 | 1.17954 | 1.02424 | 1.00669 |

So we might have to do some tuning for small dimensions.
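For anyone who wants to reproduce ratios like those in the second table, a small harness along these lines should do (a sketch; the random entry generation and the timing setup are my assumptions, not the script that produced the numbers above):

```julia
using Nemo, BenchmarkTools

# Sketch: time the new vector * matrix path against an honest 1 x n
# ZZMatrix times n x n ZZMatrix product; ratio > 1 means fmpz_mat_mul wins.
function bench_ratio(n::Int, B::Int)
    A = matrix(ZZ, n, n, [rand_bits(ZZ, B) for _ in 1:n^2])
    b = [rand_bits(ZZ, B) for _ in 1:n]
    bmat = matrix(ZZ, 1, n, b)     # same data, but as a ZZMatrix
    t_vec = @belapsed $b * $A      # new vector * matrix path
    t_mat = @belapsed $bmat * $A   # plain fmpz_mat_mul
    return t_vec / t_mat
end
```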

codecov bot commented Nov 2, 2024

Codecov Report

Attention: Patch coverage is 25.53191% with 35 lines in your changes missing coverage. Please review.

Project coverage is 87.92%. Comparing base (80d1172) to head (61a40dd).
Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/flint/fmpz_mat.jl | 25.53% | 35 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master    #1937      +/-   ##
==========================================
- Coverage   88.00%   87.92%   -0.09%     
==========================================
  Files          99       99              
  Lines       36361    36402      +41     
==========================================
+ Hits        31999    32005       +6     
- Misses       4362     4397      +35     
```


```diff
@@ -1784,13 +1784,69 @@ end
 addmul!(z::ZZMatrixOrPtr, a::ZZMatrixOrPtr, b::Integer) = addmul!(z, a, flintify(b))
 addmul!(z::ZZMatrixOrPtr, a::IntegerUnionOrPtr, b::ZZMatrixOrPtr) = addmul!(z, b, a)

+function _very_unsafe_convert(::Type{ZZMatrix}, a::Vector{ZZRingElem}, row = true)
```
A Collaborator commented on the diff:

Suggested change:

```diff
-function _very_unsafe_convert(::Type{ZZMatrix}, a::Vector{ZZRingElem}, row = true)
+function _very_unsafe_convert(::Type{ZZMatrix}, a::Vector{ZZRingElem}, ::Val{row} = Val(true)) where {row}
```

this should push performance ever so slightly further, as it eliminates the runtime branch on `if row`
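For context, the suggested `Val` version moves the flag into the type domain so the compiler specializes each case; a toy illustration (not code from this PR):

```julia
# Toy example of the Val trick: `row` is a compile-time constant in each
# specialization, so the ternary below is folded away instead of branching.
shape(n::Int, ::Val{row} = Val(true)) where {row} = row ? (1, n) : (n, 1)

shape(5)              # (1, 5), uses the Val(true) specialization
shape(5, Val(false))  # (5, 1)
```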

thofma (Member, Author) replied:

I doubt this is measurable

fingolfin (Member) commented

@thofma I am unsure how to read your tables. What are the floating point numbers in them: times? ratios?

Since you write "The bigger the number, the faster the new method." it sounds like perhaps "old time divided by new time", i.e. "by which factor did we get faster"? And then a value below 1 means the new method is slower?

If that guess is right I think I understand the first table.

For the second table you write

> Here is the same for new versions fmpz_mat_mul

What does that mean? What is being compared here exactly?

thofma (Member, Author) commented Nov 8, 2024

> @thofma I am unsure how to read your tables. What are the floating point numbers in them: times? ratios?
>
> Since you write "The bigger the number, the faster the new method." it sounds like perhaps "old time divided by new time", i.e. "by which factor did we get faster"? And then a value below 1 means the new method is slower?
>
> If that guess is right I think I understand the first table.

The guess is correct.

> Here is the same for new versions fmpz_mat_mul
>
> What does that mean? What is being compared here exactly?

Hopefully clarified it to:

> Here is the same for the new version versus fmpz_mat_mul, if one takes $A$ and $b$ to be fmpz_mat

thofma marked this pull request as ready for review on November 19, 2024, 18:20
thofma (Member, Author) commented Nov 19, 2024

I added a cutoff for when the new method is used.

Maybe @fieker can have a quick glance?
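The shape of such a cutoff is roughly the following (hypothetical names and threshold; the actual constant and function names live in the diff):

```julia
# Hypothetical illustration: below some dimension the wrapping overhead is
# not worth it and the old flint vector routine remains the better choice.
const _VEC_MAT_MUL_CUTOFF = 50  # made-up value; see the diff for the real one

function _mul(b::Vector{ZZRingElem}, A::ZZMatrix)
    if nrows(A) < _VEC_MAT_MUL_CUTOFF
        return _mul_flint(b, A)    # old route via fmpz_mat_mul_fmpz_vec_ptr
    else
        return _mul_via_mat(b, A)  # new route via fmpz_mat_mul
    end
end
```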

lgoettgens (Collaborator) commented

@thofma could you move the mul!_flint functions either before or after the two mul! functions, but not in between them?

thofma (Member, Author) commented Nov 20, 2024

done

lgoettgens (Collaborator) commented

Another, maybe stupid, question: wouldn't it be even faster to do a similar conversion from vectors to matrices inside flint itself? That is, leave everything in Nemo the same and move all of this into flint? (I know that this needs C code and so on, but it could be considered before doing this here for all of the matrix types.)
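For comparison, the C-side counterpart of such a non-copying wrapper would be small; a hypothetical sketch against the fmpz_mat layout with a rows pointer table (the same layout the `bb.rows` trick above relies on), not proposed flint code:

```c
#include <flint/fmpz_mat.h>

/* Hypothetical sketch: present an fmpz vector of length n as a 1 x n
   fmpz_mat window without copying entries. `rowtab` must outlive `M`. */
static void fmpz_mat_view_vec_row(fmpz_mat_t M, fmpz * vec, slong n, fmpz ** rowtab)
{
    rowtab[0] = vec;    /* the single row starts at the vector's data */
    M->entries = vec;
    M->r = 1;
    M->c = n;
    M->rows = rowtab;
}
```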

fieker (Contributor) commented Nov 21, 2024 via email

thofma (Member, Author) commented Nov 21, 2024

Sure, anyone should feel free to implement this directly in C. But unless that happens this week and there is a new flint release this week, I will move ahead with the approach here.

@fieker are you happy with the changes here?

fieker merged commit 73e3de8 into master on Nov 21, 2024 (22 of 24 checks passed)
fieker deleted the th/dirty branch on November 21, 2024, 09:01