-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LinearAlgebra: use band index in structured broadcast #54075
Conversation
BandIndex
in structured broadcastd6a8aea
to
b4e4adc
Compare
Gentle bump |
1 similar comment
Gentle bump |
If you make it to JuliaCon, it would IMO be worth raising in the survey or elsewhere that so many PRs like this one stall for no apparent reason. |
This looks good to me. I'm curious about a couple of things?
I think this is worth a NEWS item, so maybe package maintainers can pick this up and do something nice with it. |
d47fc73
to
a4b8e17
Compare
Runtimes improve in general: julia> using LinearAlgebra
julia> D = Diagonal(rand(1000)); T = Tridiagonal(rand(999), rand(1000), rand(999)); B = Bidiagonal(rand(1000), rand(999), :U);
julia> @btime $D .+ $B;
8.201 μs (6 allocations: 15.75 KiB) # nightly v"1.12.0-DEV.558"
2.775 μs (6 allocations: 15.75 KiB) # this PR
julia> @btime $D .+ $T;
4.904 μs (10 allocations: 23.67 KiB) # nightly
2.718 μs (10 allocations: 23.67 KiB) # This PR
julia> @btime $B .+ $T;
10.841 μs (10 allocations: 23.67 KiB) # nigthly
4.134 μs (10 allocations: 23.67 KiB) # This PR Latency does increase in certain cases compared to julia> @time D .+ D; # no impact, as there is only one band
0.110434 seconds (159.40 k allocations: 8.525 MiB, 99.92% compilation time) # nightly
0.112720 seconds (173.04 k allocations: 9.321 MiB, 99.92% compilation time) # this PR
julia> @time B .+ B;
0.567609 seconds (578.01 k allocations: 30.555 MiB, 29.94% gc time, 99.98% compilation time) # nightly
0.457304 seconds (678.50 k allocations: 36.486 MiB, 99.97% compilation time) # this PR
julia> @time T .+ T;
0.320546 seconds (192.58 k allocations: 10.372 MiB, 53.73% gc time, 99.97% compilation time) # nightly
0.210033 seconds (271.85 k allocations: 15.096 MiB, 99.95% compilation time) # this PR
julia> @time D .+ B;
0.290067 seconds (573.75 k allocations: 30.220 MiB, 99.96% compilation time) # nightly
0.371920 seconds (699.10 k allocations: 37.395 MiB, 99.96% compilation time) # This PR
julia> @time D .+ T;
0.324574 seconds (212.92 k allocations: 11.460 MiB, 52.94% gc time, 99.97% compilation time) # nightly
0.202575 seconds (297.82 k allocations: 16.446 MiB, 99.94% compilation time) # This PR
julia> @time B .+ T;
0.348824 seconds (228.11 k allocations: 12.224 MiB, 49.64% gc time, 99.96% compilation time) # nightly
0.203640 seconds (316.88 k allocations: 17.549 MiB, 99.93% compilation time) # This PR Often, there appears to be a significant GC time on nightly that is absent in this PR. I'm not quite certain why this is the case, and why there isn't this GC overhead in some cases. Taking the GC time out, it would appear that this PR increases latency somewhat in many of the broadcasting operations (although the GC time on nightly appears to persist across runs, and is fairly consistent). Regarding removing methods, this is a bit tricky, as there may be array types that specialize addition/subtraction but not broadcasting, and they may not be compatible with the fallback that uses scalar indexing. Perhaps this is an over-cautious approach, but let's leave these in for now. Re. the NEWS entry, are you thinking of making the |
Regarding removing methods: I checked and found, e.g., this method: @commutative function (+)(A::Bidiagonal, B::Diagonal) What would happen, if we removed this and the similar below this? This is completely restricted to |
Currently, |
a4b8e17
to
93c6129
Compare
This adds a new `BandIndex` type internal to `LinearAlgebra` that parallels a `CartesianIndex`, and stores a band index and a linear index of an element along that band. If the index of the band is a compile-time constant (which is often the case with `Diagonal`/`Bidiagonal`/`Tridiagonal` matrices), constant-propagation may eliminate branches in indexing into the matrix, and directly forward the indexing to the corresponding diagonal. This is particularly important in broadcasting for these matrices, which acts band-wise. An example of an improvement in performance with this PR: ```julia julia> using LinearAlgebra julia> T = Tridiagonal(rand(893999), rand(894000), rand(893999)); # a large matrix julia> @Btime $T .+ $T; 5.387 ms (10 allocations: 20.46 MiB) # v"1.12.0-DEV.337" 2.872 ms (10 allocations: 20.46 MiB) # This PR julia> @Btime $T + $T; # reference 2.885 ms (10 allocations: 20.46 MiB) ``` This makes the broadcast operation as fast as the sum, where the latter adds the diagonals directly. I'm not 100% certain why this works as well as it does, as the constant band index may get lost in the `newindex` computation. I suspect branch prediction somehow works around this and preserves the constant.
This adds a new `BandIndex` type internal to `LinearAlgebra` that parallels a `CartesianIndex`, and stores a band index and a linear index of an element along that band. If the index of the band is a compile-time constant (which is often the case with `Diagonal`/`Bidiagonal`/`Tridiagonal` matrices), constant-propagation may eliminate branches in indexing into the matrix, and directly forward the indexing to the corresponding diagonal. This is particularly important in broadcasting for these matrices, which acts band-wise. An example of an improvement in performance with this PR: ```julia julia> using LinearAlgebra julia> T = Tridiagonal(rand(893999), rand(894000), rand(893999)); # a large matrix julia> @Btime $T .+ $T; 5.387 ms (10 allocations: 20.46 MiB) # v"1.12.0-DEV.337" 2.872 ms (10 allocations: 20.46 MiB) # This PR julia> @Btime $T + $T; # reference 2.885 ms (10 allocations: 20.46 MiB) ``` This makes the broadcast operation as fast as the sum, where the latter adds the diagonals directly. I'm not 100% certain why this works as well as it does, as the constant band index may get lost in the `newindex` computation. I suspect branch prediction somehow works around this and preserves the constant.
This adds a new
BandIndex
type internal toLinearAlgebra
that parallels aCartesianIndex
, and stores a band index and a linear index of an element along that band. If the index of the band is a compile-time constant (which is often the case withDiagonal
/Bidiagonal
/Tridiagonal
matrices), constant-propagation may eliminate branches in indexing into the matrix, and directly forward the indexing to the corresponding diagonal. This is particularly important in broadcasting for these matrices, which acts band-wise.An example of an improvement in performance with this PR:
This makes the broadcast operation as fast as the sum, where the latter adds the diagonals directly.
I'm not 100% certain why this works as well as it does, as the constant band index may get lost in the
newindex
computation. I suspect branch prediction somehow works around this and preserves the constant.