
LinearAlgebra: use band index in structured broadcast #54075

Merged
jishnub merged 2 commits into master from jishnub/bandindexing on May 26, 2024

Conversation

jishnub
Contributor

@jishnub jishnub commented Apr 14, 2024

This adds a new `BandIndex` type internal to `LinearAlgebra` that parallels a `CartesianIndex`, and stores a band index and a linear index of an element along that band. If the index of the band is a compile-time constant (which is often the case with `Diagonal`/`Bidiagonal`/`Tridiagonal` matrices), constant propagation may eliminate branches in indexing into the matrix, and directly forward the indexing to the corresponding diagonal. This is particularly important in broadcasting for these matrices, which acts band-wise.
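
For illustration, a minimal sketch of the idea (the `MyBandIndex` name and the `getindex` method below are hypothetical stand-ins, not the PR's internal definitions):

```julia
using LinearAlgebra

# A band offset plus a position along that band, analogous to a 2D CartesianIndex.
struct MyBandIndex
    band::Int   # band offset: -1, 0, or 1 for a Tridiagonal
    index::Int  # position of the element along that band
end

# When `b.band` is a compile-time constant, constant propagation can fold this branch
# away and index the corresponding diagonal vector directly.
function Base.getindex(T::Tridiagonal, b::MyBandIndex)
    if b.band == 0
        T.d[b.index]
    elseif b.band == -1
        T.dl[b.index]
    elseif b.band == 1
        T.du[b.index]
    else
        zero(eltype(T))
    end
end
```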

An example of an improvement in performance with this PR:

```julia
julia> using LinearAlgebra

julia> T = Tridiagonal(rand(893999), rand(894000), rand(893999)); # a large matrix

julia> @btime $T .+ $T;
  5.387 ms (10 allocations: 20.46 MiB) # v"1.12.0-DEV.337"
  2.872 ms (10 allocations: 20.46 MiB) # This PR

julia> @btime $T + $T; # reference
  2.885 ms (10 allocations: 20.46 MiB)
```

This makes the broadcast operation as fast as the sum, where the latter adds the diagonals directly.

I'm not 100% certain why this works as well as it does, as the constant band index may get lost in the `newindex` computation. I suspect branch prediction somehow works around this and preserves the constant.
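
As a rough picture of what band-wise broadcasting buys, here is a hedged sketch reusing the hypothetical `MyBandIndex` above (not the actual broadcast machinery):

```julia
# Hypothetical band-wise kernel: each loop carries a literal band offset, so the branch
# inside `getindex(::Tridiagonal, ::MyBandIndex)` can be eliminated by constant propagation.
function add_bandwise!(dest::Tridiagonal, A::Tridiagonal, B::Tridiagonal)
    n = size(A, 1)
    for i in 1:n
        dest.d[i] = A[MyBandIndex(0, i)] + B[MyBandIndex(0, i)]
    end
    for i in 1:n-1
        dest.dl[i] = A[MyBandIndex(-1, i)] + B[MyBandIndex(-1, i)]
        dest.du[i] = A[MyBandIndex(1, i)] + B[MyBandIndex(1, i)]
    end
    return dest
end
```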

@jishnub jishnub added the linear algebra and arrays labels Apr 14, 2024
@jishnub jishnub changed the title LinearAlgebra: use BandIndex in structured broadcast LinearAlgebra: use band index in structured broadcast Apr 14, 2024
@jishnub jishnub requested a review from dkarrasch April 23, 2024 09:20
@jishnub
Contributor Author

jishnub commented May 7, 2024

Gentle bump

@jishnub
Contributor Author

jishnub commented May 19, 2024

Gentle bump

@jonas-schulze
Contributor

If you make it to JuliaCon, it would IMO be worth raising in the survey or elsewhere that so many PRs like this one stall for no apparent reason.

@dkarrasch
Member

This looks good to me. I'm curious about a couple of things:

  1. What's the runtime impact on, say, addition of ::Diagonal + ::Tridiagonal?
  2. Does this have any latency/TTFX effects?
  3. Do we have specialized addition/subtraction/(other?) methods for structured matrices that we could remove, since ::AbstractMatrix + ::AbstractMatrix has a fallback via broadcasting?

I think this is worth a NEWS item, so maybe package maintainers can pick this up and do something nice with it.

@jishnub
Contributor Author

jishnub commented May 20, 2024

Runtimes improve in general:

```julia
julia> using LinearAlgebra

julia> D = Diagonal(rand(1000)); T = Tridiagonal(rand(999), rand(1000), rand(999)); B = Bidiagonal(rand(1000), rand(999), :U);

julia> @btime $D .+ $B;
  8.201 μs (6 allocations: 15.75 KiB) # nightly v"1.12.0-DEV.558"
  2.775 μs (6 allocations: 15.75 KiB) # this PR

julia> @btime $D .+ $T;
  4.904 μs (10 allocations: 23.67 KiB) # nightly
  2.718 μs (10 allocations: 23.67 KiB) # This PR

julia> @btime $B .+ $T;
  10.841 μs (10 allocations: 23.67 KiB) # nightly
  4.134 μs (10 allocations: 23.67 KiB) # This PR
```

Latency does increase in certain cases compared to v"1.12.0-DEV.560", although it improves in others:

```julia
julia> @time D .+ D; # no impact, as there is only one band
  0.110434 seconds (159.40 k allocations: 8.525 MiB, 99.92% compilation time) # nightly
  0.112720 seconds (173.04 k allocations: 9.321 MiB, 99.92% compilation time) # this PR

julia> @time B .+ B;
  0.567609 seconds (578.01 k allocations: 30.555 MiB, 29.94% gc time, 99.98% compilation time) # nightly
  0.457304 seconds (678.50 k allocations: 36.486 MiB, 99.97% compilation time) # this PR

julia> @time T .+ T;
  0.320546 seconds (192.58 k allocations: 10.372 MiB, 53.73% gc time, 99.97% compilation time) # nightly
  0.210033 seconds (271.85 k allocations: 15.096 MiB, 99.95% compilation time) # this PR

julia> @time D .+ B;
  0.290067 seconds (573.75 k allocations: 30.220 MiB, 99.96% compilation time) # nightly
  0.371920 seconds (699.10 k allocations: 37.395 MiB, 99.96% compilation time) # This PR

julia> @time D .+ T;
  0.324574 seconds (212.92 k allocations: 11.460 MiB, 52.94% gc time, 99.97% compilation time) # nightly
  0.202575 seconds (297.82 k allocations: 16.446 MiB, 99.94% compilation time) # This PR

julia> @time B .+ T;
  0.348824 seconds (228.11 k allocations: 12.224 MiB, 49.64% gc time, 99.96% compilation time) # nightly
  0.203640 seconds (316.88 k allocations: 17.549 MiB, 99.93% compilation time) # This PR
```

Often, there is significant GC time on nightly that is absent with this PR. I'm not quite certain why this is, or why the GC overhead shows up only in some cases. Excluding the GC time, this PR appears to increase latency somewhat in many of the broadcasting operations (although the GC time on nightly persists across runs and is fairly consistent).

Regarding removing methods, this is a bit tricky, as there may be array types that specialize addition/subtraction but not broadcasting, and they may not be compatible with the fallback that uses scalar indexing. Perhaps this is an over-cautious approach, but let's leave these in for now.

Re. the NEWS entry, are you thinking of making the `BandIndex` type public? I got the idea from BandedMatrices, which implements a somewhat similar interface and could potentially interoperate with this. E.g. indexing into a `Band` may produce a `BandIndex`, so that one may use indexing operations like `A[Band(1)[3]]` to access the third element along the band. There is an outstanding issue about this (JuliaLinearAlgebra/BandedMatrices.jl#223), although that is larger in scope.
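
As a hedged sketch of that idea (the `MyBand` type below is an illustrative stand-in, not BandedMatrices' actual `Band`, and it reuses the hypothetical `MyBandIndex` from the sketch in the PR description):

```julia
# Indexing into a band object yields a band index, which the structured matrix
# resolves to its diagonal storage.
struct MyBand
    offset::Int
end
Base.getindex(b::MyBand, i::Int) = MyBandIndex(b.offset, i)

A = Tridiagonal(rand(4), rand(5), rand(4));

A[MyBand(1)[3]] == A.du[3]  # third element along the first superdiagonal
```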

@dkarrasch
Member

Regarding removing methods: I checked and found, e.g., this method:

```julia
@commutative function (+)(A::Bidiagonal, B::Diagonal)
```

What would happen if we removed this and the similar methods below it? This is completely restricted to LinearAlgebra types. If broadcasting knew how to determine the target type, and otherwise used broadcasting and `copyto!` via indexing into bands, this would be safe to remove, wouldn't it?
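
For reference, a minimal illustration of the two paths (nothing beyond the stdlib `LinearAlgebra` is assumed; whether the broadcast path can pick the same structured result type is exactly the open question):

```julia
using LinearAlgebra

B = Bidiagonal(rand(4), rand(3), :U);
D = Diagonal(rand(4));

B + D   # currently dispatches to the specialized @commutative method
B .+ D  # structured broadcast path, which with this PR indexes band-wise
```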

@jishnub
Contributor Author

jishnub commented May 20, 2024

Currently, InfiniteArrays defines `BroadcastStyle(::Type{<:Diagonal{<:Any,<:AbstractInfUnitRange}})` and similar methods, so this would return a `LazyArrays.BroadcastMatrix` instead of a `Bidiagonal` if we remove this method. While the values are identical, I suspect this would not recognize the structure anymore. Perhaps InfiniteArrays shouldn't be defining this, but it'll require some concerted effort to work around these cases.

@jishnub jishnub merged commit b93cd43 into master May 26, 2024
7 checks passed
@jishnub jishnub deleted the jishnub/bandindexing branch May 26, 2024 13:52
DilumAluthge pushed a commit that referenced this pull request Jun 3, 2024
lazarusA pushed a commit to lazarusA/julia that referenced this pull request Jul 12, 2024