Specialize on diagonal fieldvector broadcasts #1615
Merged
This PR specializes on "diagonal" broadcast expressions with fieldvectors. Specifically, if a fieldvector broadcast expression (one with a `FieldVectorStyle`) has one and only one fieldvector type, then we don't check the broadcast axes (the combined axes are the same). The benefit of skipping this code path is that checking the axes is type-unstable (JuliaArrays/BlockArrays.jl#310). From what I can tell, there is not a simple path to fixing JuliaArrays/BlockArrays.jl#310 because:

- […] `Tuple`s, so that this is stack-allocated
- […] `Tuple`s is type unstable because the result depends on the values in the tuples.

So, I think that the simplest solution is to specialize when we know that the axes will not change.
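To make the idea concrete, here is a minimal sketch of how such a check can be decided from argument types alone. It is not the implementation in this PR: `ToyFieldVector`, `toy_fieldvectors`, `all_same_type`, and `is_diagonal_sketch` are hypothetical names made up for illustration, and the toy type only stands in for a real fieldvector by carrying its field names as a type parameter.

```julia
using Base.Broadcast: Broadcasted, broadcasted

# Hypothetical stand-in for a fieldvector: the field names are part of the type.
struct ToyFieldVector{Names} <: AbstractVector{Float64}
    data::Vector{Float64}
end
Base.size(v::ToyFieldVector) = size(v.data)
Base.getindex(v::ToyFieldVector, i::Int) = v.data[i]

# Gather every ToyFieldVector in a (possibly nested) broadcast expression.
toy_fieldvectors() = ()
toy_fieldvectors(x, rest...) = toy_fieldvectors(rest...)  # scalars, plain arrays
toy_fieldvectors(x::ToyFieldVector, rest...) = (x, toy_fieldvectors(rest...)...)
toy_fieldvectors(bc::Broadcasted, rest...) =
    (toy_fieldvectors(bc.args...)..., toy_fieldvectors(rest...)...)

# True only when at least one argument is given and all share one concrete type.
all_same_type(::T, ::T...) where {T} = true
all_same_type(args...) = false

# The answer is chosen by dispatch, so it depends on types and never on values.
is_diagonal_sketch(bc::Broadcasted) =
    all_same_type(toy_fieldvectors(bc.args...)...)

a = ToyFieldVector{(:u, :v)}([1.0, 2.0])
b = ToyFieldVector{(:u, :v)}([3.0, 4.0])
c = ToyFieldVector{(:w,)}([5.0, 6.0])

diag_bc = broadcasted(+, broadcasted(*, 2.0, a), b)  # one fieldvector type
mixed_bc = broadcasted(+, a, c)                       # two fieldvector types

is_diagonal_sketch(diag_bc)   # true
is_diagonal_sketch(mixed_bc)  # false
```

Because every branch is selected by method dispatch on concrete types, a caller could in principle use the result to skip an axes check without inspecting any runtime values, which is the property the description above relies on.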
This PR adds a test to show that `is_diagonal_bc` will return true/false when expected at compile time (`@code_typed is_diagonal_bc(bc)` returns `true` or `false`), and tests that (what was breaking) fieldvector broadcast expressions with multiple fields do not allocate.

Closes #1465.
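For readers unfamiliar with this style of test, the following is a rough sketch of the two checks on a toy predicate, not the test added in this PR. `same_type_toy`, `as_val`, and `predicate_allocations` are hypothetical names; only `Test.@inferred`, `@test`, and `@allocated` are real Julia facilities.

```julia
using Test

# Hypothetical predicate whose answer is fixed by the argument types alone.
same_type_toy(::T, ::T) where {T} = true
same_type_toy(x, y) = false

# Lifting the Bool into a type means @inferred only passes when the
# compiler knows the answer statically.
as_val(x, y) = Val(same_type_toy(x, y))

# Measure inside a function, after a warm-up call, so that compilation
# and global-scope effects are not counted.
function predicate_allocations(x, y)
    same_type_toy(x, y)
    return @allocated same_type_toy(x, y)
end

@test (@inferred as_val([1.0], [2.0])) === Val(true)
@test (@inferred as_val([1.0], [2])) === Val(false)
@test predicate_allocations([1.0], [2.0]) == 0
```

Wrapping the result in `Val` is one design choice among several; inspecting `@code_typed` output by hand, as mentioned above, shows the same information interactively.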
One other benefit of this solution is that we can at some point reuse `is_diagonal_bc` to specialize on these broadcast expressions in `copyto!`, so that we perform a single CUDA kernel launch (as opposed to however many fields are in the fieldvector).

The only thing remaining is that I'd like to run ClimaAtmos CI on this branch to make sure that this doesn't blow up compile-times. Update: good news, it looks fine xref: CliMA/ClimaAtmos.jl#2700 (comment)