Recommended way to write qfunctions? #1660

lindsayad · 2024-09-10T18:04:31Z

lindsayad
Sep 10, 2024

The design in MOOSE is that each "qfunction" corresponds to one term of a weak form, e.g. we compose an advection-diffusion problem with separate advection and diffusion qfunctions. This makes it very easy for users to compose different physics problems piece-by-piece. Does this design still make sense using libCEED qfunctions? I'm thinking this would correspond to a CeedOperatorApply and then a following CeedOperatorApplyAdd?

rezgarshakeri · 2024-09-10T18:17:16Z

rezgarshakeri
Sep 10, 2024
Collaborator

The design in MOOSE is that each "qfunction" corresponds to one term of a weak form, e.g. we compose an advection-diffusion problem with separate advection and diffusion qfunctions. This makes it very easy for users to compose different physics problems piece-by-piece. Does this design still make sense using libCEED qfunctions? I'm thinking this would correspond to a CeedOperatorApply and then a following CeedOperatorApplyAdd?

Yes, we have separate QFunctions. For example, in this file in Ratel we create the stress and other terms, but the forcing term for MMS is written here.

Usually we group all terms that multiply the test function and all terms that multiply its gradient, as described in the documentation (we call them f0, f1, g0, g1).

2 replies

knepley Sep 10, 2024
Collaborator

However, my understanding is that you would statically define a wrapper Q-function that calls all the subfunctions in the group (that is what we did in PyLith). You cannot dispatch by pointer on device, so everything needs to be there statically when the program is compiled, or you have to do code generation. This is a part of GPU design that I strongly dislike, but that we all have to live with.
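For illustration, here is a minimal sketch of that static-wrapper pattern in plain C for a 1D advection-diffusion residual. It is not the PyLith or Ratel code and does not use the libCEED QFunction signature; the names `advection_f0`, `diffusion_f1`, and `advection_diffusion_residual` are hypothetical, and the f0/f1 split follows the convention mentioned above (f0 multiplies the test function, f1 its gradient).

```c
#include <stddef.h>

/* Hypothetical helper: advection term a * du/dx, which multiplies the
   test function (the "f0" slot). */
static inline double advection_f0(double a, double du) { return a * du; }

/* Hypothetical helper: diffusion term kappa * du/dx, which multiplies the
   gradient of the test function (the "f1" slot). */
static inline double diffusion_f1(double kappa, double du) { return kappa * du; }

/* Static wrapper: both helpers are named directly, so every call is
   resolved at compile time and no function pointer is involved. */
int advection_diffusion_residual(size_t Q, double a, double kappa,
                                 const double *du, double *f0, double *f1) {
  for (size_t i = 0; i < Q; i++) {
    f0[i] = advection_f0(a, du[i]);
    f1[i] = diffusion_f1(kappa, du[i]);
  }
  return 0;
}
```

Adding a new term in this style means adding a helper and a line in the wrapper, then recompiling, which is the cost of avoiding pointer dispatch.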

rezgarshakeri Sep 10, 2024
Collaborator

Yeah, we have that wrapper Q-function in Ratel to make the implementation easier. To add a new material (Neo-Hookean, ...) in Ratel, the user only needs to provide f0 and f1 (and g0 and g1 for mixed formulations) and their linearizations.

jrwrigh · 2024-09-10T18:30:53Z

jrwrigh
Sep 10, 2024
Collaborator

Another possible way to do it (depending on how far into the weeds MOOSE users want to go) is to use CEED_QFUNCTION_HELPER functions for the individual components and then write the combined QFunction from those. That would require automatic code generation to work correctly if the users aren't supposed to construct the QFunctions themselves. This is the most performant route.

If code gen is out of the question, then you can instead create a composite CeedOperator from the pieces of the weak form: one QFunction per piece, one CeedOperator per piece, glued together with CeedCompositeOperatorAddSub. This will be more efficient than successive CeedOperatorApply{Add} calls, since the element restriction is only done once for vectors that share the same element restriction.
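To make the additive semantics concrete, here is a toy model in plain C. It deliberately does not call libCEED: `ToyOperator`, `ToyComposite`, and the `toy_*` functions are hypothetical stand-ins, and each "operator" is just a scalar multiply. The sketch only shows that an apply followed by an apply-add gives the same result as applying a composite of the two pieces; it does not model the shared element restriction that makes the real composite cheaper.

```c
#include <stddef.h>

/* Toy stand-in for one weak-form piece: y = coeff * x. Not the libCEED API. */
typedef struct {
  double coeff;
} ToyOperator;

/* Overwrites y, in the spirit of an "apply". */
void toy_apply(const ToyOperator *op, size_t n, const double *x, double *y) {
  for (size_t i = 0; i < n; i++) y[i] = op->coeff * x[i];
}

/* Accumulates into y, in the spirit of an "apply add". */
void toy_apply_add(const ToyOperator *op, size_t n, const double *x, double *y) {
  for (size_t i = 0; i < n; i++) y[i] += op->coeff * x[i];
}

#define TOY_MAX_SUB 4

/* Toy composite: an additive list of sub-operators. */
typedef struct {
  const ToyOperator *sub[TOY_MAX_SUB];
  size_t             num_sub;
} ToyComposite;

/* Registers one sub-operator; returns nonzero if the list is full. */
int toy_composite_add_sub(ToyComposite *c, const ToyOperator *op) {
  if (c->num_sub >= TOY_MAX_SUB) return 1;
  c->sub[c->num_sub++] = op;
  return 0;
}

/* Applies the sum of all sub-operators: zero y once, then accumulate. */
void toy_composite_apply(const ToyComposite *c, size_t n, const double *x, double *y) {
  for (size_t i = 0; i < n; i++) y[i] = 0.0;
  for (size_t s = 0; s < c->num_sub; s++) toy_apply_add(c->sub[s], n, x, y);
}
```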

2 replies

jrwrigh Sep 10, 2024
Collaborator

Another way to do the CEED_QFUNCTION_HELPER route is to create a CEED_QFUNCTION for every combination of weak form "piece". We do something similar in HONEE, for example our freestream boundary conditions. We have a separate CEED_QFUNCTION per state_variable-Riemann_solver pair. This way the branches in the freestream function are eliminated in compilation, but without having to duplicate code.
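A rough sketch of that pattern in plain C, assuming a made-up choice between two numerical fluxes rather than HONEE's actual state variables and Riemann solvers. `FluxKind`, `flux_generic`, `flux_upwind`, and `flux_central` are hypothetical names, and the libCEED macros and QFunction signature are left out so the snippet stands alone. The shared helper takes the option as an argument; each thin wrapper passes a constant, so once the helper is inlined the compiler can drop the unused branch.

```c
#include <stddef.h>

/* Hypothetical options; in practice this could be a solver or variable set. */
typedef enum { FLUX_UPWIND, FLUX_CENTRAL } FluxKind;

/* Shared helper holding the only copy of the logic. When `kind` is a
   compile-time constant at the call site and the call is inlined, the
   switch can be resolved during compilation. */
static inline void flux_generic(FluxKind kind, size_t Q, double a,
                                const double *u_left, const double *u_right,
                                double *flux) {
  for (size_t i = 0; i < Q; i++) {
    switch (kind) {
      case FLUX_UPWIND:
        /* Take the state from the side the flow comes from. */
        flux[i] = a * (a >= 0.0 ? u_left[i] : u_right[i]);
        break;
      case FLUX_CENTRAL:
        /* Average of the two states. */
        flux[i] = 0.5 * a * (u_left[i] + u_right[i]);
        break;
    }
  }
}

/* One thin entry point per option; each would be its own QFunction. */
int flux_upwind(size_t Q, double a, const double *u_left,
                const double *u_right, double *flux) {
  flux_generic(FLUX_UPWIND, Q, a, u_left, u_right, flux);
  return 0;
}

int flux_central(size_t Q, double a, const double *u_left,
                 const double *u_right, double *flux) {
  flux_generic(FLUX_CENTRAL, Q, a, u_left, u_right, flux);
  return 0;
}
```

The number of wrappers grows with the number of combinations, which is why this may not scale to freely composable weak-form pieces.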

I assume this is untenable for your use case, but figured I'd mention just in case there's some way it could be applicable (even if it's not for this circumstance in particular).

jeremylt Sep 10, 2024
Maintainer

The composite operator is the first approach that comes to mind given the description above, though backends make fewer optimizations across composite operators than they currently do within each suboperator.

jedbrown · 2024-09-10T18:59:27Z

jedbrown
Sep 10, 2024
Maintainer

I think the situation you're considering is like when one part computes a diffusion coefficient and another uses that coefficient (and other inputs) to compute a flux. That is not a libCEED composite operator because those are additive rather than sequential.

The fundamental issue is that for GPUs, you pay a lot to use dynamic dispatch and you pay a lot any time you spill intermediate state out of registers (it's basically a spill to global memory because you don't have big and fast L1 cache in the CPU sense). It is possible for you to write a qfunction that uses dynamic dispatch inside, but I don't think you'll like the performance. You can make a sequence of CeedOperators, but I don't think you'll like that performance either (it becomes very memory-intensive) and we don't know how to efficiently assemble operators (because the interface doesn't guarantee that all the intermediates are passed through as "collocated", though I suppose we could learn to recognize that case).
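As a CPU-side illustration of the intermediate-state point, here is a small plain C sketch of the coefficient-then-flux example. The coefficient law `kappa(u) = 1 + u^2` and all function names are assumptions made up for this sketch, and nothing here is libCEED API. The staged version writes every coefficient to an intermediate array between two passes, which is the analogue of a sequence of operators; the fused version keeps the coefficient in a local variable.

```c
#include <stddef.h>

/* Hypothetical nonlinear diffusion coefficient, kappa(u) = 1 + u^2. */
static inline double kappa_of_u(double u) { return 1.0 + u * u; }

/* Staged, pass 1: store the coefficient for every point in memory. */
void stage_coefficient(size_t Q, const double *u, double *kappa) {
  for (size_t i = 0; i < Q; i++) kappa[i] = kappa_of_u(u[i]);
}

/* Staged, pass 2: read the stored coefficient back to form the flux. */
void stage_flux(size_t Q, const double *kappa, const double *du, double *flux) {
  for (size_t i = 0; i < Q; i++) flux[i] = kappa[i] * du[i];
}

/* Fused: the coefficient lives only in a local variable, so no
   intermediate array is written or read. */
void fused_flux(size_t Q, const double *u, const double *du, double *flux) {
  for (size_t i = 0; i < Q; i++) {
    const double kappa = kappa_of_u(u[i]);
    flux[i] = kappa * du[i];
  }
}
```

Both versions produce the same flux; the difference is the extra array traffic in the staged one, which is the part that hurts on a GPU.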

I think JIT is a better way to handle this sort of dynamic composition. We might be able to help with that, but would probably need a prototype and to look at the needs of the various backends.

0 replies
