Consolidate gpuLocalTreeWalk and Ewald Kernel Launches #186

Open · wants to merge 20 commits into master

Conversation
spencerw (Member)
Having GPU kernel launches tied to individual TreePieces degrades performance and is probably causing data races. This PR consolidates the kernel launches for the local gravity tree walk and the Ewald calculation; both the kernel launches and the device memory allocation for these operations are now handled by the DataManager.

Note that another open PR #183 has already been merged into this branch.

@spencerw (Member, Author)
We still need to decide what to do about the nodeGravityComputation and particleGravityComputation kernel launches. I don't think the remote gravity performance will benefit much from consolidating these, but this PR as-is probably breaks the local GPU gravity calculation if we aren't using the gpu-local-tree-walk option.

@trquinn (Member)
trquinn commented Nov 21, 2024

This doesn't even compile if "--enable-gpu-local-tree-walk" is not specified.

@spencerw (Member, Author)
I just tested this with the verbs comm layer and the CUDA memory errors are back. I'm guessing the poor performance of the MPI comm layer was actually what kept the remote gravity kernels from stepping on each other.

We'll see if the CSA folks have any other suggestions when we talk to them next Monday, but I think we're going to need to move all of the nodeGravityComputation and particleGravityComputation kernel launches to the DataManager as well.
