-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate gpuLocalTreeWalk and Ewald Kernel Launches #186
base: master
Are you sure you want to change the base?
Conversation
…n GPU tree walk enabled
…mmenceCalculateGraviyLocal is no longer called
We still need to decide what to do about the nodeGravityComputation and particleGravityComputation kernel launches. I don't think the remote gravity performance will benefit much from consolidating these, but this PR as-is probably breaks the local GPU gravity calculation if we aren't using the gpu-local-tree-walk option. |
This doesn't even compile if "--enable-gpu-local-tree-walk" is not specified. |
I just tested this out using the verbs comm layer and CUDA memory errors are back. I'm guessing the poor performance from MPI was actually preventing the remote gravity kernels from stepping on each other. We'll see if the CSA folks have any other suggestions when we talk to them next Monday, but I think we're going to need to move all of the nodeGravityComputation and particleGravityComputation kernel launches to the DataManager as well. |
Having GPU kernel launches tied to the TreePieces degrades performance and is probably causing some data race conditions. This PR consolidates the kernel launches for the local gravity tree walk and Ewald calculation. Both the kernel launches and device memory allocation for these operations are now handled by the DataManager.
Note that another open PR #183 has already been merged into this branch.