Consolidate gpuLocalTreeWalk and Ewald Kernel Launches #186

spencerw · 2024-11-20T22:57:17Z

Having GPU kernel launches tied to the TreePieces degrades performance and is probably causing some data race conditions. This PR consolidates the kernel launches for the local gravity tree walk and Ewald calculation. Both the kernel launches and device memory allocation for these operations are now handled by the DataManager.
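To illustrate the consolidation described above, here is a minimal host-side sketch, not the code in this PR. It assumes each piece hands its share of the work to a node-level manager, which issues one launch once every expected piece has reported in. `PieceWork`, `LaunchConsolidator`, and `registerPiece` are hypothetical names chosen for illustration; they are not the DataManager interface, and the actual GPU kernel launch is replaced by a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one TreePiece's share of the local walk:
// a contiguous range of buckets in a node-wide bucket array.
struct PieceWork {
    int firstBucket;
    int numBuckets;
};

// Hypothetical node-level manager. Names are illustrative only and do not
// correspond to the DataManager interface in this PR.
class LaunchConsolidator {
public:
    explicit LaunchConsolidator(std::size_t expectedPieces)
        : expected_(expectedPieces) {
        assert(expectedPieces > 0);
    }

    // Called once per piece. Returns true if this call triggered the launch.
    bool registerPiece(const PieceWork& work) {
        pending_.push_back(work);
        if (pending_.size() < expected_) return false;
        launchAll();
        return true;
    }

    int launches() const { return launches_; }
    int bucketsInLastLaunch() const { return bucketsInLastLaunch_; }

private:
    void launchAll() {
        int total = 0;
        for (const PieceWork& w : pending_) total += w.numBuckets;
        // A real implementation would issue a single GPU kernel launch here
        // covering `total` buckets; this sketch only records that it happened.
        bucketsInLastLaunch_ = total;
        ++launches_;
        pending_.clear();
    }

    std::size_t expected_;
    std::vector<PieceWork> pending_;
    int launches_ = 0;
    int bucketsInLastLaunch_ = 0;
};
```

With three pieces registered, the first two calls to `registerPiece` only queue work, and the third triggers a single launch covering all of their buckets, instead of one launch per piece.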

Note that another open PR #183 has already been merged into this branch.


spencerw · 2024-11-20T22:58:51Z

We still need to decide what to do about the nodeGravityComputation and particleGravityComputation kernel launches. I don't think the remote gravity performance will benefit much from consolidating these, but this PR as-is probably breaks the local GPU gravity calculation if we aren't using the gpu-local-tree-walk option.

trquinn · 2024-11-21T05:47:10Z

This doesn't even compile if "--enable-gpu-local-tree-walk" is not specified.

spencerw · 2024-11-21T22:27:39Z

I just tested this out using the verbs comm layer and CUDA memory errors are back. I'm guessing the poor performance from MPI was actually preventing the remote gravity kernels from stepping on each other.

We'll see if the CSA folks have any other suggestions when we talk to them next Monday, but I think we're going to need to move all of the nodeGravityComputation and particleGravityComputation kernel launches to the DataManager as well.

spencerw added 19 commits September 24, 2024 18:28

clearRegisteredPieces called before tree build

e2393c7

CUDA streams assigned to TreePieces after tree build

7dec4c5

Move clearRegisteredPieces inside buildTree

ba603d6

Remove unused code

8bbeccb

Move assignCUDAStreams() call inside of buildTree()

049e4fa

Remove DataTransfer calls from TreePieceCellListDataTransferLocal when GPU tree walk enabled

8ddb491

Hack to launch gpuLocalTreeWalk from DataManager

a5e500b

Hack to run remote gravity without bookkeeping

83961cf

Restore bookkeeping around GPU local tree walk

f24d350

Use bare callback for local walk

bb0d5bb

First attempt at consolidated Ewald GPU kernel

5eb603a

Merge remote-tracking branch 'remotes/origin/dm_tp' into kernelfix

85191fe

More fixes to consolidated Ewald

37ba423

Ensure device memory pointers are passed to TreePieces even though commenceCalculateGravityLocal is no longer called

336b959

Call finishBucket after local tree walk, ensure smooth happens before Ewald

ee3d927

Pass nReplicas and fPeriod to DataManagerLocalTreeWalk

aa919f8

Removed unused Ewald GPU code

40de4f5

Ewald GPU data to pinned host memory, remove markers

6b012e5

Comments and code cleanup

e4cb17f

Fix for bucket bookkeeping

2866fdb
