[RFC] Improve GPU vector interface #82
base: CMSSW_10_4_X_Patatrack
A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.
It involves the following packages: HeterogeneousCore/CUDAUtilities
The following packages do not have a category, yet: HeterogeneousCore/CUDAUtilities
@cmsbot can you please review it and eventually sign? Thanks.
cms-bot commands are listed here
One could actually go one step further to make the interface "safer" with respect to synchronization (i.e. avoiding the need to remember explicit synchronization calls by making them more "automated") by
I'm not really sure whether hiding the transfers and synchronizations this way would actually make the code clearer. I can provide an example if there is interest.
Validation summary
Reference release CMSSW_10_2_0_pre5 at 30c7b03
Before spending more time on this, I think we should evaluate if Unified Memory works well enough, as it would probably render these utility classes obsolete.
I agree that evaluating Unified Memory is more important than testing this "toy" in action, exactly because Unified Memory would make many things much simpler. Even with Unified Memory, though, we may want a specific vector-like class (or classes separating the ownership from a "view-like" usage) if we want to avoid memory allocations caused by copying. (And anyway, I intended this PR more for discussion than for merging as-is.)
Address code review comments, including modernisation of code
Spurred by my earlier dislike of the interface of cmssw/HeterogeneousCore/CUDAUtilities/interface/GPUSimpleVector.h (lines 10 to 11 in 655e4ed) and the recent discussion with @felicepantaleo about cmssw/HeterogeneousCore/CUDAUtilities/interface/GPUVecArray.h (line 14 in e207de5), I started to think about whether we could improve the interface of a "GPU vector" a bit.
On StackOverflow I came across a pattern where a "GPU class" is split into two
In this PR I toyed with these ideas for a GPU vector implementation. I hope the unit test is enough to demonstrate how it is used; I'm sure it can be improved further.
I feel the pattern of passing the "structs of device pointers" by value to the kernels would simplify the code, as we could avoid doing cudaMalloc for the struct itself.
@felicepantaleo @VinInn @fwyzard @rovere
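To make the "struct of device pointers passed by value" idea concrete, here is a minimal host-only sketch. It is not the code in this PR: all names (VectorView, VectorOwner, fillKernel, demo) are hypothetical, std::malloc/std::free stand in for cudaMalloc/cudaFree so that it compiles without the CUDA toolkit, and fillKernel is an ordinary function standing in for a __global__ kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Non-owning, trivially copyable "view": only pointers and a capacity.
// In CUDA this is the struct that would be passed by value as a kernel
// argument, so no separate allocation is needed for the struct itself.
template <typename T>
struct VectorView {
  T* data;       // element storage (device memory in the real case)
  int* size;     // current element count, stored next to the data
  int capacity;  // maximum number of elements

  // Would be a __device__ function in CUDA; a real device version would
  // need an atomic increment of *size. This host stand-in is single-threaded.
  bool push_back(const T& value) {
    if (*size >= capacity) return false;
    data[(*size)++] = value;
    return true;
  }
};

// Owning side: allocates and frees the buffers and hands out views.
// Move-only, so copying can never trigger a hidden allocation.
template <typename T>
class VectorOwner {
  static_assert(std::is_trivially_copyable<T>::value,
                "raw buffers are only safe for trivially copyable T");

public:
  explicit VectorOwner(int capacity) : capacity_(capacity) {
    assert(capacity > 0);
    // Assumption: malloc/free stand in for cudaMalloc/cudaFree.
    data_ = static_cast<T*>(std::malloc(static_cast<std::size_t>(capacity) * sizeof(T)));
    size_ = static_cast<int*>(std::malloc(sizeof(int)));
    if (data_ == nullptr || size_ == nullptr) {
      std::free(data_);
      std::free(size_);
      throw std::bad_alloc();
    }
    *size_ = 0;
  }
  ~VectorOwner() {
    std::free(data_);  // free(nullptr) is a no-op, so moved-from objects are fine
    std::free(size_);
  }
  VectorOwner(const VectorOwner&) = delete;
  VectorOwner& operator=(const VectorOwner&) = delete;
  VectorOwner(VectorOwner&& other) noexcept
      : data_(std::exchange(other.data_, nullptr)),
        size_(std::exchange(other.size_, nullptr)),
        capacity_(std::exchange(other.capacity_, 0)) {}
  VectorOwner& operator=(VectorOwner&&) = delete;

  // Cheap to create and copy: this is what gets passed to the "kernel".
  VectorView<T> view() const { return VectorView<T>{data_, size_, capacity_}; }
  int size() const { return *size_; }

private:
  T* data_ = nullptr;
  int* size_ = nullptr;
  int capacity_ = 0;
};

// Stand-in for a __global__ kernel: receives the view by value.
void fillKernel(VectorView<int> v, int n) {
  for (int i = 0; i < n; ++i) {
    v.push_back(i * i);  // silently drops elements once the capacity is reached
  }
}

// Usage: the owner lives on the host side, the kernel only sees the view.
int demo() {
  VectorOwner<int> vec(4);
  fillKernel(vec.view(), 10);  // tries to add 10 elements, only 4 fit
  return vec.size();
}
```

In a real CUDA version the owner's size() would have to copy the counter back from the device (with the appropriate synchronization), which is exactly where the question of explicit versus "automated" synchronization comes in.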