
[RFC] Improve GPU vector interface #82

Open · wants to merge 1 commit into base: CMSSW_10_4_X_Patatrack

Conversation

makortel

Spurred by my earlier dislike of the interface of

namespace GPU {
template <class T> struct SimpleVector {

and the recent discussion with @felicepantaleo about
template <class T, int maxSize> struct VecArray {

I started to think about whether we could improve the interface of a "GPU vector" a bit.

On StackOverflow I came across a pattern where a "GPU class" is split in two:

  • A class owning the GPU memory
  • A wrapper class holding only raw pointers to the GPU memory so that it can be passed to the kernels by value

In this PR I toyed with these ideas for a GPU vector implementation (I hope the unit test is enough to demonstrate how it is used; I'm sure it can be improved further).

I feel the pattern of passing the "structs of device pointers" by value to the kernels would simplify the code, as we could avoid doing a cudaMalloc for the struct itself.
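
For concreteness, here is a minimal sketch of that two-class pattern in plain CUDA C++; the names (GPUVector, GPUVectorView) and the push_back details are illustrative, not the actual code in this PR:

```cpp
#include <cuda_runtime.h>

template <typename T>
struct GPUVectorView {
  // Non-owning: only raw device pointers, so it can be passed to a kernel
  // by value without a cudaMalloc for the struct itself.
  T *data;
  int *size;      // device-resident current size
  int capacity;

  __device__ int push_back(const T &value) {
    int idx = atomicAdd(size, 1);
    if (idx < capacity) {
      data[idx] = value;
      return idx;
    }
    atomicSub(size, 1);  // roll back, the vector is full
    return -1;
  }
};

template <typename T>
class GPUVector {
public:
  explicit GPUVector(int capacity) : capacity_(capacity) {
    cudaMalloc(&data_, sizeof(T) * capacity);
    cudaMalloc(&size_, sizeof(int));
    cudaMemset(size_, 0, sizeof(int));
  }
  ~GPUVector() {
    cudaFree(data_);
    cudaFree(size_);
  }
  GPUVector(const GPUVector &) = delete;
  GPUVector &operator=(const GPUVector &) = delete;

  // The view is cheap to construct and trivially copyable.
  GPUVectorView<T> view() { return GPUVectorView<T>{data_, size_, capacity_}; }

private:
  T *data_;
  int *size_;
  int capacity_;
};

// The kernel takes the view by value, so no device-side allocation of the
// struct is needed:
__global__ void fill(GPUVectorView<int> vec) {
  vec.push_back(static_cast<int>(threadIdx.x + blockIdx.x * blockDim.x));
}

// Host side (sketch): GPUVector<int> vec(1024); fill<<<4, 256>>>(vec.view());
```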

@felicepantaleo @VinInn @fwyzard @rovere

@cmsbot

cmsbot commented Jun 13, 2018

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

HeterogeneousCore/CUDAUtilities

The following packages do not have a category, yet:

HeterogeneousCore/CUDAUtilities
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbot can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

@makortel
Author

One could actually go one step further and make the interface "safer" with respect to synchronization (i.e. try to avoid having to remember explicit synchronization calls by making them more "automated") by

  • Making the GPUVector only own the memory (and provide data transfer functions for the helper classes)
  • Adding a "host wrapper class" (let's call it a "host view" from now on; similarly, the current GPUVectorWrapper<T> becomes a "device view")
    • Constructing/asking for a "host view" automatically transfers the size from GPU to CPU
      • An async variant can also be provided; the user is then responsible for calling cudaStreamSynchronize before using the "host view"
      • A "no update" variant can also be provided for cases where it is guaranteed that the device vector size has not changed since the "host view" was last constructed

I'm not really sure whether hiding the transfers and synchronizations this way would actually make the code clearer. I can provide an example if there is interest.
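
A rough sketch of what such a "host view" / "device view" split might look like; all names here are hypothetical and not taken from this PR:

```cpp
#include <cuda_runtime.h>

template <typename T>
class GPUVector {
public:
  struct DeviceView {   // passed to kernels by value
    T *data;
    int *size;
    int capacity;
  };
  struct HostView {     // host-side, read-only snapshot of the size
    const int *size_ptr;
    int size() const { return *size_ptr; }
  };

  explicit GPUVector(int capacity) : capacity_(capacity) {
    cudaMalloc(&data_, sizeof(T) * capacity);
    cudaMalloc(&size_, sizeof(int));
    cudaMemset(size_, 0, sizeof(int));
    cudaMallocHost(&hostSize_, sizeof(int));  // pinned, so async copies are valid
    *hostSize_ = 0;
  }
  ~GPUVector() {
    cudaFree(data_);
    cudaFree(size_);
    cudaFreeHost(hostSize_);
  }

  DeviceView deviceView() { return DeviceView{data_, size_, capacity_}; }

  // Synchronous variant: the size transfer is hidden in the call.
  HostView hostView() {
    cudaMemcpy(hostSize_, size_, sizeof(int), cudaMemcpyDeviceToHost);
    return HostView{hostSize_};
  }

  // Async variant: the caller must cudaStreamSynchronize(stream) before use.
  HostView hostViewAsync(cudaStream_t stream) {
    cudaMemcpyAsync(hostSize_, size_, sizeof(int), cudaMemcpyDeviceToHost, stream);
    return HostView{hostSize_};
  }

  // "No update" variant: reuse the size from the previous transfer.
  HostView hostViewNoUpdate() const { return HostView{hostSize_}; }

private:
  T *data_;
  int *size_;
  int *hostSize_;  // pinned host-side copy of the size
  int capacity_;
};
```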

@fwyzard

fwyzard commented Jun 29, 2018

Validation summary

Reference release CMSSW_10_2_0_pre5 at 30c7b03
Development branch CMSSW_10_2_X_Patatrack at 10d59f2
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre5-PU25ns_102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre5-102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre5-PU25ns_102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre5-102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8 are missing
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8 are missing
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre5-PU25ns_102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre5-102X_upgrade2018_realistic_v1-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/798e35cfd563696abc727c08a6c31c28bbabe374/log .

@fwyzard

fwyzard commented Jun 29, 2018

Before spending more time on this, I think we should evaluate if Unified Memory works well enough, as it would probably render these utility classes obsolete.
@makortel , @felicepantaleo , @VinInn let me know if you would still find this useful, even just to play with, and I will merge it.
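
To illustrate why Unified Memory could make this kind of utility class largely unnecessary, here is a minimal sketch (not the code from this PR; error handling omitted): with cudaMallocManaged both the element buffer and the vector struct itself are directly usable from host and device, so there is no need for a separate owner/wrapper split or for explicit size transfers.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

struct ManagedIntVector {
  int *data;
  int size;
  int capacity;

  __device__ void push_back(int value) {
    int idx = atomicAdd(&size, 1);
    if (idx < capacity) data[idx] = value;
  }
};

__global__ void fill(ManagedIntVector *vec) {
  vec->push_back(static_cast<int>(threadIdx.x + blockIdx.x * blockDim.x));
}

int main() {
  ManagedIntVector *vec = nullptr;
  cudaMallocManaged(&vec, sizeof(ManagedIntVector));  // the struct itself is managed
  cudaMallocManaged(&vec->data, 1024 * sizeof(int));  // and so is the element buffer
  vec->size = 0;
  vec->capacity = 1024;

  fill<<<4, 256>>>(vec);
  cudaDeviceSynchronize();

  printf("filled %d elements\n", vec->size);  // read directly on the host, no explicit copy

  cudaFree(vec->data);
  cudaFree(vec);
  return 0;
}
```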

@makortel
Author

I agree that evaluating Unified Memory is more important than testing this "toy" in action, exactly because Unified Memory would make many things much simpler. Although maybe even with Unified Memory we would want a specific vector-like class (or classes separating the ownership from a "view-like" usage) if we want to avoid the memory allocations caused by copying.

(and anyway I intended this PR more for discussion than merging as-is)
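
A rough sketch of that last point, assuming Unified Memory underneath: the owning class stays non-copyable, and a trivially copyable, non-owning view is what gets passed around (e.g. to kernels by value), so no allocation or payload copy happens on the way. The ManagedVector/ManagedVectorView names are hypothetical.

```cpp
#include <cuda_runtime.h>

template <typename T>
struct ManagedVectorView {  // non-owning; copying it copies two words
  T *data;
  int size;
  __host__ __device__ T &operator[](int i) const { return data[i]; }
};

template <typename T>
class ManagedVector {
public:
  explicit ManagedVector(int size) : size_(size) {
    cudaMallocManaged(&data_, sizeof(T) * size);
  }
  ~ManagedVector() { cudaFree(data_); }
  ManagedVector(const ManagedVector &) = delete;             // no accidental deep copies
  ManagedVector &operator=(const ManagedVector &) = delete;

  ManagedVectorView<T> view() const { return ManagedVectorView<T>{data_, size_}; }

private:
  T *data_;
  int size_;
};

// The view is passed by value; nothing is allocated or deep-copied.
__global__ void scale(ManagedVectorView<float> v, float factor) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  if (i < v.size) v[i] *= factor;
}
```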

@fwyzard force-pushed the CMSSW_10_2_X_Patatrack branch 2 times, most recently from 48d4372 to a721b31 on August 17, 2018 at 20:51
@fwyzard force-pushed the CMSSW_10_2_X_Patatrack branch 2 times, most recently from 5200bc1 to cf2d1bb on August 30, 2018 at 07:24
fwyzard pushed a commit that referenced this pull request on Nov 1, 2018:
Address code review comments, including modernisation of code
@fwyzard changed the base branch from CMSSW_10_2_X_Patatrack to CMSSW_10_4_X_Patatrack on November 15, 2018 at 08:33