Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Redundant Functions + 60% MD Speedup #98

Conversation

GNiendorf
Copy link
Member

@GNiendorf GNiendorf commented Sep 26, 2024

Below is run on the L40.

This PR Timing:
Screenshot 2024-09-26 at 4 13 20 PM
Master Timing:
Screenshot 2024-09-26 at 4 15 43 PM

MD creation time goes from 1.0ms to 0.4ms, meaning that the redundant matrix creation mentioned below represented the majority of the MD creation time on GPU... 😱 @slava77

@GNiendorf
Copy link
Member Author

/run standalone

float miniDeltaLooseTilted[3] = {0.4f, 0.4f, 0.4f};
float miniDeltaEndcap[5][15];

for (size_t i = 0; i < 5; i++) {
Copy link
Member Author

@GNiendorf GNiendorf Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matrix creation code was being run many times, representing a significant portion of the MD creation time it seems like.

@GNiendorf GNiendorf changed the title Remove Redundant Functions Remove Redundant Functions + 60% MD Speedup Sep 26, 2024
Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.5    326.0    116.8     47.7     93.5    500.2    113.4    150.7    101.4      2.3    1496.7     951.9+/- 247.8     410.9   explicit_cache[s=4] (target branch)
   avg     48.1    324.0    116.0     46.6     92.5    509.4    114.3    150.4    101.5      2.5    1505.4     947.8+/- 246.7     409.2   explicit_cache[s=4] (this PR)

@GNiendorf GNiendorf marked this pull request as ready for review September 26, 2024 20:35
@slava77
Copy link

slava77 commented Sep 26, 2024

MD creation time goes from 1.0ms to 0.4ms, meaning that the redundant matrix creation mentioned below represented the majority of the MD creation time on GPU

nice find/fix.
Thank you.

@slava77
Copy link

slava77 commented Sep 26, 2024

apparently this is addressing SegmentLinking/TrackLooper#303

@VourMa
Copy link
Collaborator

VourMa commented Sep 27, 2024

Indeed, nice catch and clean up. I would even considering adding this to the CMSSW PR (especially if the renaming to adhere to CMSSW naming is real and not an artefact of the rebases). What do you think?

@ariostas Is the _devel branch fully up to date with the branch for the CMSSW PR?

@ariostas
Copy link
Member

The _devel branch is not up to date since there are still some big changes coming. I actually just noticed that the _devel PR is a bit of a mess, so we'll have to figure that out at some point. But I agree that this should go into the _realfiles branch. This is great!

I think there's other PRs by Gavin that should also be included in _realfiles since there have been review comments about them. In particular, #39 and #54.

@VourMa
Copy link
Collaborator

VourMa commented Sep 27, 2024

I think there's other PRs by Gavin that should also be included in _realfiles since there have been review comments about them. In particular, #39 and #54.

I would be less eager to also push these ones, because:

  1. They introduce a new (physics) feature (lower pT cut) which is not strictly needed at this stage.
  2. They require new data files, with all the complication this brings to the cms-sw way of running things for PRs.
  3. They define a clear-cut development, separated from the rest (up to a good degree).

So I think they would be great for a standalone PR.

@ariostas
Copy link
Member

I would be less eager to also push these ones, because...

That's true. But Andrea has asked for occupancy tables instead of a bunch of if-else statements, and if we want to fix SDL_INF we'll have to update the data files anyway, so maybe we could just get that out of the way

@slava77
Copy link

slava77 commented Sep 27, 2024

/run all

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.1    325.7    115.6     45.1     91.0    502.0    113.6    149.4    100.7      2.2    1489.5     943.4+/- 247.6     406.8   explicit_cache[s=4] (target branch)
   avg     47.6    324.3    116.0     45.6     91.8    511.2    112.9    150.0    100.8      2.4    1502.5     943.8+/- 245.0     411.4   explicit_cache[s=4] (this PR)

@slava77
Copy link

slava77 commented Sep 27, 2024

linter check isn't happy here either

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf
Copy link
Member Author

@ariostas When I run the code format I see a bunch of suggestions for files not related to this PR:

Screenshot 2024-09-27 at 2 46 24 PM

@slava77
Copy link

slava77 commented Sep 27, 2024

@ariostas When I run the code format I see a bunch of suggestions for files not related to this PR:

did you update to the new release or still reused the old release area with a new cmsrel?

@GNiendorf
Copy link
Member Author

Oh no, I haven't updated. Let me do that and try again.

@slava77
Copy link

slava77 commented Sep 27, 2024

/run all

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     45.0    326.4    116.0     47.9     92.9    501.9    113.2    150.3    100.6      2.5    1496.7     949.8+/- 244.3     410.6   explicit_cache[s=4] (target branch)
   avg     47.2    323.1    115.9     46.9     91.5    506.0    113.4    148.9    101.7      2.3    1496.9     943.6+/- 249.2     410.0   explicit_cache[s=4] (this PR)

@ariostas
Copy link
Member

When I run the code format I see a bunch of suggestions for files not related to this PR:

I have the same issue. I'm not sure how to get around it. I think they didn't format all files after clang-format got updated

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@slava77 slava77 merged commit 3000f9b into CMSSW_14_1_0_pre3_LST_X_LSTCore_realfiles_batch1_devel Sep 27, 2024
2 of 3 checks passed
@slava77
Copy link

slava77 commented Sep 27, 2024

the linter complaints are for parts of the files not touched by this PR

@slava77
Copy link

slava77 commented Sep 27, 2024

Here is a timing comparison:

there is practically no change in the CPU backend: MD changes from 326.4 to 323.1; the other run was faster by around 1 s.
I guess the optimization works out differently (or something else is very slow)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants