Allow gcc to vectorize gather_smallbuf() in openmp_kernels #151

Merged
merged 1 commit into from
Aug 18, 2023

jti-lanl
Contributor

@jti-lanl jti-lanl commented Aug 1, 2023

These are trivial tweaks that only change the OpenMP gather_smallbuf() kernel, and only for gcc, because that is all I have tested. I currently take advantage of them by passing compile-time flags via CMAKE_C_FLAGS at build time. For example, on a Sapphire Rapids node, gcc emits only AVX2 instructions even with -march=native. To get AVX-512 vector instructions, you apparently need -mprefer-vector-width=512 and gcc >= 11. I have some scripting on our side to work out the right gcc options, but integrating automated option selection directly into your CMakeLists.txt would require somewhat more comprehensive testing across compilers and hosts. The wider vectors do show some additional speedup (beyond AVX2) in a weak-scaling test.
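
For readers unfamiliar with this kind of change, here is a minimal sketch of a gather loop written so that gcc is free to vectorize it. It is not the code from this PR: the function name `gather_sketch`, its signature, and the use of `restrict` together with `#pragma omp simd` are assumptions chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a small-buffer gather kernel.
 * This is NOT Spatter's gather_smallbuf(); names and signature are assumed. */
void gather_sketch(double *restrict dst, const double *restrict src,
                   const size_t *restrict idx, size_t n)
{
    /* restrict tells the compiler the three arrays do not alias, and
     * omp simd asks it to vectorize the loop. The pragma is honored when
     * compiling with -fopenmp or -fopenmp-simd and ignored otherwise. */
#pragma omp simd
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[idx[i]];   /* indexed load: a gather */
    }
}
```

A build along the lines described above might pass something like `-DCMAKE_C_FLAGS="-O3 -march=native -mprefer-vector-width=512"` to cmake. GCC documents the x86 vector-width option with an `-m` prefix. Whether the compiler actually emits 512-bit gather instructions depends on the gcc version and the host CPU, so it is worth checking the generated assembly.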

Contributor

@jyoung3131 jyoung3131 left a comment

Looks good - I tested this with a newish GCC implementation of Spatter.

@jyoung3131 jyoung3131 self-assigned this Aug 7, 2023
@jyoung3131 jyoung3131 added the enhancement New feature or request label Aug 7, 2023
@jyoung3131 jyoung3131 merged commit 8f6384a into hpcgarage:main Aug 18, 2023
2 checks passed
2 participants