Allow gcc to vectorize gather_smallbuf() in openmp_kernels #151

jti-lanl · 2023-08-01T23:53:56Z

These are trivial tweaks that only change the OpenMP gather_smallbuf() for gcc, because that is all I have tested. I currently use them by passing compile-time flags via CMAKE_C_FLAGS at build time. For example, on a Sapphire Rapids node, even with -march=native, gcc only uses AVX2 instructions. To get it to use AVX-512 vector instructions, you apparently need -mprefer-vector-width=512 and gcc >= 11. I have some scripting to figure out the gcc options on our side, but integrating automated options directly into your CMakeLists.txt would require somewhat more comprehensive testing across compilers and hosts. The larger vectors do show some additional speedup (beyond AVX2) in a weak-scaling test.
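To illustrate the kind of loop this is about, here is a minimal sketch of a small gather that gcc can be asked to vectorize. It is not the code from this PR: the function name `gather_sketch`, its signature, and the use of `#pragma omp simd` with `restrict` pointers are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small-buffer gather: dst[j] = src[idx[j]] for j in [0, n).
 * Illustrative only; this is not the Spatter kernel. */
void gather_sketch(double *restrict dst, const double *restrict src,
                   const size_t *restrict idx, size_t n)
{
    assert(dst != NULL && src != NULL && idx != NULL);

    /* Request vectorization of the loop. The pragma is ignored unless
     * OpenMP (or OpenMP SIMD support) is enabled at compile time. */
    #pragma omp simd
    for (size_t j = 0; j < n; j++) {
        dst[j] = src[idx[j]];
    }
}
```

Building a file like this with gcc at -O3, -march=native, and the vector-width option described above, plus -fopt-info-vec, should report whether the loop was vectorized. Whether gcc emits hardware gather instructions for it depends on the target and the compiler version.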

jyoung3131

Looks good - I tested this with a newish GCC implementation of Spatter.

Allow gcc to vectorize gather_smallbuf() in openmp_kernels

accaa80

jyoung3131 approved these changes Aug 7, 2023

jyoung3131 self-assigned this Aug 7, 2023

jyoung3131 added the enhancement New feature or request label Aug 7, 2023

jyoung3131 merged commit 8f6384a into hpcgarage:main Aug 18, 2023
2 checks passed
