Use _mm512_popcnt_epi64 to speedup hamming distance evaluation. #4020
base: main
Signed-off-by: Mulugeta Mammo <[email protected]>
@mulugetam Sure, |
@mulugetam FYI, we are also working with other Intel collaborators (@guangzegu and @xtangxtang) to incorporate AMX into FAISS in #3266. @asadoughi suggested that we meet to sync up on the collaboration with Intel.
Thanks for your contributions :) Question: What is the performance for Hamming distance calculations when |
VPOPCNTDQ Any downside in using the scheme below? I check whether the compiler and the underlying machine support VPOPCNTDQ and, if they do, add the -mavx512vpopcntdq flag to target_compile_options.
@mengdilin If I benchmark the AVX-512 build without VPOPCNTDQ and compare it to the AVX2 build, I see about a 1% to 5% speedup (depending on the code_size) for AVX-512.
@mulugetam Technically, a cmake-based check for VPOPCNTDQ may lead to the following situation. Say a build is performed on a recent-generation CPU, and the built Python package is then published to a conda / pip repository. The package gets distributed around the world and causes problems for everyone who has an Intel Cascade Lake CPU, which supports AVX-512 but not VPOPCNTDQ.
@alexanderguzhva Thanks! In addition to VPOPCNTDQ, we plan to make additional PRs to speed up the scalar quantizer with FP16 instructions. I will modify this PR and introduce an 'avx512_advanced' mode. Hopefully, the team will accept the changes, given that they offer a performance boost.
@mulugetam Well, there are two more problems then.
Nevertheless, please write and benchmark the code, and then we'll decide how to integrate it into Faiss properly. :) Thanks |
@alexanderguzhva @mengdilin I have created a new PR, PR#4025, that adds an 'avx512-sr' architecture mode, and I have marked this PR as dependent on it.
The `_mm512_popcnt_epi64` intrinsic is used to accelerate Hamming distance calculations in `HammingComputerDefault` and `HammingComputer64`.
Benchmarking with bench_hamming_computer on an AWS r7i instance shows a performance improvement of up to 30% compared to AVX2.
This PR depends on PR#4025.
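To illustrate the technique, here is a minimal sketch of a Hamming distance routine that uses `_mm512_popcnt_epi64` when the compiler enables it. This is not the code from this PR or from Faiss: `hamming_distance` is a hypothetical function name, the AVX-512 path is compiled only when the `__AVX512F__` and `__AVX512VPOPCNTDQ__` macros are defined (for example with -mavx512f -mavx512vpopcntdq), and the scalar fallback assumes the GCC/Clang `__builtin_popcountll` builtin.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
#include <immintrin.h>
#endif

// Hypothetical helper (not the Faiss implementation): number of differing
// bits between two byte buffers of equal length n_bytes.
inline uint64_t hamming_distance(
        const uint8_t* a,
        const uint8_t* b,
        size_t n_bytes) {
    uint64_t dist = 0;
    size_t i = 0;
#if defined(__AVX512F__) && defined(__AVX512VPOPCNTDQ__)
    // Process 64 bytes per iteration: XOR, per-lane popcount, accumulate.
    __m512i acc = _mm512_setzero_si512();
    for (; i + 64 <= n_bytes; i += 64) {
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        __m512i diff = _mm512_xor_si512(va, vb);
        acc = _mm512_add_epi64(acc, _mm512_popcnt_epi64(diff));
    }
    // Sum the eight 64-bit lane counters into one scalar.
    dist += static_cast<uint64_t>(_mm512_reduce_add_epi64(acc));
#endif
    // Scalar path: 8 bytes at a time (also handles the vector-loop remainder).
    for (; i + 8 <= n_bytes; i += 8) {
        uint64_t wa, wb;
        std::memcpy(&wa, a + i, sizeof(wa));
        std::memcpy(&wb, b + i, sizeof(wb));
        dist += static_cast<uint64_t>(__builtin_popcountll(wa ^ wb));
    }
    // Remaining tail bytes.
    for (; i < n_bytes; ++i) {
        dist += static_cast<uint64_t>(
                __builtin_popcount(static_cast<unsigned>(a[i] ^ b[i])));
    }
    return dist;
}
```

Both paths produce the same result; the vector path only changes how many bits are counted per instruction, which is where a speedup over an AVX2 build would come from.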