Begin migrate ScalarQuantizer to simdlib #3613
base: main
This pull request was exported from Phabricator. Differential Revision: D59395882
Summary: Pull Request resolved: facebookresearch#3613
As a demo for Mengdi.
The steps to fully migrate to simdlib are:
1. Change all function interfaces to use the generic simd8float32 and friends prototypes, and make sure it compiles on fbcode.
2. Make sure it also compiles on ARM.
3. See which functions can be migrated to use only the generic code path.
4. Benchmark whether the SIMD-emulated path is competitive with the scalar path (for platforms without specific SIMD support).
The rationale is that many SIMD instructions are straightforward, such as adding or subtracting registers, so they can be shared between implementations. The only code that may remain with architecture-specific intrinsics is where the way of doing things differs significantly between AVX and NEON.
Differential Revision: D59395882
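The rationale of sharing straightforward lane-wise operations, while keeping intrinsics only in a thin per-architecture backend, can be illustrated with a minimal sketch. `Vec8f` and `l2_sqr` are hypothetical names, not faiss's simd8float32 API: the AVX2 backend is compiled only when `__AVX2__` is defined, otherwise a scalar-emulated backend is used, and the kernel itself is written once against the generic type. A NEON backend would slot in the same way.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX2__)
#include <immintrin.h>

// Hypothetical AVX2 backend: each lane-wise op maps to one intrinsic.
struct Vec8f {
    __m256 r;
    static Vec8f load(const float* p) { return {_mm256_loadu_ps(p)}; }
    static Vec8f zero() { return {_mm256_setzero_ps()}; }
    Vec8f operator+(Vec8f o) const { return {_mm256_add_ps(r, o.r)}; }
    Vec8f operator-(Vec8f o) const { return {_mm256_sub_ps(r, o.r)}; }
    Vec8f operator*(Vec8f o) const { return {_mm256_mul_ps(r, o.r)}; }
    float horizontal_sum() const {
        float t[8];
        _mm256_storeu_ps(t, r);
        float s = 0.0f;
        for (int i = 0; i < 8; i++) s += t[i];
        return s;
    }
};
#else
// Hypothetical emulated backend for platforms without specific SIMD support.
struct Vec8f {
    float r[8];
    static Vec8f load(const float* p) {
        Vec8f v;
        for (int i = 0; i < 8; i++) v.r[i] = p[i];
        return v;
    }
    static Vec8f zero() { return Vec8f{}; }  // all lanes value-initialized to 0
    Vec8f operator+(Vec8f o) const {
        Vec8f v;
        for (int i = 0; i < 8; i++) v.r[i] = r[i] + o.r[i];
        return v;
    }
    Vec8f operator-(Vec8f o) const {
        Vec8f v;
        for (int i = 0; i < 8; i++) v.r[i] = r[i] - o.r[i];
        return v;
    }
    Vec8f operator*(Vec8f o) const {
        Vec8f v;
        for (int i = 0; i < 8; i++) v.r[i] = r[i] * o.r[i];
        return v;
    }
    float horizontal_sum() const {
        float s = 0.0f;
        for (int i = 0; i < 8; i++) s += r[i];
        return s;
    }
};
#endif

// Written once against Vec8f and compiled for whichever backend is active.
// Squared L2 distance; in this sketch n must be a multiple of 8.
inline float l2_sqr(const float* x, const float* y, std::size_t n) {
    assert(n % 8 == 0);
    Vec8f acc = Vec8f::zero();
    for (std::size_t i = 0; i < n; i += 8) {
        Vec8f d = Vec8f::load(x + i) - Vec8f::load(y + i);
        acc = acc + d * d;
    }
    return acc.horizontal_sum();
}
```

Only the struct definitions differ per platform; `l2_sqr` never touches an intrinsic, which is the property step 3 looks for.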
@mdouze Do you have any plans to support ARM SVE, if possible? The primary problem with simdlib and ARM SVE is that SVE implies SIMD registers of variable size. In practice, there are two popular models on the market: Amazon Graviton 3 with a SIMD width of 256 bits and the upcoming Graviton 4 with a SIMD width of 512 bits, so maybe one could stick with 256 bits for now.
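The variable-size problem can be made concrete. SVE code is typically written vector-length-agnostically: the lane count is queried at run time (for 32-bit lanes, `svcntw()`), and a predicate such as `svwhilelt_b32` disables lanes past the end of the array, so there is no fixed 8-lane register for a type like simd8float32 to wrap. This is a scalar simulation of that pattern in portable C++; `sum_vla` and its `vl_bits` parameter are hypothetical stand-ins for real SVE intrinsics, not faiss code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulation of an SVE-style vector-length-agnostic reduction.
// vl_bits plays the role of the hardware vector length, which SVE permits
// to be a multiple of 128 bits up to 2048 and which is known only at run time.
inline float sum_vla(const float* x, std::size_t n, std::size_t vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    const std::size_t lanes = vl_bits / 32;   // like svcntw()
    std::vector<float> acc(lanes, 0.0f);      // one "register" of partial sums
    for (std::size_t i = 0; i < n; i += lanes) {
        for (std::size_t l = 0; l < lanes; l++) {
            bool active = i + l < n;          // like the svwhilelt_b32(i, n) predicate
            if (active) acc[l] += x[i + l];   // inactive lanes are left untouched
        }
    }
    float s = 0.0f;
    for (float a : acc) s += a;               // like svaddv over all lanes
    return s;
}
```

The same source handles any width and any tail length, which is convenient for SVE but does not match an API built around a compile-time lane count.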
@alexanderguzhva IMO it would be great to support SVE.
@mdouze Yes, the SVE size is known at compile time. Usually, it is done via
What is the status of this diff? Should I wait before bringing some updates to ScalarQuantizer?
@alexanderguzhva I'm starting to work on this, but it's going to take some time. If you want to get your changes in now, feel free to, and I can work on the refactoring later down the line.
@mengdilin Any time estimates on your end? Basically, do you already know exactly what to do, or are you still in the research stage?
@alexanderguzhva I think I can finish up AVX2/Neon in ScalarQuantizer around October (I have other work items at hand at the moment). My understanding is that I should move the respective parts of the AVX2 and Neon code in ScalarQuantizer into […]. I'm a SIMD noob here, so let me know if I'm moving in the right direction for the refactor or if I'm missing anything major.
@mdouze @alexanderguzhva I just found this, so I'm commenting on the discussion above:
Graviton4 has 128-bit SVE registers:
user@ip-172-31-xx-xx:/tmp$ cat test.cpp
#include<iostream>
#include<arm_sve.h>
int main(){
std::cout << svcntb()*8 << std::endl;
}
user@ip-172-31-xx-xx:/tmp$ g++ -march=armv9-a+sve2 -otest test.cpp
user@ip-172-31-xx-xx:/tmp$ ./test
128
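The measured 128 bits matters for an 8 x float32 abstraction like simd8float32 (256 bits): a 128-bit register holds 4 float32 lanes, so each logical 8-lane value spans two registers, the same situation as 128-bit NEON, while a 512-bit register would leave half of its lanes unused. A small sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>

// Lanes of a given element width that fit in one vector register.
inline int lanes_per_register(int vector_bits, int element_bits) {
    assert(vector_bits > 0 && element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

// Hardware registers needed to hold one logical 8 x float32 (256-bit) value,
// rounded up when the register is narrower than 256 bits.
inline int registers_per_8xf32(int vector_bits) {
    assert(vector_bits > 0);
    return (8 * 32 + vector_bits - 1) / vector_bits;
}
```

Note that `svcntb()` in the test program returns bytes per vector, hence the `* 8` to obtain bits.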
Let's summarize the information around this:
I've tried to make simdlib support SVE, but as you know, that is an extremely hard job. For the time being, IMHO it's better to write SVE code without much abstraction. If the resulting package size bloat is acceptable, fixing the vector length is an alternative.
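Fixing the vector length can be sketched as compiling the same kernel once per supported width and dispatching on the width detected at run time (for example `svcntb() * 8`, as in the test program above). Every additional width adds another copy of the kernel, which is where the package size bloat comes from. `dot_fixed` and `select_dot` are hypothetical names and scalar loops stand in for SVE intrinsics; with GCC or Clang, real fixed-length SVE builds would use `-msve-vector-bits=N`.

```cpp
#include <cassert>
#include <cstddef>

// One instantiation per fixed vector width; the lane count is a compile-time
// constant, so each instantiation is a separate copy of the kernel.
template <int kBits>
float dot_fixed(const float* x, const float* y, std::size_t n) {
    constexpr int kLanes = kBits / 32;  // float32 lanes per register
    float acc[kLanes] = {};             // per-lane partial sums, zeroed
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes)
        for (int l = 0; l < kLanes; l++) acc[l] += x[i + l] * y[i + l];
    float s = 0.0f;
    for (int l = 0; l < kLanes; l++) s += acc[l];
    for (; i < n; i++) s += x[i] * y[i];  // scalar tail
    return s;
}

using DotFn = float (*)(const float*, const float*, std::size_t);

// Pick the instantiation matching the vector length detected at run time.
inline DotFn select_dot(int vector_bits) {
    switch (vector_bits) {
        case 128: return dot_fixed<128>;
        case 256: return dot_fixed<256>;
        case 512: return dot_fixed<512>;
        default:
            assert(false && "unsupported vector length");
            return nullptr;
    }
}
```

Restricting the set of widths (for example to 128 and 256 bits) limits the bloat at the cost of not fully using wider hardware.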
Summary:
As a demo for Mengdi.
The steps to fully migrate to simdlib are:
1. Change all function interfaces to use the generic simd8float32 and friends prototypes, and make sure it compiles on fbcode.
2. Make sure it also compiles on ARM.
3. See which functions can be migrated to use only the generic code path.
4. Benchmark whether the SIMD-emulated path is competitive with the scalar path (for platforms without specific SIMD support).
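The benchmarking step can start from a small timing harness. This sketch assumes that a plain 8-accumulator loop is a fair stand-in for the emulated SIMD path; `sum_scalar`, `sum_emulated8` and `time_ns` are hypothetical names. It times both kernels on the same input and returns their results so they can be compared; for general inputs the results may differ in the last bits because the summation order differs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Plain scalar reference kernel.
inline float sum_scalar(const std::vector<float>& x) {
    float s = 0.0f;
    for (float v : x) s += v;
    return s;
}

// "Emulated SIMD" kernel: 8 independent accumulators, like an 8-lane register.
inline float sum_emulated8(const std::vector<float>& x) {
    float acc[8] = {};
    std::size_t i = 0;
    for (; i + 8 <= x.size(); i += 8)
        for (int l = 0; l < 8; l++) acc[l] += x[i + l];
    float s = 0.0f;
    for (float a : acc) s += a;
    for (; i < x.size(); i++) s += x[i];  // scalar tail
    return s;
}

// Average nanoseconds per call of fn(x) over `reps` runs; the mean result is
// written to *out so the computed values are used and can be checked.
template <typename Fn>
double time_ns(Fn fn, const std::vector<float>& x, int reps, float* out) {
    assert(reps > 0);
    auto t0 = std::chrono::steady_clock::now();
    float r = 0.0f;
    for (int k = 0; k < reps; k++) r += fn(x);
    auto t1 = std::chrono::steady_clock::now();
    *out = r / reps;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / reps;
}
```

Real measurements would need representative ScalarQuantizer codecs, optimized builds, and repeated runs on each target platform.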
Differential Revision: D59395882