Change OpenMP scheduling from dynamic to static #264
Comments
@lgarrison I was thinking of implementing this change ASAP - what do you think?
Not sure... I'm worried that static scheduling will hurt the performance for more clustered cases. I'm also not sure "hiding" the problem with static scheduling is a good idea, even as a stop-gap measure, because we don't really understand all the ways this problem can manifest. I think keeping the problem highly visible and obviously wrong, as it is now, might be preferable.
Yeah, that's the choice - would it be better to be disastrously wrong or just slightly wrong? Disastrously wrong has a greater chance of people noticing, but for those who may not notice, the results will be wrong. I will close this then and solve/detect the race condition as we progress on #258.
Speaking of detecting the race condition, maybe it's worth trying the GCC 9.2 thread sanitizer on the Python extension? The memory/address sanitizers were always hard to run because of all the false positives from Python, but perhaps the thread sanitizer will behave better?
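A rough sketch of how that could look (the compiler choice, flags, and the module name are assumptions here, not commands from this thread):

```shell
# Rebuild the extension with ThreadSanitizer instrumentation
# (assumes GCC >= 9 is the active compiler).
export CC=gcc
export CFLAGS="-fsanitize=thread -fopenmp -g -O1"
export LDFLAGS="-fsanitize=thread"

# Reinstall in-place so the extension is rebuilt with the flags above,
# then import it -- TSan race reports, if any, go to stderr:
#   python -m pip install -e .
#   python -c "import your_extension"   # module name is a placeholder
echo "CFLAGS=$CFLAGS"
```

Suppression files (`TSAN_OPTIONS="suppressions=..."`) may still be needed to quiet interpreter-level false positives.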
Ohh good idea - I will try that later today.
Another idea, an even simpler version of this train of thought: can we print the thread IDs and check that they are unique?
I tried
Nothing stands out in the printed thread IDs. However, setting OMP_DISPLAY_ENV=verbose does show two OpenMP environments from Python but only one from C.
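For reference, `OMP_DISPLAY_ENV=verbose` makes every OpenMP runtime that initializes print its settings banner to stderr, so seeing the banner twice from Python is the tell-tale of two loaded runtimes. A sketch of the comparison (the import target is a placeholder, not from this thread):

```shell
# Each loaded OpenMP runtime prints one
# "OPENMP DISPLAY ENVIRONMENT BEGIN ... END" banner at startup.
export OMP_DISPLAY_ENV=verbose

# Count banners from the Python side vs. the C binary, e.g.:
#   python -c "import your_extension" 2>&1 | grep -c "OPENMP DISPLAY ENVIRONMENT BEGIN"
#   ./c_test 2>&1 | grep -c "OPENMP DISPLAY ENVIRONMENT BEGIN"
# Two banners from Python but one from C means two runtimes are loaded.
echo "OMP_DISPLAY_ENV=$OMP_DISPLAY_ENV"
```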
Another weird thing that I noted is that even though the
I have also discovered that there is a race condition with
JEEZ
Solved this one - the C tests do not
I did some further reading (can't find the reference - the name looked familiar, so someone in the open-source Python ecosystem) and it is possible that the OpenMP issues are occurring because OpenMP is linked in while the static library is generated.
Creating shared libraries (instead of static libraries) did not resolve the issue of multiple OpenMP runtime libraries (libgomp from compiling with GCC, and libiomp5 from MKL within numpy). However, setting `export MKL_THREADING_LAYER=GNU` resolved the problem entirely. Source: https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
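A minimal sketch of applying that workaround from within Python rather than the shell; the key point is that the variable must be set before numpy first loads MKL:

```python
import os

# Workaround per the threadpoolctl notes: force MKL onto the GNU OpenMP
# runtime (libgomp) so it does not load libiomp5 alongside the
# GCC-compiled extension. Must run before numpy/MKL is first imported.
os.environ["MKL_THREADING_LAYER"] = "GNU"

import numpy as np  # deliberately imported after setting the env var

# Any MKL-backed operation now runs on the GNU threading layer.
print(np.dot(np.ones(3), np.ones(3)))
```

Setting it in the shell (`export MKL_THREADING_LAYER=GNU`) is equivalent, and more robust since it cannot be bypassed by an earlier numpy import.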
Setting that environment variable with

```c
#if defined(_OPENMP)
#include <omp.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <inttypes.h>

int test_omp_duplicate_runtime_lib_DOUBLE(const int64_t N);
int test_omp_duplicate_runtime_lib_DOUBLE(const int64_t N)
{
    int64_t correct_sum = 0;
    for(int64_t i=0;i<N;i++) {
        correct_sum += i;
    }

    int64_t omp_sum = 0;
#pragma omp parallel for schedule(dynamic) reduction(+:omp_sum)
    for(int64_t i=0;i<N;i++) {
        omp_sum += i;
    }

    if(omp_sum != correct_sum) {
        fprintf(stderr,"Basic OMP reduction is causing errors. Expected sum = %"PRId64", sum with OpenMP + reduction = %"PRId64"\n",
                correct_sum, omp_sum);
        return EXIT_FAILURE;
    }
    fprintf(stderr,"Correct result returned\n");
    return EXIT_SUCCESS;
}
#endif
```

For the case with duplicate OMP libraries, this test code errors with

Separately, when setting
General information
python -m pip install -e .
Issue description
OpenMP built with gcc/9.2.0 produces incorrect results when invoked from Python. This was originally reported in #197, and then rediscovered in #258.
Expected behaviour
The results with OpenMP should be correct when invoked through Python.
Actual behaviour
The pair counts are nthreads times larger when using the Python wrappers. The results are correct when directly using the C static library.
What have you tried so far?
Compiling without OpenMP makes this problem go away.
Changing the OpenMP scheduling to static makes the problem less severe - i.e., the pair counts are off by a little rather than nthreads times. As an added benefit, the code also appears to run marginally faster.
Minimal failing example
After installing with the changes in #258, the following fails: