running Corrfunc with nthreads>1 on cluster and some strange results #197
Thanks for the report! That's pretty strange, I certainly haven't seen Corrfunc do that before. Does the problem only occur for DDsmu_mocks or other estimators as well? Could you export |
Thanks @zxzhai for the report. Could you please try what @lgarrison suggested above? It seems that OpenMP might need to be explicitly enabled at runtime. When you installed Corrfunc with pip on that cluster, were all the required modules loaded explicitly? Otherwise, the Corrfunc install might have proceeded with the compiler supplied with the OS, which might not come with OpenMP support. |
Hi @lgarrison and @manodeep, thanks for the suggestions! I did two tests, and this is what I got: I tested DDrppi_mocks from Corrfunc.mocks.DDrppi_mocks, and the same problem occurred on that particular cluster. When I did export OMP_DISPLAY_ENV=TRUE and reran the code, it gave me the following information: OPENMP DISPLAY ENVIRONMENT BEGIN OPENMP DISPLAY ENVIRONMENT BEGIN There seems to be some inconsistency: the two _OPENMP values are different (201511 vs 201611). I did the same thing on the other computer that has no problem, and there the two _OPENMP values are the same (both are 201511). So I suspect that this might be the reason. I will check to see if I can fix it. |
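For anyone repeating this check, here is a minimal sketch that pulls the `_OPENMP` values out of captured `OMP_DISPLAY_ENV` output and flags a mismatch. The helper name `openmp_versions` and the sample text are made up for illustration (the sample is not the output from this report), and the exact layout of the `_OPENMP` line differs between runtimes, so the pattern is deliberately loose.

```python
import re

# Loose pattern: accepts "_OPENMP = '201511'" as well as "_OPENMP='201611'".
# Quotes and spacing are treated as optional because runtimes format this differently.
_OPENMP_LINE = re.compile(r"_OPENMP\s*=\s*'?(\d{6})'?")


def openmp_versions(display_env_output):
    """Return the sorted, distinct _OPENMP version macros found in the text."""
    return sorted(set(_OPENMP_LINE.findall(display_env_output)))


# Illustrative text only (hypothetical, not the reporter's actual output).
sample = """
OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP = '201511'
  OMP_NUM_THREADS = '4'
OPENMP DISPLAY ENVIRONMENT END
OPENMP DISPLAY ENVIRONMENT BEGIN
   _OPENMP='201611'
OPENMP DISPLAY ENVIRONMENT END
"""

versions = openmp_versions(sample)
if len(versions) > 1:
    print("More than one OpenMP runtime reported:", versions)
```

Seeing two values only shows that two runtimes were initialized in the process; it does not by itself show that they interfere with each other.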
I agree it looks like two different OpenMP libraries are getting loaded at runtime, possibly GNU and Intel ( Static linking of OpenMP by other extensions or executables is another way multiple runtimes could get initialized. But I'm not sure if that would cause duplicate pair counts... although I don't understand how multiple dynamic runtimes would cause that either! I think I'd recommend what @manodeep suggested: try uninstalling Corrfunc, making sure all your compiler modules are loaded, and then reinstall. If pip doesn't work, try building from source where you can specify the correct compiler manually. And maybe try both inside and outside of Anaconda Python (if you're using that). |
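One way to see which OpenMP shared libraries are actually mapped into a Python process is to read `/proc/self/maps` after importing the packages of interest. This is a rough, Linux-only sketch: the function names are hypothetical, the list of library names (libgomp, libiomp5, libomp) is an assumption about what is worth looking for, and a statically linked runtime will not show up this way.

```python
import os
import re

# Assumed library names: GNU (libgomp), Intel (libiomp5), LLVM (libomp).
_OMP_LIB = re.compile(r"(libgomp|libiomp5|libomp)[^/\s]*\.so[^/\s]*")


def openmp_libraries(maps_text):
    """Return the OpenMP shared-library paths named in a /proc/<pid>/maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname may be absent).
        fields = line.split(None, 5)
        if len(fields) == 6 and _OMP_LIB.search(os.path.basename(fields[5])):
            paths.add(fields[5].strip())
    return paths


def loaded_openmp_libraries():
    """Inspect the current process; returns an empty set where /proc is unavailable."""
    try:
        with open("/proc/self/maps") as f:
            return openmp_libraries(f.read())
    except OSError:
        return set()


# Import the extensions you want to inspect first, then call:
print(loaded_openmp_libraries())
```

If both a GNU and an Intel library appear in the result, that would be consistent with two dynamic runtimes being loaded, as suspected above.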
Thanks for the suggestions. I think I've solved this problem, but I don't completely understand why. What I learned is: install Corrfunc from source rather than with pip. My .bashrc file pointed to another gcc library for a different code (not Corrfunc), so I had to switch off all of that setup, which means that the OpenMP library is just the system default. After that I reinstalled Corrfunc from source, and the problem seems to be solved. One place to check is "CC :=" in the common.mk file, in case other people meet the same problem in the future. What I don't understand is this: now that the problem is solved (the output doesn't depend on nthreads and the speed scales fine), when I do export OMP_DISPLAY_ENV=TRUE, the output for the two sections is still inconsistent. So it looks like different versions of OpenMP, or different versions of the same library, don't affect the result (at least in this scenario). The previous error was caused by something else, still unknown; it may depend on how Python uses OpenMP and at which step the library is called. |
Thanks for reporting back! This is all really good to know. It's something of a relief that the multiple OpenMP versions aren't clashing, because I don't know how they would have been executing the same parallel region. I think the "inconsistent" OpenMP libraries could easily be coming from different Python packages that were compiled with different I'm happy to help if you'd like to dig into the other compiler to try to figure out how it caused this behavior, but otherwise feel free to close the issue. |
Hi, |
@zxzhai There is no real difference between @samotracio Your investigation seems quite relevant. The Corrfunc scheduling is always specified as |
I'm also having, perhaps related, perhaps totally different, problems with parallelization. In my case, setting I've added logging to make sure that
My system:
I'll keep looking, but any suggestions would be appreciated! |
These problems smack of core affinity issues; i.e. the process affinity mask is set to only execute on one core. This can arise when using Even if not, a "rogue" Python package could be setting the affinity. NumPy does this, but only when using Regardless, I would try tracking the core affinity, starting at the C level inside Corrfunc to check if the affinity is actually restricted. Here is a sample program I have used in the past for this purpose:
// _GNU_SOURCE exposes sched_getaffinity() and the CPU_* macros in glibc.
// Build with OpenMP enabled, e.g. gcc -fopenmp.
#define _GNU_SOURCE
#include <omp.h>
#include <stdio.h>
#include <sched.h>
#include <stdlib.h>
int main(void){
    // First report the CPU affinity bitmask.
    cpu_set_t mask;
    size_t cpusetsize = sizeof(cpu_set_t);
    // Check the return value directly: an assert() would be compiled out with -DNDEBUG.
    if (sched_getaffinity(0, cpusetsize, &mask) != 0) {
        perror("sched_getaffinity");
        return EXIT_FAILURE;
    }
    int naff = CPU_COUNT_S(cpusetsize, &mask);
    printf("Core affinities (%d total): ", naff);
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &mask))
            printf("%d ", i);
    }
    printf("\n");
    // Then report what the OpenMP runtime thinks is available.
    int maxthreads = omp_get_max_threads();
    int nprocs = omp_get_num_procs();
    printf("omp_get_max_threads(): %d\n", maxthreads);
    printf("omp_get_num_procs(): %d\n", nprocs);
    return 0;
}
If the affinity is actually restricted, then try going one level higher, into Python:
import psutil
print('Python affinity #:', len(psutil.Process().cpu_affinity()))
Place that |
I forgot to mention that to check the affinity of the shell, in Bash one can use: taskset -c -p $$ where |
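The affinity can also be read from Python with the standard library alone, which avoids installing psutil. This is a small sketch, assuming Linux (where `os.sched_getaffinity` is available); `affinity_report` is just a name picked for this example.

```python
import os


def affinity_report():
    """Compare the cores this process may run on with the cores the machine has."""
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Argument 0 means "the calling process".
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # The affinity query is not available on this platform.
        allowed = None
    return {"total_cores": total, "allowed_cores": allowed}


report = affinity_report()
print(report)
allowed = report["allowed_cores"]
if allowed is not None and len(allowed) < (report["total_cores"] or 0):
    print("Affinity mask is narrower than the machine; threads will share these cores.")
```

Calling it once at the very start of the script and again right before the Corrfunc call should help show whether some import in between narrows the mask.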
Thanks a lot for the suggestions, I'll give them a try now. I am not running on a cluster, just my local desktop. |
Another possibility totally unrelated to OpenMP: Corrfunc threads over cell pairs, so if the problem is extremely clustered such that a single cell pair dominates the runtime (e.g. the autocorrelation of a single massive cell), then you will see a burst of multi-threaded activity at the beginning followed by a long period of a single thread running. You can alleviate this somewhat by specifying a larger |
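To illustrate why one dominant cell pair caps the speedup, here is a toy model. It assumes idle threads greedily pick up the next task in order, which is a simplification and not a description of Corrfunc's actual scheduler; the task costs are invented numbers.

```python
import heapq


def makespan(task_costs, nthreads):
    """Wall time when each task goes to whichever thread becomes free first."""
    finish_times = [0.0] * nthreads  # min-heap of per-thread finish times
    for cost in task_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


def speedup(task_costs, nthreads):
    """Serial time divided by the modeled parallel wall time."""
    return sum(task_costs) / makespan(task_costs, nthreads)


# Hypothetical costs: 1000 cheap cell pairs, with and without one dominant pair.
uniform = [1.0] * 1000
clustered = [1.0] * 1000 + [1000.0]

for name, costs in (("uniform", uniform), ("clustered", clustered)):
    print(name, [round(speedup(costs, n), 2) for n in (1, 2, 4, 8)])
```

In this model the speedup can never exceed the total cost divided by the cost of the largest single task, no matter how many threads are used, which matches the "burst of activity, then one thread" pattern described above.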
I just encountered this issue while running on an interactive node via the Slurm queue. The solution was that I had to specify
[~ @john1] taskset -c -p $$
pid 116541's current affinity list: 16,18,20,30
Before I added the Not the solution required, but might solve one class of OpenMP issues. |
General information
Hi, I installed Corrfunc on a cluster and ran some simple tests with DDsmu_mocks. When I specified nthreads>1, the resulting pair counts were nthreads times the single-thread result, and the runtime was also nthreads times longer. This is very strange; it seems that each thread is running consecutively and processing the full set of points itself instead of splitting the work between threads.
I also tested the same code on my laptop and on another cluster, and there was no problem: the results for different nthreads are the same, and the runtime is (roughly) nthreads times faster. This implies that the problem only exists on this particular cluster, but I don't understand what about the configuration of this cluster could affect the code.
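The symptom can be turned into a quick sanity check: run the same pair count with several thread counts and compare the totals. The sketch below is generic and uses stand-in counters (both hypothetical) instead of a real Corrfunc call, so it runs without Corrfunc installed; `check_thread_invariance` is a name invented for this example.

```python
def check_thread_invariance(count_pairs, thread_counts=(1, 2, 4)):
    """Return {nthreads: total pairs / total pairs of the first thread count}."""
    totals = {n: sum(count_pairs(n)) for n in thread_counts}
    reference = totals[thread_counts[0]]
    if reference == 0:
        raise ValueError("reference run returned zero pairs; nothing to compare")
    return {n: total / reference for n, total in totals.items()}


# Stand-ins for a real pair counter (hypothetical values).
def healthy_counter(nthreads):
    return [10, 42, 97]  # same counts regardless of nthreads


def broken_counter(nthreads):
    return [nthreads * c for c in (10, 42, 97)]  # mimics the reported symptom


print(check_thread_invariance(healthy_counter))
print(check_thread_invariance(broken_counter))
```

In practice `count_pairs` would be a small wrapper that calls the estimator with the given nthreads and returns its per-bin pair counts; every ratio should be 1.0, and ratios equal to nthreads reproduce the behaviour described here.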
May I ask if any of the developers have similar experience, and any suggestions?
Thanks!
Issue description
Expected behavior
Actual behavior
What have you tried so far?
Minimal failing example