hfi_userinit: mmap of status page (dabbad0008030000) failed: Operation not permitted #29
Comments
Related commit in IB/hfi driver: |
Note that one can also use the execstack utility to query the executable stack flag of a binary:
|
This does, indeed, mitigate the issue in some cases. However, in one particular case I've encountered:
Recompiling a simple MPI program with
The segfault did not vary w.r.t. the version of Open MPI under the program: the segfault always occurred at the increment of This made me recall another issue we'd encountered: Gaussian Inc's use of
This combination of Portland compiler, Open MPI, and PSM2 does NOT fail to map the HFI capabilities AND does not segfault. This naturally calls into question what level of PGI processor optimization is 100% reliable on a Broadwell system. |
This is hardly a solution. Any program that passes around nested functions needs an executable stack. That is standard Fortran and a GNU extension in C, and it is used in some very handy techniques. I basically cannot run my program on a cluster that uses PSM2 now. Note that it happens for GCC-compiled programs as well. |
This restriction on EXEC has been removed in the upstream kernel by the following commit. I'm not sure when any specific distros will be pulling it back but it may be worth asking your specific distro to do so. I suggest we close this issue as it was not a PSM2 library restriction. Just me being overly restrictive with security in the kernel. Or this can remain open until all the distros have had a chance to pull the patch. commit 7709b0dc265f28695487712c45f02bbd1f98415d
|
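For anyone who wants to check whether a running process is in the state that trips this restriction: as far as I understand it, on kernels of this vintage an executable-stack binary is started with the READ_IMPLIES_EXEC personality flag, and that flag is what turns a read-only mmap request into a read+execute one. Below is a minimal Python sketch that queries the flag for the current process. It assumes Linux with glibc; the helper names (has_read_implies_exec, current_persona) are made up for illustration, and the constant is taken from linux/personality.h.

```python
import ctypes

# Value from <linux/personality.h>.
READ_IMPLIES_EXEC = 0x0400000
# personality(0xffffffff) returns the current persona without changing it.
QUERY_ONLY = 0xFFFFFFFF


def has_read_implies_exec(persona):
    """Return True if the READ_IMPLIES_EXEC bit is set in a persona value."""
    return bool(persona & READ_IMPLIES_EXEC)


def current_persona():
    """Query the calling process's persona via glibc (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    persona = libc.personality(QUERY_ONLY)
    if persona == -1:
        raise OSError(ctypes.get_errno(), "personality() failed")
    return persona


if __name__ == "__main__":
    persona = current_persona()
    print("persona: 0x%08x" % persona)
    print("READ_IMPLIES_EXEC set:", has_read_implies_exec(persona))
```

A normally built Python interpreter should report the flag as unset. The persona is inherited across exec, so the script only tells you about its own process and whatever launched it, not about an arbitrary MPI binary.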
@weiny2 Is it a commit to the Linux kernel? I may try to persuade the admin to apply it. |
This is a commit to the HFI1 driver. Our driver is upstream, so yes, that is the commit information for the Linux kernel. I only mention this to make sure you are not running an out-of-tree driver; if you are, you need to apply the patch to that driver. |
This is an informational post for other PSM2 users. In Red Hat Enterprise Linux 7.5, we found that some of our MPI executables exit with the following error:
hfi_userinit: mmap of status page (dabbad0008030000) failed: Operation not permitted
This error is thrown from this line of code in the PSM2 library:
https://github.com/intel/opa-psm2/blob/0f9213e7af8d32c291d4657ff4a3279918de1e60/opa/opa_proto.c#L482-L484
We tracked this down to the execute bit being set in the GNU_STACK entry of the binary's ELF program headers. With that bit set, the memory region ends up being mapped with both the read and execute bits enabled, rather than with just the read bit that PSM2 requests. As described in this post:
https://stackoverflow.com/questions/32730643/why-in-mmap-prot-read-equals-prot-exec
"For what I understand, GNU_STACK program header is designed to tell the kernel that you want some specific properties for the stack, one those properties is a non-executable stack. It appears that if you don't explicitly ask for a non-executable stack, all the ELF sections marked as readable will be executable too. And also all the memory mapping with mmap while have the same behavior."
One can inspect a binary for this setting using readelf:
readelf --program-headers a.out
We could reproduce this by running a simple MPI program that was compiled with PGI.
For example, a binary built with PGI shows:
readelf --program-headers mpiBench_pgi
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RWE 10
Whereas a binary built with GNU:
readelf --program-headers mpiBench_gnu
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
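If readelf is not available on a node, the same check can be sketched in a few lines of Python. This is only an illustration and not part of PSM2: it assumes a little-endian ELF64 binary (the usual case on x86-64), and the function names gnu_stack_flags and flags_to_str are made up for this example.

```python
import struct
import sys

PT_GNU_STACK = 0x6474E551          # program header type for the stack flags
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4   # execute, write, read permission bits


def gnu_stack_flags(data):
    """Return p_flags of the PT_GNU_STACK header, or None if there is none."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # ELF64 header: e_phoff at offset 32, e_phentsize and e_phnum at 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        # ELF64 program header starts with p_type (4 bytes), p_flags (4 bytes).
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            return p_flags
    return None


def flags_to_str(flags):
    """Render flags the way readelf does, e.g. 'RWE' or 'RW'."""
    return "".join(ch for bit, ch in ((PF_R, "R"), (PF_W, "W"), (PF_X, "E")) if flags & bit)


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        flags = gnu_stack_flags(f.read())
    if flags is None:
        print("no GNU_STACK program header found")
    else:
        print("GNU_STACK flags:", flags_to_str(flags))
```

Run against the two binaries above, this would be expected to print RWE for the PGI build and RW for the GNU build, matching the readelf output.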
We found that a workaround is to add "-Wl,-z,noexecstack" during the link step. Alternatively, one can clear this bit in an existing executable with execstack:
execstack -c a.out
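For those without execstack installed, here is a rough Python sketch of what clearing the flag amounts to at the file level: it drops the execute bit from the GNU_STACK program header and writes a new file. This is an assumption-laden illustration rather than a replacement for execstack: it only handles little-endian ELF64, the function name clear_exec_stack is made up, and it leaves the binary unchanged if there is no GNU_STACK header.

```python
import struct
import sys

PT_GNU_STACK = 0x6474E551  # program header type for the stack flags
PF_X = 0x1                 # execute permission bit


def clear_exec_stack(data):
    """Return a copy of an ELF64 image with PF_X cleared in PT_GNU_STACK."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    out = bytearray(data)
    # ELF64 header: e_phoff at offset 32, e_phentsize and e_phnum at 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", out, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", out, 54)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, p_flags = struct.unpack_from("<II", out, off)
        if p_type == PT_GNU_STACK:
            # p_flags is the second 4-byte field of an ELF64 program header.
            struct.pack_into("<I", out, off + 4, p_flags & ~PF_X)
    return bytes(out)


if __name__ == "__main__":
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, "rb") as f:
        patched = clear_exec_stack(f.read())
    with open(dst, "wb") as f:
        f.write(patched)
```

Writing to a separate output file keeps the original binary intact. The copy will need its execute permission restored (chmod +x), and a program that really does rely on an executable stack, such as one using nested-function trampolines, may crash once the flag is cleared.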