nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures #289

Open
casparvl opened this issue Jul 29, 2024 · 1 comment


casparvl commented Jul 29, 2024

I'm building the CUDA samples for multiple architectures, which the documentation says is supported through the SMS option. My build command is:

make  -j 72 HOST_COMPILER=g++ SMS='80 86' 

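I've encountered the issue with both CUDA-Samples 11.3 and 12.2, in at least two samples: memMapIPCDrv and ptxjit. In both, the failing command is the PTX-generation rule in the sample's Makefile, which passes the full $(GENCODE_FLAGS) (several -gencode entries when SMS lists more than one architecture) together with -ptx. The rule reads (with some context):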

$(PTX_FILE): memMapIpc_kernel.cu
	$(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
	$(EXEC) mkdir -p data
	$(EXEC) cp -f $@ ./data
	$(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
	$(EXEC) cp -f $@ ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
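To illustrate why nvcc rejects this rule, here is a minimal bash sketch that approximates how GENCODE_FLAGS gets populated from SMS. It assumes the Makefiles add one SASS target (code=sm_XX) per listed SM on top of the PTX target for the highest SM from the HIGHEST_SM block quoted below; that per-SM loop is an assumption not shown in this issue, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: approximate how the sample Makefiles fill GENCODE_FLAGS from SMS.
# ASSUMPTION: one SASS target (code=sm_XX) per listed SM, plus one PTX target
# (code=compute_XX) for the highest SM, as in the HIGHEST_SM block in this issue.

gencode_flags() {
  local sms="$1" sm highest flags=""
  for sm in $sms; do
    flags+=" -gencode arch=compute_${sm},code=sm_${sm}"
  done
  # Make's $(sort) sorts lexically and drops duplicates; LC_ALL=C sort -u mimics it.
  highest=$(printf '%s\n' $sms | LC_ALL=C sort -u | tail -n 1)
  if [[ -n "$highest" ]]; then
    flags+=" -gencode arch=compute_${highest},code=compute_${highest}"
  fi
  printf '%s\n' "${flags# }"
}

# Count how many -gencode entries a flag string contains.
count_gencode() {
  local word n=0
  for word in $1; do
    if [[ "$word" == "-gencode" ]]; then
      n=$((n + 1))
    fi
  done
  printf '%s\n' "$n"
}

flags=$(gencode_flags "80 86")
printf 'GENCODE_FLAGS = %s\n' "$flags"
printf 'number of -gencode entries: %s\n' "$(count_gencode "$flags")"
```

For SMS='80 86' this sketch yields three -gencode entries (SASS for 80 and 86, PTX for 86), so the -ptx invocation asks nvcc for several GPU architectures at once, which is what the fatal error complains about.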

I believe the GENCODE_FLAGS used for PTX generation should be stored separately, i.e., this block should probably read (the trailing endif closes an enclosing conditional that is not shown):

# Generate PTX code from the highest SM architecture in $(SMS) to guarantee forward-compatibility
HIGHEST_SM := $(lastword $(sort $(SMS)))
ifneq ($(HIGHEST_SM),)
GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
GENCODE_FLAGS_HIGHEST_SM = -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
endif
endif

The offending rule would then be modified to:

$(PTX_FILE): memMapIpc_kernel.cu
	$(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_HIGHEST_SM) -o $@ -ptx $<
	$(EXEC) mkdir -p data
	$(EXEC) cp -f $@ ./data
	$(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
	$(EXEC) cp -f $@ ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
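A minimal sketch of what the modified rule would hand to nvcc, printed rather than executed. $(INCLUDES) and $(ALL_CCFLAGS) are left out, and the file names are borrowed from the sample and its run log purely for illustration; both are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: dry-run the PTX rule using GENCODE_FLAGS_HIGHEST_SM.
# ASSUMPTION: INCLUDES and ALL_CCFLAGS are omitted; file names are illustrative.

sms="80 86"
# Mirror $(lastword $(sort $(SMS))): lexical sort, de-duplicated, last word.
highest=$(printf '%s\n' $sms | LC_ALL=C sort -u | tail -n 1)
ptx_flags="-gencode arch=compute_${highest},code=compute_${highest}"

# Print the command instead of running it; only one -gencode entry remains.
cmd="nvcc $ptx_flags -o memMapIpc_kernel64.ptx -ptx memMapIpc_kernel.cu"
printf '%s\n' "$cmd"
```

With a single virtual architecture requested, the -ptx invocation no longer conflicts with multiple GPU architectures.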

I can at least confirm that with this diff:

$ cat *.patch
diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile        2024-07-29 12:14:28.538848000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile     2024-07-29 12:17:02.812364739 +0200
@@ -312,6 +312,7 @@
 HIGHEST_SM := $(lastword $(sort $(SMS)))
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
+GENCODE_FLAGS_HIGHEST_SM = -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
 endif

@@ -394,7 +395,7 @@
 endif

 $(PTX_FILE): memMapIpc_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_HIGHEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile      2024-07-29 12:14:28.546771000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile   2024-07-29 12:15:47.089354181 +0200
@@ -306,6 +306,7 @@
 HIGHEST_SM := $(lastword $(sort $(SMS)))
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
+GENCODE_FLAGS_HIGHEST_SM = -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
 endif

@@ -390,7 +391,7 @@
 endif

 $(PTX_FILE): ptxjit_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_HIGHEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)

Applied on top of the CUDA-Samples 12.2 sources, this builds correctly for multiple architectures. However, I'm not 100% sure whether this approach makes sense, so I'm hoping someone else can confirm :)

casparvl (Author) commented

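To answer my own question: this is not really the solution. The resulting PTX runs on an H100 (compute capability 9.0), so forward compatibility works: the driver can JIT-compile PTX built for the highest SM in SMS on any GPU at least that new. But it fails on an A100 (compute capability 8.0), which was one of my actual targets in SMS, because PTX built for a higher architecture cannot be JIT-compiled for a lower one: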

$ memMapIPCDrv
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
Process 0 failed!

The ptxjit sample fails the same way. Both samples seem to always JIT-compile the kernel from the PTX file at run time.

I'm really not sure what these samples are supposed to do when CUDA-Samples is built for multiple SMs, since they always invoke the JIT compiler on PTX code. The only way I see to get a working example is to generate the PTX for the lowest SM in SMS instead of the highest. That way, the driver can at least JIT-compile it for every SM architecture passed to SMS, including the lowest one.
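The reasoning above rests on the PTX compatibility rule: PTX generated for compute_X can be JIT-compiled for a GPU of compute capability Y only when Y >= X. Below is a minimal bash sketch of that rule applied to SMS='80 90'. The helper names are hypothetical, and architecture-specific targets such as compute_90a, which are not forward compatible, are deliberately ignored.

```bash
#!/usr/bin/env bash
# Sketch of the PTX JIT compatibility rule: PTX for compute_X can be JIT-compiled
# for a GPU of compute capability Y only if Y >= X.
# ASSUMPTION: plain compute_XX targets only (no arch-specific "a" variants).

# Succeeds (exit status 0) if PTX for ptx_sm can be JIT-compiled on device_sm.
ptx_can_jit() {
  local ptx_sm="$1" device_sm="$2"
  (( device_sm >= ptx_sm ))
}

# For one PTX target, report whether each SM in the list can JIT-compile it.
check_ptx_target() {
  local ptx_sm="$1" sms="$2" sm
  for sm in $sms; do
    if ptx_can_jit "$ptx_sm" "$sm"; then
      printf 'compute_%s PTX on sm_%s: ok\n' "$ptx_sm" "$sm"
    else
      printf 'compute_%s PTX on sm_%s: JIT fails\n' "$ptx_sm" "$sm"
    fi
  done
}

sms="80 90"
check_ptx_target 90 "$sms"   # PTX for the highest SM: fails on sm_80 (A100)
check_ptx_target 80 "$sms"   # PTX for the lowest SM: JIT works on both
```

The comparison here is numeric, so it also handles three-digit SM values correctly.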

The patch would then be:

diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile        2024-07-29 12:14:28.538848000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile     2024-07-29 13:02:45.134261829 +0200
@@ -313,6 +313,12 @@
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
+
+# Generate the explicit PTX file for the lowest SM architecture in $(SMS), so it works on all SMS listed there
+LOWEST_SM := $(firstword $(sort $(SMS)))
+ifneq ($(LOWEST_SM),)
+GENCODE_FLAGS_LOWEST_SM += -gencode arch=compute_$(LOWEST_SM),code=compute_$(LOWEST_SM)
+endif
 endif

 ifeq ($(TARGET_OS),darwin)
@@ -394,7 +400,7 @@
 endif

 $(PTX_FILE): memMapIpc_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_LOWEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile      2024-07-29 12:14:28.546771000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile   2024-07-29 13:02:38.741961008 +0200
@@ -307,6 +307,12 @@
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
+
+# Generate the explicit PTX file for the lowest SM architecture in $(SMS), so it works on all SMS listed there
+LOWEST_SM := $(firstword $(sort $(SMS)))
+ifneq ($(LOWEST_SM),)
+GENCODE_FLAGS_LOWEST_SM += -gencode arch=compute_$(LOWEST_SM),code=compute_$(LOWEST_SM)
+endif
 endif

 ifeq ($(TARGET_OS),darwin)
@@ -390,7 +396,7 @@
 endif

 $(PTX_FILE): ptxjit_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_LOWEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)

If I build this for SMS='80 90', it works on both A100 and H100.
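One way to double-check such a build is to inspect the .target directive that nvcc writes into the generated PTX file (for example `.target sm_80`). Below is a minimal bash sketch, assuming the PTX file name from the run log above (./memMapIpc_kernel64.ptx). The path and the helper names are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: check which SM a generated PTX file targets, via its ".target sm_XX" line.
# ASSUMPTION: file name taken from the run log; helper names are hypothetical.

# Print the SM number from the first ".target sm_XX" directive in a PTX file.
ptx_target_sm() {
  sed -n 's/^[[:space:]]*\.target[[:space:]]\{1,\}sm_\([0-9]\{1,\}\).*/\1/p' "$1" | head -n 1
}

# Print the lowest SM in a list, mirroring $(firstword $(sort $(SMS))) (lexical sort).
lowest_sm() {
  printf '%s\n' $1 | LC_ALL=C sort -u | head -n 1
}

sms="80 90"
ptx_file="./memMapIpc_kernel64.ptx"
if [[ -f "$ptx_file" ]]; then
  target=$(ptx_target_sm "$ptx_file")
  expected=$(lowest_sm "$sms")
  if [[ "$target" == "$expected" ]]; then
    echo "PTX targets sm_${target}, the lowest SM in SMS"
  else
    echo "PTX targets sm_${target}, expected sm_${expected}"
  fi
fi
```

Because lowest_sm copies Make's lexical $(sort), it reproduces the Makefile's choice exactly, including its behavior if SMS ever mixes two- and three-digit values.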
