[ROCm] Changes to enable build for ROCm platform #401

pruthvistony · 2021-07-01T02:13:28Z

Build is currently enabled for only hip_basic(cuda_basic)
Sample build command for reference
cmake ../ -DCMAKE_C_FLAGS="-Werror -Wno-deprecated-declarations -D__HIP_PLATFORM_HCC__=1" -DCMAKE_CXX_FLAGS="-Werror -Wno-deprecated-declarations -D__HIP_PLATFORM_HCC__=1" -DTP_ENABLE_SHM=OFF -DTP_ENABLE_CMA=OFF -DTP_USE_ROCM=ON -DTP_ENABLE_HIP_XTH=OFF -DTP_ENABLE_HIP_IPC=OFF -DTP_ENABLE_HIP_GDR=OFF -DTP_ENABLE_IBV=OFF -DTP_BUILD_TESTING=ON

pruthvistony · 2021-07-01T02:16:15Z

PR 398 needs to be merged before these changes.
Changes will be updated after rebase.

pruthvistony · 2021-07-01T02:18:17Z

@lw @jeffdaily @jithunnair-amd
Please review these changes

jithunnair-amd · 2021-07-01T05:28:28Z

tools/amd_build/replace_cuda_with_hip_files.py

+
+args = parser.parse_args()
+
+dict_file_name = args.dump_dict_directory + "/hipify_output_dict_dump.txt"


I think it'd be better if this script doesn't assume the name of the hipified dict file, instead the filename itself should be passed in as an argument, not the directory name

The file name for the output hipified dict file is parameterized and is now internal handled inside hipify-torch module.
repo

jithunnair-amd · 2021-07-01T05:30:23Z

tools/amd_build/build_amd.py

+    required=True)
+
+parser.add_argument(
+    '--dump-dict-directory',


Do we foresee a use-case where this directory would contain multiple dict files at the same time? It doesn't look like it here, in which case, the dict file name should be passed in as an argument instead?

Dont foresee a use case like above. Currently recommended way to trigger hipify is

Call hipify for the whole project files.

Call get_hipified_list() API to get updated hipified file list, if required.

All the functionality of hipify is handled within hipify-torch repo

jithunnair-amd · 2021-07-01T05:44:17Z

cmake/Hip.cmake

+  find_package(hip REQUIRED)
+
+  set(TP_HIP_INCLUDE ${ROCM_PATH}/include ${TP_HIP_INCLUDE})
+  set(TP_HIP_INCLUDE ${hip_INCLUDE_DIRS} $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}> $<INSTALL_INTERFACE:include> ${TP_HIP_INCLUDE})


IIUC, the only reason for this entire file to exist is to be able to provide ${hip_INCLUDE_DIRS}? If so, why don't we just use ${HIP_PATH}/include? It'd reduce a lot of the seemingly-unrelated code here. @jeffdaily for comment

jithunnair-amd · 2021-07-01T05:59:54Z

cmake/Hipify.cmake

@@ -0,0 +1,62 @@
+# Copyright (c) Facebook, Inc. and its affiliates.


These functions seem generic enough to be valuable as a part of hipify-torch itself, since any CMake-based hipify flow would likely need these functions. Can we move them to a CMake file in hipify-torch and include that CMake file here?

yes it is moved to hipify-torch repo

jithunnair-amd · 2021-07-01T06:07:31Z

tensorpipe/CMakeLists.txt

-    channel/cuda_xth/factory.cc)
-  list(APPEND TP_CUDA_PUBLIC_HDRS
-    channel/cuda_xth/factory.h)
+  tp_conditional_backend(


Nit: Can we use uppercase for this macro everywhere?

Updated to uppercase.
@lw
Please let me know if it is breaking any convention followed in tensorpipe. I checked the pyTorch code for any hint, but there is no convention followed there.

mrshenli

This looks good to me, but @beauby is much more familiar with the TensorPipe code base and might offer better reviews.

One general question, how do we test this works for ROCm devices?

mrshenli · 2021-07-21T00:13:44Z

tensorpipe/CMakeLists.txt

+    list(APPEND TP_CUDA_INCLUDE_DIRS ${CUDA_INCLUDE_DIRS})
+  elseif (TP_USE_ROCM)
+    set(TP_GPU_LIB_NAME "tensorpipe_hip")
+    # Finding of HIP package is already before hipifying the files


Curious, any reason not looking for packages here as existing code did for CUDA?

Hip packages were already found at - https://github.com/pytorch/tensorpipe/pull/401/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR26

mrshenli · 2021-07-21T00:23:43Z

tensorpipe/CMakeLists.txt

+    list(APPEND TP_CUDA_LINK_LIBRARIES ${TP_HIP_HCC_LIBRARIES})
+    list(APPEND TP_CUDA_INCLUDE_DIRS ${TP_HIP_INCLUDE})


Regarding the naming, any reason they don't follow the CUDA ones, i.e., HIP_LIBRARIES and HIP_INCLUDE_DIRS?

Let me check with the HIP team, if there is any particular reason for keeping this name, i.e., hip_INCLUDE_DIRS or different with CUDA, and get back if any reason.

mrshenli · 2021-07-21T00:25:23Z

tensorpipe/test/CMakeLists.txt

-  list(APPEND TP_TEST_SRCS
-    channel/cuda_gdr/cuda_gdr_test.cc
-    )
+  if((TP_ENABLE_CUDA_GDR AND TP_USE_CUDA) OR (TP_ENABLE_HIP_GDR AND TP_USE_ROCM))


@beauby do you know why we don't need this if-clause in the existing code?

At the moment we manually exclude Gdr tests from our CI, but it would not hurt to avoid building the tests if support for CudaGdr is not built.

pruthvistony · 2021-07-21T16:56:33Z

This looks good to me, but @beauby is much more familiar with the TensorPipe code base and might offer better reviews.

One general question, how do we test this works for ROCm devices?

As soon as we are able to get the tensorpipe building and running for ROCm on AMD GPU, we will set up a CI job to do this. PyTorch CI for ROCm runs on a good pool of CI machines. Currently checking if it can be used for Tensorpipe.

beauby · 2021-07-26T12:14:23Z

CMakeLists.txt

@@ -10,7 +10,7 @@ project(tensorpipe LANGUAGES C CXX)

 set(CMAKE_CXX_STANDARD 14)

-list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")
+list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake" "${PROJECT_SOURCE_DIR}/third_party/hipify/cmake")


Same remark as https://github.com/pytorch/tensorpipe/pull/398/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR24.

@beauby Added a comment. Should we see if we can guard it with TP_USE_ROCM as well?

beauby · 2021-07-26T12:16:42Z

cmake/Options.cmake

+
+# if both TP_USE_CUDA and TP_USE_ROCM is set then break
+if(TP_USE_CUDA AND TP_USE_ROCM)
+  message(FATAL_ERROR "Tensorpipe can be built either for CUDA or ROCm, TP_USE_CUDA and TP_USE_ROCM both are set, erroring out!!!!")


Suggested modification:

"TensorPipe does not support building for CUDA and ROCM at the same time. Please unset either TP_USE_CUDA or TP_USE_ROCM."

beauby · 2021-07-26T12:17:30Z

tensorpipe/CMakeLists.txt

@@ -79,7 +79,7 @@ list(APPEND TP_PUBLIC_HDRS

 ### cma

-tp_conditional_backend(
+TP_CONDITIONAL_BACKEND(


So far we've been using this macro lower case, let's keep it consistent (i.e. tp_conditional_backend()).

@beauby Sorry, that was I who suggested changing it to uppercase, to keep it consistent with the case in Options.cmake: https://github.com/pytorch/tensorpipe/blob/master/cmake/Options.cmake#L13
Would you still like us to change it back to lowercase?

beauby · 2021-07-26T12:17:38Z

tensorpipe/CMakeLists.txt

@@ -124,7 +125,7 @@ list(APPEND TP_LINK_LIBRARIES uv::uv)

 ### shm

-tp_conditional_backend(
+TP_CONDITIONAL_BACKEND(


Same as above.

beauby · 2021-07-26T12:17:44Z

tensorpipe/CMakeLists.txt

@@ -143,7 +144,7 @@ endif()

 ### ibv

-tp_conditional_backend(
+TP_CONDITIONAL_BACKEND(


Same as above.

beauby · 2021-07-26T12:18:26Z

tensorpipe/CMakeLists.txt

-    channel/cuda_xth/factory.cc)
-  list(APPEND TP_CUDA_PUBLIC_HDRS
-    channel/cuda_xth/factory.h)
+  TP_CONDITIONAL_BACKEND(


Same remark regarding case.

beauby · 2021-07-26T12:18:40Z

tensorpipe/CMakeLists.txt

@@ -265,9 +278,11 @@ if(TP_USE_CUDA)

  ### cuda_ipc

-  tp_conditional_backend(
+  TP_CONDITIONAL_BACKEND(


beauby · 2021-07-26T12:18:46Z

tensorpipe/CMakeLists.txt

    TP_ENABLE_CUDA_IPC "Enable CUDA inter-process communication channel" "TP_USE_CUDA")
-  if(TP_ENABLE_CUDA_IPC)
+  TP_CONDITIONAL_BACKEND(


beauby · 2021-07-26T12:19:04Z

tensorpipe/CMakeLists.txt

@@ -279,9 +294,11 @@ if(TP_USE_CUDA)

  ### cuda_gdr

-  tp_conditional_backend(
+  TP_CONDITIONAL_BACKEND(


beauby · 2021-07-26T12:19:10Z

tensorpipe/CMakeLists.txt

    TP_ENABLE_CUDA_GDR "Enable CUDA GpuDirect (InfiniBand) channel" "LINUX")
-  if(TP_ENABLE_CUDA_GDR)
+  TP_CONDITIONAL_BACKEND(


beauby · 2021-07-26T12:22:43Z

Looks good apart from minor nits – are there some early results of it working on AMD GPUs?

Madouura · 2022-12-25T16:47:15Z

Is this and #398 still being worked on?

pruthvistony · 2022-12-25T17:11:17Z

Is this and #398 still being worked on?

We have tensorpipe with cuda_basic(hip_basic) working on ROCm (AMD gpus). We are in process to upstream all the changes currently and this is the initial PR.

[ROCm] Changes to enable build for ROCm platform

e21720a

facebook-github-bot added the cla signed label Jul 1, 2021

jithunnair-amd suggested changes Jul 1, 2021

View reviewed changes

pruthvistony added 3 commits July 19, 2021 12:42

Hipify related files moved to hipify-torch repo

d9f3bc6

Reordering of parameter to get_hipified_list function

5681a98

Nit update related to cmake macro

07ebeb9

mrshenli reviewed Jul 21, 2021

View reviewed changes

beauby reviewed Jul 26, 2021

View reviewed changes

jithunnair-amd and others added 5 commits July 28, 2021 15:38

Add comment

64174b8

Update fatal error message

a0d2347

Changes related to updated get_hipified_list() api

3880056

Adding guard to enable skip of not supportted HIP APIs

15b1bd8

Updates to use new hipify() cmake API

9d9bca1


		args = parser.parse_args()

		dict_file_name = args.dump_dict_directory + "/hipify_output_dict_dump.txt"

		@@ -0,0 +1,62 @@
		# Copyright (c) Facebook, Inc. and its affiliates.

		list(APPEND TP_CUDA_LINK_LIBRARIES ${TP_HIP_HCC_LIBRARIES})
		list(APPEND TP_CUDA_INCLUDE_DIRS ${TP_HIP_INCLUDE})

[ROCm] Changes to enable build for ROCm platform #401

Are you sure you want to change the base?

[ROCm] Changes to enable build for ROCm platform #401

Conversation

pruthvistony commented Jul 1, 2021 • edited Loading

pruthvistony commented Jul 1, 2021

pruthvistony commented Jul 1, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pruthvistony Jul 19, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrshenli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pruthvistony commented Jul 21, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

beauby commented Jul 26, 2021

Madouura commented Dec 25, 2022

pruthvistony commented Dec 25, 2022

pruthvistony commented Jul 1, 2021 •

edited

Loading

pruthvistony Jul 19, 2021 •

edited

Loading