
[Issue]: Why does linking the CXX shared library librccl.so take several hours when building RCCL with make? #1430

Open
Kyrienn opened this issue Nov 22, 2024 · 3 comments

Comments


Kyrienn commented Nov 22, 2024

Problem Description

Linking the CXX shared library librccl.so takes several hours when building RCCL with make. Why does this step take so long?

Operating System

ubuntu-24.04

CPU

Intel(R) Core(TM) i7-14700K

GPU

2x AMD Radeon RX 7900 XT

ROCm Version

ROCm 6.2.3

ROCm Component

HIPCC

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

Contributor

thananon commented Nov 22, 2024

Did you use the ./install.sh script?
You can cut a lot of build time by using ./install.sh -l, which builds RCCL only for your local GPU. Please make sure you are using the head of the development branch, as we recently made a huge improvement in build time. For example, my build for my local GPU now takes about 2 minutes.

More info can be found with ./install.sh -h.
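Building only for the local GPU helps because HIP device code is compiled once per GPU architecture target, so fewer targets means less device compilation and linking. As a rough sketch, assuming the `rocm_agent_enumerator` tool that ships with ROCm (and not claiming this is how `install.sh -l` detects the GPU internally), the hypothetical helper `local_gpu_targets` below reduces the enumerator's output to the distinct architectures present. Two RX 7900 XT cards report the same target (gfx1100), so a single architecture is enough for this machine.

```python
import shutil
import subprocess


def local_gpu_targets(enumerator_output: str) -> list[str]:
    """Distinct GPU targets from rocm_agent_enumerator-style output.

    Assumes one gfx target name per line; gfx000 denotes the CPU agent
    and is dropped. Order of first appearance is preserved.
    """
    targets: list[str] = []
    for line in enumerator_output.splitlines():
        name = line.strip()
        if name and name != "gfx000" and name not in targets:
            targets.append(name)
    return targets


if __name__ == "__main__":
    # Sample output for one CPU agent plus two RX 7900 XT cards.
    sample = "gfx000\ngfx1100\ngfx1100\n"
    print(local_gpu_targets(sample))  # ['gfx1100']

    # If the ROCm tool is on PATH, query the real machine as well.
    tool = shutil.which("rocm_agent_enumerator")
    if tool:
        result = subprocess.run([tool], capture_output=True, text=True, check=True)
        print(local_gpu_targets(result.stdout))
```

Deduplicating matters here: building for the same target twice adds no coverage, only time.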

Jadenxxx-l commented


I have the same issue. I tried ./install.sh -l, and it did reduce the compile time somewhat, but linking still takes around 10 minutes. You mentioned a recent improvement in build time; could you tell me which commit it was implemented in? I'd like to give it a try. Thank you for your help!

Contributor

thananon commented Nov 26, 2024

> the linking still takes around 10 minutes.

This is expected with the current version for some GPU models. We are continuing to work on reducing build time.

> could you please let me know which specific commit this was implemented in?

There are a couple. The one I remember is #1371. The general idea is that we were building kernels for every possible combination, some of them with multiple unroll variations, and linking all of them together takes a lot of time. We identified some kernels that were unnecessary and removed them.
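To see why "every possible combination" gets expensive, here is a toy model: each tuple of (collective, datatype, reduction op, algorithm, protocol, unroll factor) stands for one compiled kernel, so the count is the product of the dimension sizes. All dimension values and the pruning rule are hypothetical stand-ins chosen for illustration; they are not RCCL's actual kernel list or the change made in #1371.

```python
from itertools import product

# HYPOTHETICAL dimensions; RCCL's real lists differ by version.
COLLECTIVES = ["AllReduce", "ReduceScatter", "AllGather", "Broadcast", "Reduce"]
DATATYPES = ["int8", "uint8", "int32", "uint32", "int64", "uint64",
             "fp16", "fp32", "fp64", "bf16"]
REDOPS = ["sum", "prod", "min", "max"]
ALGORITHMS = ["ring", "tree"]
PROTOCOLS = ["LL", "LL128", "Simple"]
UNROLLS = [1, 2, 4]


def all_variants():
    """Every combination: one compiled kernel per tuple."""
    return list(product(COLLECTIVES, DATATYPES, REDOPS,
                        ALGORITHMS, PROTOCOLS, UNROLLS))


def keep(variant):
    """HYPOTHETICAL pruning rule: only the Simple protocol keeps
    multiple unroll factors; the other protocols keep unroll=1."""
    *_, protocol, unroll = variant
    return protocol == "Simple" or unroll == 1


full = all_variants()
pruned = [v for v in full if keep(v)]
print(len(full), len(pruned))  # 3600 2000
```

Dropping a single redundant dimension value removes a whole slice of the product, which is why pruning unroll variants shrinks the kernel count (and the link step) so sharply.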

Another side note: if you only need a few collective ops, you can choose to build just those instead of all of them. You can also build for only one datatype (say, 32-bit allreduce).
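The effect of restricting ops and datatypes can be estimated with the same multiplication: narrowing a dimension to a subset scales the kernel count by the subset's share of that dimension. The sketch below uses hypothetical dimension values and helper names (`kernel_count`, `restrict`); it does not show the actual RCCL build option for selecting ops or datatypes, which `./install.sh -h` should list.

```python
from math import prod

# HYPOTHETICAL dimensions for a toy model; real RCCL numbers differ.
FULL = {
    "collectives": ["AllReduce", "ReduceScatter", "AllGather", "Broadcast", "Reduce"],
    "datatypes": ["int8", "uint8", "int32", "uint32", "int64", "uint64",
                  "fp16", "fp32", "fp64", "bf16"],
    "redops": ["sum", "prod", "min", "max"],
    "algorithms": ["ring", "tree"],
    "protocols": ["LL", "LL128", "Simple"],
}


def kernel_count(dims):
    """Number of instantiations = product of each dimension's size."""
    return prod(len(values) for values in dims.values())


def restrict(dims, **subset):
    """Copy of dims with some dimensions narrowed to a chosen subset."""
    out = dict(dims)
    for name, values in subset.items():
        unknown = set(values) - set(dims[name])
        if unknown:
            raise ValueError(f"unknown {name}: {sorted(unknown)}")
        out[name] = list(values)
    return out


only_allreduce_fp32 = restrict(FULL, collectives=["AllReduce"], datatypes=["fp32"])
print(kernel_count(FULL), kernel_count(only_allreduce_fp32))  # 1200 24
```

In this toy model, building one collective for one datatype cuts the count by a factor of 50 (5 collectives times 10 datatypes).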


4 participants