This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

cupy package mismatches with CUDA version in the docs #950

Open
serach24 opened this issue Aug 20, 2023 · 2 comments

serach24 commented Aug 20, 2023

Please describe the bug
Hi, according to the Alpa installation doc, we need to run pip3 install cupy-cuda11x to install CuPy. However, when the CUDA version is 11.1, according to the CuPy package info here and here, the correct command should be pip3 install cupy-cuda111, because using cupy-cuda11x results in errors in the next step (python3 -c "from cupy.cuda import nccl").
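The package choice described above can be sketched as a small helper. This is only an illustration: the function name cupy_wheel_for_cuda is hypothetical, and the assumption that cupy-cuda11x applies from CUDA 11.2 upward (with per-version wheels such as cupy-cuda111 below that) reflects my understanding of CuPy's wheel naming rather than anything stated in this issue, so check it against the CuPy installation guide before relying on it.

```python
def cupy_wheel_for_cuda(version: str) -> str:
    """Guess the CuPy wheel name for a CUDA 11 version string such as "11.1".

    Assumption (not confirmed by this issue): cupy-cuda11x targets
    CUDA 11.2 and later, while 11.0 and 11.1 use per-version wheels.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected 'major.minor', got {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    if major != 11:
        raise ValueError("this sketch only covers CUDA 11.x")
    if minor >= 2:
        # Generic CUDA 11 wheel, the one named in the Alpa docs.
        return "cupy-cuda11x"
    # Per-version wheel, e.g. cupy-cuda111 for CUDA 11.1 as reported above.
    return f"cupy-cuda{major}{minor}"


if __name__ == "__main__":
    print("pip3 install", cupy_wheel_for_cuda("11.1"))
```

The CUDA version string can be read from nvcc --version or nvidia-smi output before running pip.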

Please describe the expected behavior
python3 -c "from cupy.cuda import nccl" does not print any error.
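For scripting the same check, a minimal sketch is shown here. It wraps the import from the command above in a function that reports the failure instead of raising; the function name check_nccl_import is hypothetical, and a successful import only shows that the module loads, not that NCCL works end to end.

```python
def check_nccl_import():
    """Try the same import as the docs' check and report the outcome.

    Returns (True, "ok") on success, or (False, "<ErrorType>: <message>")
    if the import fails, for example when the installed CuPy wheel does
    not match the local CUDA version.
    """
    try:
        from cupy.cuda import nccl  # noqa: F401  (imported only as a check)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    ok, detail = check_nccl_import()
    print("nccl import:", "passed" if ok else "failed", "-", detail)
```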

System information and environment

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, docker): Linux Ubuntu 22.04
  • Python version: 3.8.10
  • CUDA version: 11.1
  • NCCL version: 2.7.8

To Reproduce
Steps to reproduce the behavior:
Follow the installation guide using CUDA 11.1 and cuDNN 8.0.5.

WelY1 commented Aug 22, 2023

Hi, I ran pip install cupy-cuda11x, and pip3 install alpa and jaxlib. But when I checked the installation with python3 -m alpa.test_install, an error occurred:

ERROR: test_2_pipeline_parallel (__main__.InstallationTest)
-------------------------------------------------------------------
......
self._sender_tasks[sender_worker].append(
jax._src.traceback_util.UnfilteredStackTrace: KeyError: Actor(MeshHostWorker, d022bdfe8bc5c769e2ce7fa6302000000)

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

Have you ever encountered this problem?
Thanks!

serach24 (Author) commented

> Hi, I ran pip install cupy-cuda11x, and pip3 install alpa and jaxlib. But when I checked the installation with python3 -m alpa.test_install, an error occurred:
>
> ERROR: test_2_pipeline_parallel (__main__.InstallationTest)
> -------------------------------------------------------------------
> ......
> self._sender_tasks[sender_worker].append(
> jax._src.traceback_util.UnfilteredStackTrace: KeyError: Actor(MeshHostWorker, d022bdfe8bc5c769e2ce7fa6302000000)
>
> The stack trace below excludes JAX-internal frames.
> The preceding is the original exception that occurred, unmodified.
>
> Have you ever encountered this problem? Thanks!

I was never able to pass installation test 2; I hit different errors with various versions. You can file a new issue about the specific error you encountered, but I suspect the project is no longer actively maintained.
