Potential solution to RuntimeError: No such operator fbgemm::jagged_2d_to_dense #3168

Open
Jary-lrj opened this issue Sep 24, 2024 · 1 comment


@Jary-lrj

My environment BEFORE (fails with the RuntimeError in the title):

  • in a conda virtual env
  • NVIDIA Tesla V100
  • python: 3.10
  • torch 2.1.0 + cu118
  • cudnn 8.7.0
  • fbgemm_gpu 0.7.0

My environment AFTER (works):

  • in a conda virtual env
  • NVIDIA Tesla V100
  • python: 3.10 -> 3.12
  • torch 2.1.0 + cu118 -> torch 2.3.0 + cu118 (the key change)
  • cudnn 8.7.0
  • fbgemm_gpu 0.7.0

PS: With fbgemm_gpu 0.8.0 I get a different error: AttributeError: '_OpNamespace' 'fbgemm' object has no attribute 'merge_pooled_embeddings'. I don't know why the newer version raises this error.
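Both errors report that an op is missing from the torch.ops.fbgemm namespace. Depending on the torch version, that lookup failure surfaces either as a RuntimeError ("No such operator ...") or as an AttributeError ("'_OpNamespace' ... has no attribute ..."). Below is a minimal sketch for checking ahead of time whether a given op is registered. It assumes that importing fbgemm_gpu is what registers its ops with torch, and the helper name has_fbgemm_op is hypothetical, not part of fbgemm_gpu:

```python
def has_fbgemm_op(op_name: str) -> bool:
    """Return True if torch.ops.fbgemm.<op_name> resolves in this environment.

    Hypothetical helper, not part of fbgemm_gpu. Assumes that importing
    fbgemm_gpu is what registers its operators with torch.
    """
    try:
        import torch
        import fbgemm_gpu  # noqa: F401  (imported only for its op-registration side effect)
    except (ImportError, OSError):
        # torch or fbgemm_gpu is missing, or the native library failed to load
        return False
    try:
        getattr(torch.ops.fbgemm, op_name)
    except (AttributeError, RuntimeError):
        # Older torch raises RuntimeError("No such operator ..."),
        # newer torch raises AttributeError for the same lookup failure.
        return False
    return True


if __name__ == "__main__":
    for op in ("jagged_2d_to_dense", "merge_pooled_embeddings"):
        print(f"fbgemm::{op} registered: {has_fbgemm_op(op)}")
```

Running it in both the BEFORE and AFTER environments would show directly whether the torch upgrade changes which ops resolve.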

Hint: If you run into similar errors, check your configuration in the following order:
(1) GPU device: my NVIDIA RTX 4090 does not work with the same configuration as in the AFTER environment. It seems only V-series and A-series devices work.
(2) PyTorch and CUDA: if possible, try running fbgemm in a conda virtual env instead of Docker or bare Linux. CUDA 11.8 and 12.1 are recommended. And use torch 2.3.0+, NOT 2.1.0. As for libnvidia_ml.so and libtorch.so, they are installed whether you install torch with pip or conda.
(3) Version: try 0.7.0 rather than 0.8.0.
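The checklist above can be partly automated. Below is a minimal sketch that collects the GPU name, the CUDA version the torch build was compiled against, and the installed torch and fbgemm_gpu versions, using importlib.metadata, platform, and standard torch.cuda calls. The helper names are hypothetical, and the distribution name "fbgemm-gpu" is an assumption (CPU-only or nightly builds are published under other names):

```python
import importlib.metadata
import platform


def package_version(dist_name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def environment_report():
    """Gather the settings the checklist asks about (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "torch": package_version("torch"),
        "fbgemm_gpu": package_version("fbgemm-gpu"),  # assumed distribution name
        "cuda": None,        # CUDA version torch was built with, e.g. "11.8"
        "gpu": None,         # device name of GPU 0
        "capability": None,  # (major, minor) compute capability of GPU 0
    }
    try:
        import torch
    except ImportError:
        return report  # torch not installed: nothing more to query
    report["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Compute capability is included because it distinguishes the cards mentioned in this thread: a V100 reports (7, 0), while an RTX 4090 reports (8, 9).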

@q10
Contributor

q10 commented Sep 24, 2024

Hi @Jary-lrj, as of the time of writing, fbgemm_gpu 0.7.0 is old and no longer supported. Please consider switching over to 0.8.0.
