Unable to Train on Multiple GPUs (4070 Ti + 4070) with YOLOv10n #16259
Replies: 2 comments
-
@monggus525 I'd strongly recommend against DDP training with different GPU models.
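If mixed-model DDP keeps failing, one pragmatic fallback is to pin the run to the stronger card. A minimal sketch using the Ultralytics Python API, reusing the dataset file and hyperparameters from the command in the question below (the assumption that device 0 is the 4070 Ti is mine):

```python
from ultralytics import YOLO

# Load the pretrained YOLOv10n weights
model = YOLO("yolov10n.pt")

# Train on a single GPU (assumed here that device 0 is the 4070 Ti)
model.train(
    data="dataset_fixed.yaml",  # dataset config from the original command
    epochs=100,
    imgsz=1600,
    batch=4,
    device=0,  # a single GPU avoids mixed-model DDP entirely
)
```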
-
👋 Hello @monggus525, thank you for reaching out and for using Ultralytics 🚀! This is an automated response, but rest assured, an Ultralytics engineer will assist you soon. To address your issue with multi-GPU training, let's tackle it step by step.

Understanding the Error

The error message you're experiencing,

Solutions to Consider

Additional Support

Join our community to get real-time support:

Feel free to provide additional error logs or configuration details to help us assist further. We're here to help! 😊
-
I am attempting to train YOLOv10n detection using a 4070 Ti (GPU0) and a 4070 (GPU1) together. When I specify device=0,1 in the training command, I encounter the following error message:
My training command was:

```bash
yolo train model=yolov10n.pt device=0,1 epochs=100 data=dataset_fixed.yaml imgsz=1600 batch=4
```
I have CUDA 12.4 installed, PyTorch 2.4.1, and the Ultralytics YOLOv8 environment set up. Despite having the latest versions, I still face this issue. Based on GPT's suggestion, I also installed pyuv, but the problem persists. Additionally, if I don't specify device=0,1 and just run the training, it uses only device=0 (a single GPU) rather than both GPUs.
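Before retrying, it may be worth confirming that PyTorch actually sees both cards; a quick check using standard torch.cuda calls (nothing Ultralytics-specific):

```python
import torch

# Both the 4070 Ti and the 4070 should be listed here; if only one
# appears, the issue is driver/CUDA visibility rather than Ultralytics
print(f"CUDA available: {torch.cuda.is_available()}")
for i in range(torch.cuda.device_count()):
    print(f"device {i}: {torch.cuda.get_device_name(i)}")
```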
Is it expected to run into errors when using GPUs of different models (in this case a 4070 Ti + a 4070) for multi-GPU training? If different GPU models cannot be mixed, should I use identical GPUs (e.g., 4070 Ti + 4070 Ti, or 4070 + 4070)?
If different GPU models are supported, how can I resolve this issue and successfully train on both GPUs?
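For reference, if you do end up with matched cards (or still want to attempt both GPUs), a sketch of the equivalent two-GPU run via the Ultralytics Python API is below; device accepts a list of indices, and the batch is split across the GPUs under DDP. This assumes the same dataset_fixed.yaml and hyperparameters as the command above:

```python
from ultralytics import YOLO

model = YOLO("yolov10n.pt")

# Two-GPU DDP run; `device` accepts a list of CUDA device indices.
# With mismatched cards (4070 Ti + 4070) each step waits on the slower
# GPU, and DDP initialization errors like the one above are a known risk.
model.train(
    data="dataset_fixed.yaml",
    epochs=100,
    imgsz=1600,
    batch=4,  # total batch size, split across the two GPUs
    device=[0, 1],
)
```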