-
@knoppmyth thank you for sharing your inference performance results between the CPU and iGPU on the AMD Ryzen 7 4700U using ROCm. It's interesting to see the comparison, and your detailed report could be valuable for others in the community who are considering similar hardware setups.

From your results, it's evident that the iGPU provides a significant speedup over the CPU for inference tasks with YOLOv8, even when only 1 GB is dedicated to the iGPU. This aligns with the general expectation that GPUs, including integrated ones, can offer better parallel processing capabilities for deep learning inference compared to CPUs.

It's unfortunate to hear about the system stability issues you encountered. Hardware stability is crucial for consistent performance testing and deployment. If you decide to continue testing with a different system in the future, the community would surely benefit from any additional insights you can provide.

Your contribution to the YOLOv8 community is appreciated, and we encourage you to share any further findings or questions you might have. Remember, the documentation at https://docs.ultralytics.com is always there to assist you with any additional information you might need regarding the use of YOLOv8. Good luck with your future testing, and we hope to see more from you! 🚀🤖
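If it helps others reproduce this kind of comparison, here is a rough sketch of timing the same model on the CPU and on the ROCm iGPU with the Ultralytics Python API. The model path, image folder, and device index below are placeholders, not values taken from the report above:

```python
import time
from ultralytics import YOLO

# "best.pt" and "foo/" are placeholder paths for a custom model and an image folder.
# device=0 is the first ROCm/HIP device, which PyTorch exposes through the CUDA API.
# Note: the first GPU run includes one-time warm-up overhead.
for device in ("cpu", 0):
    model = YOLO("best.pt")
    start = time.perf_counter()
    model.predict(source="foo/", device=device, verbose=False)
    elapsed = time.perf_counter() - start
    print(f"device={device}: {elapsed:.2f} s total")
```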
-
@glenn-jocher You're welcome. I never got the minisforum system. But when I get a new system, I do intend to share my results.
-
Thought I'd share this to show the performance difference between using the CPU and the iGPU with ROCm. The OS is Arch Linux, and the required software was installed in a virtual environment. The hardware is an Asus PN50 with 16 GB of RAM.
PyTorch was installed in the venv with:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
Followed by ultralytics with:
pip install ultralytics
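If anyone wants to sanity-check that the ROCm wheel actually sees the iGPU before running YOLO, something like this should work (just a quick check, not part of my original steps):

```python
import torch

print(torch.__version__)          # the ROCm wheel reports a "+rocm..." version suffix
print(torch.version.hip)          # non-None only on ROCm builds
print(torch.cuda.is_available())  # ROCm devices are exposed through the CUDA API
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should name the iGPU
```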
Before inferencing:
export HSA_OVERRIDE_GFX_VERSION=9.0.0
Inferenced with:
yolo predict model=~/rocm/best.pt source=~/rocm/foo/
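For anyone who prefers the Python API over the CLI, something like this should be roughly equivalent (a sketch; the environment variable has to be set before the GPU is first touched, and the paths are the ones from my commands above):

```python
import os

# Must be set before torch/ultralytics initialize the GPU, same as the export above.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"

from ultralytics import YOLO

model = YOLO(os.path.expanduser("~/rocm/best.pt"))
model.predict(source=os.path.expanduser("~/rocm/foo/"))
```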
Here is the output with 1 GB dedicated to the iGPU (note: the iGPU run is first, followed by the CPU):
Here is the performance with 4 GB dedicated to the iGPU:
Unfortunately, I won't be able to do any follow-up performance testing, as the system wasn't stable (it kept locking up), so I returned it (yes, the RAM was tested). I may get a minisforum to continue testing with ROCm.
This was a custom object detector. Since I don't want what I'm detecting to be known, it has been replaced with "foo".