Performance Degradation in YOLOv8s Model Exported to ONNX via SparseML's Exporter #2276
Comments
Model exported: https://drive.google.com/file/d/1ZDlRd6c1X05lrnxRThUo8FxuapS5Kgm7/view?usp=sharing You can see that this style of Conv is not being folded into a ConvInteger correctly, @bfineran.
@mgoin we'll need to take a look at the recipe and its application. ConvInteger requires two quantized inputs to the Conv (weight and activation); here we see only a quantized weight input, with the output being quantized (although that may be the input quantization for another layer).
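For anyone debugging a similar export, a quick way to check whether the quantized convolutions were actually folded is to tally the operator types in the exported graph. A minimal sketch using the onnx Python API (the model path is a placeholder):

```python
from collections import Counter

import onnx

# Load the exported model and tally the operator types in its graph.
model = onnx.load("model.onnx")  # placeholder path to the exported model
op_counts = Counter(node.op_type for node in model.graph.node)

# A correctly folded export should contain ConvInteger (or QLinearConv)
# nodes; plain Conv nodes wrapped in Quantize/Dequantize pairs suggest
# the folding did not happen.
for op in ("Conv", "ConvInteger", "QLinearConv", "QuantizeLinear", "DequantizeLinear"):
    print(f"{op}: {op_counts[op]}")
```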
@bfineran Thank you for the great work :) Wanted to let you know that I am having exactly the same performance degradation as @rsazizov on yolov8n: from Throughput (items/sec): 110.0278 (on sparsezoo-yolov8n) down to Throughput (items/sec): 15.5770 after converting the sparsezoo-yolov8n model.
Hi @imAhmadAsghar, we're aware of the issue and are looking into it internally. It doesn't seem to be a version compatibility issue, but you could potentially try rolling back your sparseml/pytorch versions. The issue seems to be that the model now exports differently at the beginning (what was a single Split node is now a few Slice nodes).
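To see this difference yourself, you can print the first few operators of each graph and compare. A minimal sketch with the onnx Python API (both paths are placeholders):

```python
import onnx

# Print the leading operator types of each graph to compare the
# SparseZoo model against the locally re-exported one.
for path in ("sparsezoo_model.onnx", "exported_model.onnx"):  # placeholder paths
    graph = onnx.load(path).graph
    print(path, "->", [node.op_type for node in graph.node[:8]])
```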
@bfineran Thank you for your response. I actually did not get the last part of your response, "The issue seems to be that the model now exports differently at the beginning (what was a single Split node is now a few Slice nodes)." Can you please explain in detail what you mean by that, if possible? I am not a performance/optimization engineer; I just want to use sparseml/deepsparse to speed up inference on CPU. However, the whole library is inconvenient and super foggy. I have tested the following:
And here are the results of the performance test between the pruned model and the pruned-and-quantized model: Right now I am super confused, and it does not make any sense to use your library at all. I think I am lacking a lot of information about the whole process. Can you please point me to the proper reference to start from? The one provided on the homepage is not leading me anywhere, as you can see from the results. I would really love to get it running and achieve the results you promised.
@imAhmadAsghar Hi, were you able to find a fix for this? What is going wrong with the exports?
@yoloyash Hi, no, unfortunately I could not.
@rsazizov @imAhmadAsghar when I analyze it, I get a "no weights" error. @bfineran can you help with this? I guess it's either the recipe or the export that caused the problem.
@mydhui you could try exporting a non-quantized FP32 model to see whether the problematic Slice node is still there around this Conv. Additionally, you could skip this Conv during quantization to export a runnable model.
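On the "skip this Conv" suggestion: in SparseML this kind of exclusion lives in the quantization recipe. A rough sketch, under the assumption that the legacy QuantizationModifier fields below exist in the installed version; the module names are placeholders, not the actual offending layer:

```yaml
# Hypothetical recipe fragment; field names and module paths are
# assumptions/placeholders and should be checked against your
# SparseML version's QuantizationModifier documentation.
quantization_modifiers:
  - !QuantizationModifier
      start_epoch: 0.0
      exclude_module_types: ["Upsample"]    # skip entire module types, or...
      submodules: ["model.0", "model.1"]    # ...quantize only these submodules
```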
Hi, has anyone managed to find library versions where quantization does not break a model trained on a custom dataset?
Describe the bug
When exporting the YOLOv8s (pruned50-quant, model.pt from SparseZoo) model via the ONNX exporter (sparseml.ultralytics.export_onnx), its performance noticeably decreases compared to the ONNX model available in SparseZoo.
Expected behavior
Performance of the two ONNX files should be the same, as it is the same model.
Environment
Include all relevant environment information:
To Reproduce
Exact steps to reproduce the behavior:
Download model.onnx for yolov8s-pruned50-quant from SparseZoo (https://sparsezoo.neuralmagic.com/models/yolov8-s-coco-pruned50_quantized). Benchmark it using deepsparse.benchmark:
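The command itself is elided in this copy of the issue; a representative invocation, assuming a local model.onnx downloaded from the SparseZoo page, would be:

```bash
deepsparse.benchmark model.onnx
```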
Notice fraction_of_supported_ops: 1.0 and Throughput (items/sec): 87.1154.
Now download model.pt from the same page and export it to ONNX using the provided tool:
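The export command is likewise elided here. A sketch of the invocation using the entrypoint named above (the --model flag is an assumption on my part; check sparseml.ultralytics.export_onnx --help for the exact arguments):

```bash
sparseml.ultralytics.export_onnx --model model.pt
```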
Conversion is successful. Now benchmark the exported ONNX model:
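Again assuming default settings, with the exported file at a placeholder path:

```bash
deepsparse.benchmark path/to/exported/model.onnx
```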
Notice fraction_of_supported_ops: 0.0 and Throughput (items/sec): 20.2886.
Throughput decreased from ~87 down to ~20 items/sec for the same model.