Is offline zipformer TensorRT supported? #637
https://github.com/k2-fsa/sherpa/tree/master/triton/scripts
I have checked the scripts here, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine as well?

Comments
@yuekaizhang Could you have a look?
@Vergissmeinicht Not yet. Let me do it and I will give an update here.
Thanks! FYI, I've tried the ONNX model from https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-gigaspeech-2023-12-12-english to do the ONNX export and run trtexec, but trtexec fails while parsing a softmax op with a 1-D input. I then tried onnx-graphsurgeon to fix the 1-D input problem, but trtexec still fails on the if-conditional outputs that come from CompactRelPositionalEncoding.
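For anyone hitting the same parser error, here is a minimal onnx-graphsurgeon sketch of the kind of rewrite I attempted; the file name encoder.onnx and the reshape-to-2-D pattern are illustrative assumptions, not the exact script:

```python
# Wrap any 1-D Softmax input in a pair of Reshape nodes so the
# TensorRT parser sees a 2-D tensor instead of a 1-D one.
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("encoder.onnx"))  # assumed file name

for node in [n for n in graph.nodes if n.op == "Softmax"]:
    inp = node.inputs[0]
    if inp.shape is None or len(inp.shape) != 1:
        continue
    # Reshape the 1-D input to (1, N) before the Softmax.
    to_2d = gs.Constant(name=inp.name + "_shape2d",
                        values=np.array([1, -1], dtype=np.int64))
    reshaped = gs.Variable(name=inp.name + "_2d", dtype=inp.dtype)
    graph.nodes.append(gs.Node(op="Reshape", inputs=[inp, to_2d], outputs=[reshaped]))
    node.inputs[0] = reshaped
    node.attrs["axis"] = -1  # softmax over the last (original) axis
    # Reshape the Softmax output back to 1-D for downstream consumers.
    out = node.outputs[0]
    flat = gs.Variable(name=out.name + "_2d", dtype=out.dtype)
    to_1d = gs.Constant(name=out.name + "_shape1d",
                        values=np.array([-1], dtype=np.int64))
    node.outputs[0] = flat
    graph.nodes.append(gs.Node(op="Reshape", inputs=[flat, to_1d], outputs=[out]))

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "encoder_fixed.onnx")
```

This works around the softmax parse failure, but not the if-conditional outputs from CompactRelPositionalEncoding mentioned above.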
@Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
It works for me.
@Vergissmeinicht Please use the latest TensorRT, e.g. TRT 10.2 in tritonserver:24.07-py3.
I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory, while my 4090 Ti has only 24217 MB available. Did you use a GPU with larger memory?
Are you using a larger model compared with the one in build_wenetspeech_zipformer_offline_trt.sh? Would you mind changing this option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37
I use the model downloaded from https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/build_wenetspeech_zipformer_offline_trt.sh#L47C5-L47C110.
Here's the build log. Maybe there's something different.
@yuekaizhang Hi, is there any progress on this problem? I'd appreciate your reply.
@Vergissmeinicht Sorry for the late reply, I have been OOO for the past few days. Would you mind trying https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37? Alternatively, you could set smaller opt and max shapes, with a shorter seq_len and a smaller batch_size range.
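As a hedged sketch of what that could look like (the input names x/x_lens, the shape ranges, and the 8 GiB workspace cap are illustrative assumptions, not values taken from the build script):

```python
# Build the encoder engine with a reduced dynamic-shape range and a capped
# builder workspace, so tactics that request ~34 GB are never attempted.
# Assumes trtexec is on PATH and the encoder was exported as encoder.onnx
# with inputs named x (batch, time, 80) and x_lens (batch).
import subprocess

subprocess.run(
    [
        "trtexec",
        "--onnx=encoder.onnx",
        "--minShapes=x:1x16x80,x_lens:1",
        "--optShapes=x:4x512x80,x_lens:4",
        "--maxShapes=x:8x1024x80,x_lens:8",  # smaller max batch/seq_len than default
        "--memPoolSize=workspace:8192",      # MiB; cap the tactic workspace
        "--saveEngine=encoder.plan",
    ],
    check=True,
)
```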
I also tried to follow the latest tutorial and run build_wenetspeech_zipformer_offline_trt.sh, on an H100 GPU. The Docker image I used is
The error should come from this line in model.py. The Hugging Face model only has
which failed with the same error when I ran the Triton inference code.
Any help would be highly appreciated.
I figured out why I ran into the above errors.
Some files under
I then ran into another error when starting the Triton server:
I changed the
Now the Triton server is running. I still need to see how to verify whether it can generate correct transcripts. Fingers crossed.
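In case it helps, a quick sanity check one can run before sending any audio; the gRPC port 8001 and the ensemble name "transducer" are assumptions about the model repository, so adjust them to match yours:

```python
# Minimal liveness/readiness probe against a local Triton server.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
print("server live:", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready:", client.is_model_ready("transducer"))  # assumed ensemble name
```

If the model reports ready, the remaining step is to send a known recording through the client and compare the transcript against the reference text.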