Has offline zipformer TensorRT been supported? #637

Open
Vergissmeinicht opened this issue Aug 20, 2024 · 14 comments

@Vergissmeinicht

https://github.com/k2-fsa/sherpa/tree/master/triton/scripts
I have checked the scripts here, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?

@csukuangfj
Collaborator

@yuekaizhang Could you have a look?

@yuekaizhang
Collaborator

> https://github.com/k2-fsa/sherpa/tree/master/triton/scripts
> I have checked the scripts here, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?

@Vergissmeinicht Not yet. Let me do it and I will give an update here.

@Vergissmeinicht
Author

> https://github.com/k2-fsa/sherpa/tree/master/triton/scripts
> I have checked the scripts here, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?
>
> @Vergissmeinicht Not yet. Let me do it and I will give an update here.

Thanks! FYI, I've tried the ONNX model from https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-gigaspeech-2023-12-12-english for ONNX export and trtexec, but trtexec fails while parsing a Softmax op with a 1-d input. I then tried onnx-graphsurgeon to fix the 1-d input problem, but trtexec still fails on the If-conditional outputs that come from CompactRelPositionalEncoding.
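
For reference, a rough sketch of the kind of onnx-graphsurgeon patch meant here (not the exact script used in the report; the file name "encoder.onnx" and the reliance on shapes being recorded in the graph are illustrative assumptions):

```python
# Rough sketch only: wrap any 1-d Softmax input in a Reshape to 2-d so the
# TensorRT ONNX parser accepts it, then reshape the output back afterwards.
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("encoder.onnx"))

for node in graph.nodes:
    if node.op != "Softmax":
        continue
    inp = node.inputs[0]
    if inp.shape is None or len(inp.shape) != 1:
        continue
    # Reshape [N] -> [1, N] in front of the Softmax.
    inp_2d = gs.Variable(inp.name + "_2d", dtype=inp.dtype)
    shape_2d = gs.Constant(inp.name + "_shape2d", np.array([1, -1], dtype=np.int64))
    graph.nodes.append(gs.Node("Reshape", inputs=[inp, shape_2d], outputs=[inp_2d]))
    node.inputs[0] = inp_2d
    node.attrs["axis"] = 1
    # Reshape the Softmax output back to [N] for the original consumers.
    out = node.outputs[0]
    out_2d = gs.Variable(out.name + "_2d", dtype=out.dtype)
    shape_1d = gs.Constant(out.name + "_shape1d", np.array([-1], dtype=np.int64))
    node.outputs[0] = out_2d
    graph.nodes.append(gs.Node("Reshape", inputs=[out_2d, shape_1d], outputs=[out]))

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "encoder_patched.onnx")
```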

@yuekaizhang
Collaborator

@Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.

@Vergissmeinicht
Author

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.

It works for me.
But when I try using trtexec to convert the zipformer ONNX model from my teammate, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any solution to parse this Slice op?

@yuekaizhang
Collaborator

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
>
> It works for me. But when I try using trtexec to convert the zipformer ONNX model from my teammate, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any solution to parse this Slice op?

@Vergissmeinicht Please use the latest TensorRT, e.g. TRT 10.2 in tritonserver:24.07-py3.
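
As a quick, illustrative check that the container really provides a new enough TensorRT:

```python
# tritonserver:24.07-py3 is expected to ship TensorRT 10.2.x.
import tensorrt as trt
print(trt.__version__)
```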

@Vergissmeinicht
Author

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
>
> It works for me. But when I try using trtexec to convert the zipformer ONNX model from my teammate, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any solution to parse this Slice op?
>
> @Vergissmeinicht Please use the latest TensorRT, e.g. TRT 10.2 in tritonserver:24.07-py3.

I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?

@yuekaizhang
Collaborator

> I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?

Are you using a larger model compared with the one in build_wenetspeech_zipformer_offline_trt.sh?

Would you mind changing the option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37

@Vergissmeinicht
Author

> build_wenetspeech_zipformer_offline_trt.sh

I use the model downloaded from https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/build_wenetspeech_zipformer_offline_trt.sh#L47C5-L47C110.
The Docker image I use is soar97/triton-k2:24.07.

@Vergissmeinicht
Author

> I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?
>
> Are you using a larger model compared with the one in build_wenetspeech_zipformer_offline_trt.sh?
>
> Would you mind changing the option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37

Here's the build log; maybe there's something different in it.
log.txt

@Vergissmeinicht
Author

> I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?
>
> Are you using a larger model compared with the one in build_wenetspeech_zipformer_offline_trt.sh?
> Would you mind changing the option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37
>
> Here's the build log; maybe there's something different in it. log.txt

@yuekaizhang Hi, is there any progress on this problem? I'd appreciate your reply.

@yuekaizhang
Collaborator

@Vergissmeinicht Sorry for the late reply; I have been OOO for the past few days. Would you mind trying https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37? Or you could set smaller opt and max shapes, with a shorter seq_len and batch_size range.
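
For concreteness, here is a sketch of the same idea using the TensorRT Python API rather than trtexec flags. The encoder input names ("x", "x_lens"), the shape ranges, and the 8 GiB workspace cap are assumptions; adapt them to the actually exported ONNX model:

```python
# Sketch only (TensorRT 10.x Python API): cap the builder workspace and use a
# smaller optimization profile so tactic selection fits in ~24 GB of VRAM.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TRT 10
parser = trt.OnnxParser(network, logger)
with open("encoder.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# Skip tactics whose workspace request (e.g. the 34 GB one) exceeds this limit.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 8 << 30)  # 8 GiB

profile = builder.create_optimization_profile()
# (batch, time, feature) ranges: shorter seq_len and smaller batch than the defaults.
profile.set_shape("x", (1, 16, 80), (4, 512, 80), (8, 1024, 80))
profile.set_shape("x_lens", (1,), (4,), (8,))
config.add_optimization_profile(profile)

plan = builder.build_serialized_network(network, config)
with open("encoder.plan", "wb") as f:
    f.write(plan)
```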

@jingzhaoou

I also tried following the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh, on an H100 GPU. The Docker image I use is soar97/triton-k2:24.07. I ran into the following error when I started the Triton inference server:

+-------------------+---------+-----------------------------------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                                          |
+-------------------+---------+-----------------------------------------------------------------------------------------------------------------+
| decoder           | 1       | READY                                                                                                           |
| encoder           | 1       | READY                                                                                                           |
| feature_extractor | 1       | READY                                                                                                           |
| joiner            | 1       | READY                                                                                                           |
| scorer            | 1       | UNAVAILABLE: Internal: ValueError: invalid literal for int() with base 10: 'https://git-lfs.github.com/spec/v1' |
|                   |         |                                                                                                                 |
|                   |         | At:                                                                                                             |
|                   |         |   /usr/local/lib/python3.10/dist-packages/k2/symbol_table.py(98): from_str                                      |
|                   |         |   /usr/local/lib/python3.10/dist-packages/k2/symbol_table.py(131): from_file                                    |
|                   |         |   /workspace/icefall/icefall/lexicon.py(165): __init__                                                          |
|                   |         |   /workspace/sherpa/triton/./model_repo_offline/scorer/1/model.py(80): init_parameters                          |
|                   |         |   /workspace/sherpa/triton/./model_repo_offline/scorer/1/model.py(64): initialize                               |
+-------------------+---------+-----------------------------------------------------------------------------------------------------------------+

The error seems to come from this line in model.py.

The Hugging Face model only has data/lang_char. So I tried the following Python code:

from icefall.lexicon import Lexicon
tokenizer_file = './model_repo_offline/scorer/lang_char' 
lexicon = Lexicon(tokenizer_file)

which failed with the same error as the Triton inference server:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/icefall/icefall/lexicon.py", line 165, in __init__
    self.word_table = k2.SymbolTable.from_file(lang_dir / "words.txt")
  File "/usr/local/lib/python3.10/dist-packages/k2/symbol_table.py", line 131, in from_file
    return SymbolTable.from_str(f.read().strip())
  File "/usr/local/lib/python3.10/dist-packages/k2/symbol_table.py", line 98, in from_str
    sym, idx = fields[0], int(fields[1])
ValueError: invalid literal for int() with base 10: 'https://git-lfs.github.com/spec/v1'

Any help would be highly appreciated.
Jingzhao

@jingzhaoou

I figured out why I ran into the above errors. build_wenetspeech_zipformer_offline_trt.sh checked out the Git repo using

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/pkufool/icefall-asr-zipformer-wenetspeech-20230615

Some files under lang_char were therefore not fetched by git lfs and were left as LFS pointer files. I simply changed it to GIT_LFS_SKIP_SMUDGE=0 to check out all the Git LFS files, which resolved the above issue.
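
A quick way to confirm that symptom (sketch only; the clone directory below is the one used by the script) is to check whether the lang_char files are still Git LFS pointer stubs, which begin with the exact string that shows up in the ValueError:

```python
# Illustrative check: a Git LFS pointer file starts with
# "version https://git-lfs.github.com/spec/v1" instead of real content.
from pathlib import Path

lang_dir = Path("icefall-asr-zipformer-wenetspeech-20230615/data/lang_char")
for path in sorted(lang_dir.iterdir()):
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
        print(f"{path} is still an LFS pointer; run `git lfs pull` "
              f"(or re-clone with GIT_LFS_SKIP_SMUDGE=0)")
```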

I then ran into another error when starting the Triton server:

E0927 07:29:52.298433 426 model_repository_manager.cc:614] "Invalid argument: in ensemble transducer, ensemble tensor WAV: inconsistent data type: TYPE_FP16 is inferred from model transducer while TYPE_FP32 is inferred from model feature_extractor"

I changed the feature_extractor/config.pbtxt to look like the following

input [
  {
    name: "wav"
    data_type: TYPE_FP16

Now the Triton server is running. I still need to see how to verify whether it can generate correct transcripts. Fingers crossed.
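
In case it helps the verification step, a minimal sketch using the Triton Python HTTP client follows. The model name "transducer", the WAV/WAV_LENS/TRANSCRIPTS tensor names, and the FP16/INT32 dtypes are assumptions taken from the error message and the model_repo_offline layout, so adjust them to match the actual config.pbtxt files:

```python
# Minimal sanity check of transcripts from the deployed ensemble (sketch only).
# Assumes a 16 kHz mono test.wav and the tensor names/dtypes described above.
import numpy as np
import soundfile as sf
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

samples, sample_rate = sf.read("test.wav", dtype="float32")
samples = samples.astype(np.float16)[np.newaxis, :]       # [1, num_samples]
lengths = np.array([[samples.shape[1]]], dtype=np.int32)  # [1, 1]

inputs = [
    httpclient.InferInput("WAV", list(samples.shape), "FP16"),
    httpclient.InferInput("WAV_LENS", list(lengths.shape), "INT32"),
]
inputs[0].set_data_from_numpy(samples)
inputs[1].set_data_from_numpy(lengths)

result = client.infer(
    "transducer", inputs, outputs=[httpclient.InferRequestedOutput("TRANSCRIPTS")]
)
print(result.as_numpy("TRANSCRIPTS"))
```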
