Has offline zipformer TensorRT been supported? #637

Open
Vergissmeinicht opened this issue Aug 20, 2024 · 14 comments

@Vergissmeinicht

https://github.com/k2-fsa/sherpa/tree/master/triton/scripts
I have checked the scripts here, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?

@csukuangfj
Collaborator

@yuekaizhang Could you have a look?

@yuekaizhang
Collaborator

> https://github.com/k2-fsa/sherpa/tree/master/triton/scripts
> I have checked the scripts here, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?

@Vergissmeinicht Not yet. Let me do it and I will give an update here.

@Vergissmeinicht
Author

> https://github.com/k2-fsa/sherpa/tree/master/triton/scripts
> I have checked the scripts here, but only the conformer TRT script (triton/scripts/build_librispeech_pruned_transducer_stateless3_offline_trt.sh) has been released. Is it OK for zipformer to do export-onnx -> trtexec to get a TensorRT engine too?
>
> @Vergissmeinicht Not yet. Let me do it and I will give an update here.

Thanks! FYI, I've tried the ONNX model from https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-gigaspeech-2023-12-12-english for ONNX export and trtexec, but trtexec fails while parsing a Softmax op with a 1-d input. I then tried onnx-graphsurgeon to fix the 1-d input problem, but trtexec still fails on the If-conditional outputs that come from CompactRelPositionalEncoding.
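
For reference, a rough sketch of the kind of onnx-graphsurgeon patch meant here (not the exact script used in the report; the file name "encoder.onnx" and the reliance on shapes being recorded in the graph are illustrative assumptions):

```python
# Rough sketch only: wrap any 1-d Softmax input in a Reshape to 2-d so the
# TensorRT ONNX parser accepts it, then reshape the output back afterwards.
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("encoder.onnx"))

for node in graph.nodes:
    if node.op != "Softmax":
        continue
    inp = node.inputs[0]
    if inp.shape is None or len(inp.shape) != 1:
        continue
    # Reshape [N] -> [1, N] in front of the Softmax.
    inp_2d = gs.Variable(inp.name + "_2d", dtype=inp.dtype)
    shape_2d = gs.Constant(inp.name + "_shape2d", np.array([1, -1], dtype=np.int64))
    graph.nodes.append(gs.Node("Reshape", inputs=[inp, shape_2d], outputs=[inp_2d]))
    node.inputs[0] = inp_2d
    node.attrs["axis"] = 1
    # Reshape the Softmax output back to [N] for the original consumers.
    out = node.outputs[0]
    out_2d = gs.Variable(out.name + "_2d", dtype=out.dtype)
    shape_1d = gs.Constant(out.name + "_shape1d", np.array([-1], dtype=np.int64))
    node.outputs[0] = out_2d
    graph.nodes.append(gs.Node("Reshape", inputs=[out_2d, shape_1d], outputs=[out]))

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "encoder_patched.onnx")
```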

@yuekaizhang
Collaborator

@Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.

@Vergissmeinicht
Author

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.

It works for me.
But when I try using trtexec to convert the zipformer ONNX model from my teammate, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any solution to parse this Slice op?

@yuekaizhang
Collaborator

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
>
> It works for me. But when I try using trtexec to convert the zipformer ONNX model from my teammate, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any solution to parse this Slice op?

@Vergissmeinicht Please use the latest TensorRT, e.g. TRT 10.2 in tritonserver:24.07-py3.
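
As a quick, illustrative check that the container really provides a new enough TensorRT:

```python
# tritonserver:24.07-py3 is expected to ship TensorRT 10.2.x.
import tensorrt as trt
print(trt.__version__)
```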

@Vergissmeinicht
Author

> @Vergissmeinicht Just commenting out these lines should be okay: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/zipformer.py#L1422-L1427.
>
> It works for me. But when I try using trtexec to convert the zipformer ONNX model from my teammate, it fails while parsing a Slice node, saying "This version of TensorRT does not support dynamic axes". Maybe my icefall version does not match his. Is there any solution to parse this Slice op?
>
> @Vergissmeinicht Please use the latest TensorRT, e.g. TRT 10.2 in tritonserver:24.07-py3.

I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?

@yuekaizhang
Collaborator

> I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?

Are you using a larger model compared with the one in build_wenetspeech_zipformer_offline_trt.sh?

Would you mind changing the option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37

@Vergissmeinicht
Author

> build_wenetspeech_zipformer_offline_trt.sh

I use the model downloaded from https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/build_wenetspeech_zipformer_offline_trt.sh#L47C5-L47C110.
The Docker image I use is soar97/triton-k2:24.07.

@Vergissmeinicht
Author

> I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?
>
> Are you using a larger model compared with the one in build_wenetspeech_zipformer_offline_trt.sh?
>
> Would you mind changing the option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37

Here's the build log; maybe there's something different in it.
log.txt

@Vergissmeinicht
Author

> I followed the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh. It fails due to OOM: a tactic requests 34024 MB of device memory (my 4090 Ti has 24217 MB available). Do you use another GPU with more memory?
>
> Are you using a larger model compared with the one in build_wenetspeech_zipformer_offline_trt.sh?
> Would you mind changing the option? https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37
>
> Here's the build log; maybe there's something different in it. log.txt

@yuekaizhang Hi, is there any progress on this problem? I'd appreciate your reply.

@yuekaizhang
Collaborator

@Vergissmeinicht Sorry for the late reply; I have been OOO for the past few days. Would you mind trying https://github.com/NVIDIA/trt-samples-for-hackathon-cn/blob/master/cookbook/07-Tool/trtexec/Help.txt#L37? Or you could set smaller opt and max shapes, with a shorter seq_len and batch_size range.
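
For concreteness, here is a sketch of the same idea using the TensorRT Python API rather than trtexec flags. The encoder input names ("x", "x_lens"), the shape ranges, and the 8 GiB workspace cap are assumptions; adapt them to the actually exported ONNX model:

```python
# Sketch only (TensorRT 10.x Python API): cap the builder workspace and use a
# smaller optimization profile so tactic selection fits in ~24 GB of VRAM.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TRT 10
parser = trt.OnnxParser(network, logger)
with open("encoder.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# Skip tactics whose workspace request (e.g. the 34 GB one) exceeds this limit.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 8 << 30)  # 8 GiB

profile = builder.create_optimization_profile()
# (batch, time, feature) ranges: shorter seq_len and smaller batch than the defaults.
profile.set_shape("x", (1, 16, 80), (4, 512, 80), (8, 1024, 80))
profile.set_shape("x_lens", (1,), (4,), (8,))
config.add_optimization_profile(profile)

plan = builder.build_serialized_network(network, config)
with open("encoder.plan", "wb") as f:
    f.write(plan)
```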

@jingzhaoou

I also tried following the latest tutorial to run build_wenetspeech_zipformer_offline_trt.sh, on an H100 GPU. The Docker image I use is soar97/triton-k2:24.07. I ran into the following error when I started the Triton inference server:

+-------------------+---------+-----------------------------------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                                          |
+-------------------+---------+-----------------------------------------------------------------------------------------------------------------+
| decoder           | 1       | READY                                                                                                           |
| encoder           | 1       | READY                                                                                                           |
| feature_extractor | 1       | READY                                                                                                           |
| joiner            | 1       | READY                                                                                                           |
| scorer            | 1       | UNAVAILABLE: Internal: ValueError: invalid literal for int() with base 10: 'https://git-lfs.github.com/spec/v1' |
|                   |         |                                                                                                                 |
|                   |         | At:                                                                                                             |
|                   |         |   /usr/local/lib/python3.10/dist-packages/k2/symbol_table.py(98): from_str                                      |
|                   |         |   /usr/local/lib/python3.10/dist-packages/k2/symbol_table.py(131): from_file                                    |
|                   |         |   /workspace/icefall/icefall/lexicon.py(165): __init__                                                          |
|                   |         |   /workspace/sherpa/triton/./model_repo_offline/scorer/1/model.py(80): init_parameters                          |
|                   |         |   /workspace/sherpa/triton/./model_repo_offline/scorer/1/model.py(64): initialize                               |
+-------------------+---------+-----------------------------------------------------------------------------------------------------------------+

The error seems to come from this line in model.py.

The Hugging Face model only has data/lang_char. So I tried the following Python code:

from icefall.lexicon import Lexicon
tokenizer_file = './model_repo_offline/scorer/lang_char' 
lexicon = Lexicon(tokenizer_file)

which failed with the same error as the Triton inference server:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/icefall/icefall/lexicon.py", line 165, in __init__
    self.word_table = k2.SymbolTable.from_file(lang_dir / "words.txt")
  File "/usr/local/lib/python3.10/dist-packages/k2/symbol_table.py", line 131, in from_file
    return SymbolTable.from_str(f.read().strip())
  File "/usr/local/lib/python3.10/dist-packages/k2/symbol_table.py", line 98, in from_str
    sym, idx = fields[0], int(fields[1])
ValueError: invalid literal for int() with base 10: 'https://git-lfs.github.com/spec/v1'

Any help would be highly appreciated.
Jingzhao

@jingzhaoou

I figured out why I ran into the above errors. build_wenetspeech_zipformer_offline_trt.sh checked out the Git repo using

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/pkufool/icefall-asr-zipformer-wenetspeech-20230615

Some files under lang_char were therefore not fetched by git lfs and were left as LFS pointer files. I simply changed it to GIT_LFS_SKIP_SMUDGE=0 to check out all the Git LFS files, which resolved the above issue.
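
A quick way to confirm that symptom (sketch only; the clone directory below is the one used by the script) is to check whether the lang_char files are still Git LFS pointer stubs, which begin with the exact string that shows up in the ValueError:

```python
# Illustrative check: a Git LFS pointer file starts with
# "version https://git-lfs.github.com/spec/v1" instead of real content.
from pathlib import Path

lang_dir = Path("icefall-asr-zipformer-wenetspeech-20230615/data/lang_char")
for path in sorted(lang_dir.iterdir()):
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
        print(f"{path} is still an LFS pointer; run `git lfs pull` "
              f"(or re-clone with GIT_LFS_SKIP_SMUDGE=0)")
```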

I then ran into another error when starting the Triton server:

E0927 07:29:52.298433 426 model_repository_manager.cc:614] "Invalid argument: in ensemble transducer, ensemble tensor WAV: inconsistent data type: TYPE_FP16 is inferred from model transducer while TYPE_FP32 is inferred from model feature_extractor"

I changed the feature_extractor/config.pbtxt to look like the following

input [
  {
    name: "wav"
    data_type: TYPE_FP16

Now the Triton server is running. I still need to see how to verify whether it can generate correct transcripts. Fingers crossed.
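
In case it helps the verification step, a minimal sketch using the Triton Python HTTP client follows. The model name "transducer", the WAV/WAV_LENS/TRANSCRIPTS tensor names, and the FP16/INT32 dtypes are assumptions taken from the error message and the model_repo_offline layout, so adjust them to match the actual config.pbtxt files:

```python
# Minimal sanity check of transcripts from the deployed ensemble (sketch only).
# Assumes a 16 kHz mono test.wav and the tensor names/dtypes described above.
import numpy as np
import soundfile as sf
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

samples, sample_rate = sf.read("test.wav", dtype="float32")
samples = samples.astype(np.float16)[np.newaxis, :]       # [1, num_samples]
lengths = np.array([[samples.shape[1]]], dtype=np.int32)  # [1, 1]

inputs = [
    httpclient.InferInput("WAV", list(samples.shape), "FP16"),
    httpclient.InferInput("WAV_LENS", list(lengths.shape), "INT32"),
]
inputs[0].set_data_from_numpy(samples)
inputs[1].set_data_from_numpy(lengths)

result = client.infer(
    "transducer", inputs, outputs=[httpclient.InferRequestedOutput("TRANSCRIPTS")]
)
print(result.as_numpy("TRANSCRIPTS"))
```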
