
Sequence_Batching #533

Open
rizwanishaq opened this issue Jan 24, 2024 · 6 comments
Comments

@rizwanishaq

I am experimenting with sequence_batching:

sequence_batching {
  max_sequence_idle_microseconds: 5000000
  oldest {
    max_candidate_sequences: 1024
    max_queue_delay_microseconds: 5000
  }
}

Why do we have max_candidate_sequences: 1024? If we used the Direct scheduling strategy instead, wouldn't it be much faster?
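For reference, the same model configured with the Direct strategy would look roughly like this (a sketch based on Triton's model-configuration schema; the delay value is copied from the snippet above, and minimum_slot_utilization is shown with its default):

```
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct {
    # Delay a batch up to this long in the hope of filling more slots
    max_queue_delay_microseconds: 5000
    # 0 means a batch may execute even with a single occupied slot
    minimum_slot_utilization: 0.0
  }
}
```

Note the trade-off: with Direct, every active sequence holds a dedicated batch slot for its whole lifetime, so the number of concurrent sequences is capped by max_batch_size times the instance count, whereas Oldest can multiplex up to max_candidate_sequences sequences over the available slots. That capacity difference, not raw speed alone, is usually what decides between the two.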

@csukuangfj
Collaborator

Could you show the code related to sequence_batching in sherpa?

@rizwanishaq
Author

@csukuangfj
Collaborator

@yuekaizhang Could you have a look at this issue?

@yuekaizhang
Collaborator

> I am checking with sequence_batching, and sequence_batching{ max_sequence_idle_microseconds: 5000000 oldest { max_candidate_sequences: 1024 max_queue_delay_microseconds: 5000 }
>
> why we have 1024 max_candidate_sequences, if we use direct() isn't going to be much faster??

We didn't tune max_candidate_sequences here; it was just an arbitrary choice. Could you please explain why Direct would be much faster? We haven't tried Direct yet. It would be great if it could speed things up. @rizwanishaq

@rizwanishaq
Author

@yuekaizhang I have tried both Direct and Oldest, and for a streaming application Direct is much better, since my streaming app processes a chunk every 10 ms. I only have one remaining issue that I don't know how to solve: when max_sequence_idle_microseconds: 5000000 expires, there is no way for me to detect it inside the model. Is there a way to trigger something on this timeout, or another way to handle it?

@yuekaizhang
Collaborator

yuekaizhang commented Feb 18, 2024

> @yuekaizhang I have tried both direct and with oldest, and for stream application direct is much better, as my stream app is working on each 10msec. I only have one issue, don't know how to solve that, it is that when max_sequence_idle_microseconds: 5000000 this occur for me there is no way, how to trigger this inside the model, or any other way?

@rizwanishaq Would you mind clarifying the question? max_sequence_idle_microseconds means: if a sequence is idle for longer than max_sequence_idle_microseconds, the inference server frees the sequence slot allocated to that sequence by simply discarding it.
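That timeout behavior can be modeled in a few lines of Python (an illustration of the semantics only, not Triton code; the SequenceSlots class and its method names are invented for this sketch):

```python
import time

class SequenceSlots:
    """Toy model of Triton's sequence-slot bookkeeping (illustration only)."""

    def __init__(self, idle_us):
        self.idle_s = idle_us / 1e6          # threshold in seconds
        self.last_seen = {}                  # sequence_id -> last request time

    def on_request(self, seq_id, now=None):
        now = time.monotonic() if now is None else now
        self.reap(now)                       # free slots whose sequence went idle
        self.last_seen[seq_id] = now         # (re)claim a slot for this sequence

    def on_sequence_end(self, seq_id):
        self.last_seen.pop(seq_id, None)     # an explicit end frees the slot at once

    def reap(self, now):
        for seq_id, t in list(self.last_seen.items()):
            if now - t > self.idle_s:
                # The server simply discards the sequence; the model is not notified.
                del self.last_seen[seq_id]

slots = SequenceSlots(idle_us=2_000_000)     # 2 s threshold for the demo
slots.on_request(1, now=0.0)
slots.on_request(2, now=1.0)
slots.on_request(2, now=2.5)                 # seq 1 idle for 2.5 s > 2 s: reaped
print(sorted(slots.last_seen))               # prints [2]
```

In practice the cleanest way to free a slot early is for the client to mark the last request of a sequence with the sequence-end flag (sequence_end=True in the tritonclient API), so max_sequence_idle_microseconds only acts as a safety net for clients that disappear; the model itself is not invoked when the timeout fires.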

It would be great if Direct turns out to be better. I would appreciate it if you have some spare time to attach perf results comparing direct() and oldest(), similar to #306 (comment). That would be useful for us.
