
Sequence_Batching #533

Open
rizwanishaq opened this issue Jan 24, 2024 · 6 comments
Comments

@rizwanishaq

I am experimenting with sequence_batching:

sequence_batching {
  max_sequence_idle_microseconds: 5000000
  oldest {
    max_candidate_sequences: 1024
    max_queue_delay_microseconds: 5000
  }
}

Why do we have max_candidate_sequences: 1024? If we used the Direct scheduling strategy instead, wouldn't it be much faster?
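For reference, the same model configured with the Direct strategy would look roughly like this (a sketch based on Triton's model-configuration schema; the delay value is copied from the snippet above, and minimum_slot_utilization is shown with its default):

```
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct {
    # Delay a batch up to this long in the hope of filling more slots
    max_queue_delay_microseconds: 5000
    # 0 means a batch may execute even with a single occupied slot
    minimum_slot_utilization: 0.0
  }
}
```

Note the trade-off: with Direct, every active sequence holds a dedicated batch slot for its whole lifetime, so the number of concurrent sequences is capped by max_batch_size times the instance count, whereas Oldest can multiplex up to max_candidate_sequences sequences over the available slots. That capacity difference, not raw speed alone, is usually what decides between the two.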

@csukuangfj
Collaborator

Could you show the code related to sequence_batching in sherpa?

@rizwanishaq
Author

@csukuangfj
Collaborator

@yuekaizhang Could you have a look at this issue?

@yuekaizhang
Collaborator

> I am checking with sequence_batching, and sequence_batching{ max_sequence_idle_microseconds: 5000000 oldest { max_candidate_sequences: 1024 max_queue_delay_microseconds: 5000 }
>
> why we have 1024 max_candidate_sequences, if we use direct() isn't going to be much faster??

We didn't tune max_candidate_sequences here; it was just an arbitrary choice. Could you please explain why Direct would be much faster? We haven't tried Direct yet. It would be great if it could speed things up. @rizwanishaq

@rizwanishaq
Author

@yuekaizhang I have tried both Direct and Oldest, and for a streaming application Direct is much better, since my streaming app processes a chunk every 10 ms. I only have one remaining issue that I don't know how to solve: when max_sequence_idle_microseconds: 5000000 expires, there is no way for me to detect it inside the model. Is there a way to trigger something on this timeout, or another way to handle it?

@yuekaizhang
Collaborator

yuekaizhang commented Feb 18, 2024

> @yuekaizhang I have tried both direct and with oldest, and for stream application direct is much better, as my stream app is working on each 10msec. I only have one issue, don't know how to solve that, it is that when max_sequence_idle_microseconds: 5000000 this occur for me there is no way, how to trigger this inside the model, or any other way?

@rizwanishaq Would you mind clarifying the question? max_sequence_idle_microseconds means: if a sequence is idle for longer than max_sequence_idle_microseconds, the inference server frees the sequence slot allocated to that sequence by simply discarding it.
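That timeout behavior can be modeled in a few lines of Python (an illustration of the semantics only, not Triton code; the SequenceSlots class and its method names are invented for this sketch):

```python
import time

class SequenceSlots:
    """Toy model of Triton's sequence-slot bookkeeping (illustration only)."""

    def __init__(self, idle_us):
        self.idle_s = idle_us / 1e6          # threshold in seconds
        self.last_seen = {}                  # sequence_id -> last request time

    def on_request(self, seq_id, now=None):
        now = time.monotonic() if now is None else now
        self.reap(now)                       # free slots whose sequence went idle
        self.last_seen[seq_id] = now         # (re)claim a slot for this sequence

    def on_sequence_end(self, seq_id):
        self.last_seen.pop(seq_id, None)     # an explicit end frees the slot at once

    def reap(self, now):
        for seq_id, t in list(self.last_seen.items()):
            if now - t > self.idle_s:
                # The server simply discards the sequence; the model is not notified.
                del self.last_seen[seq_id]

slots = SequenceSlots(idle_us=2_000_000)     # 2 s threshold for the demo
slots.on_request(1, now=0.0)
slots.on_request(2, now=1.0)
slots.on_request(2, now=2.5)                 # seq 1 idle for 2.5 s > 2 s: reaped
print(sorted(slots.last_seen))               # prints [2]
```

In practice the cleanest way to free a slot early is for the client to mark the last request of a sequence with the sequence-end flag (sequence_end=True in the tritonclient API), so max_sequence_idle_microseconds only acts as a safety net for clients that disappear; the model itself is not invoked when the timeout fires.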

It would be great if Direct turns out to be better. I would appreciate it if you have some spare time to attach perf results comparing direct() and oldest(), similar to #306 (comment). That would be useful for us.
