Batched inference doesn´t improve time on AWS machines #1089

pablex1912 · 2024-10-25T12:01:47Z

pablex1912
Oct 25, 2024

Hello everybody!

I'm using this library a lot in my project and it's being very useful, it's really fantastic.

I have observed a behaviour that I would like to comment to see if someone can suggest me a solution. I'm testing the CPU transcription duration of several audios on different machines using the BatchedInferencePipeline function and when using the c7g.2xlarge instance of AWS the performance gets worse, unlike on my local machine (in this case the results only get better when the audio is long enought). Below I indicate the processor used on each machine:

My machine: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
AWS machine: AWS Graviton3

In the following table I compile the transcription time in seconds of some audios I have tested:

Machine	AUDIO 1 (0:56)	AUDIO 2 (3:37)	AUDIO 3 (5:14)
My machine	172 s	431 s	874 s
AWS	168 s	196 s	328 s
My machine (Batch=512)	219 s	396 s	443 s
AWS (Batch=512)	319 s	462 s	690 s

I have tried different batch sizes and the results does not improve on the AWS machine.

The times obtained on AWS are still better than on my machine but I still wonder if they could be improved. Does anyone know how I can improve transcription times in AWS thanks to BatchInferencePipeline? Are AWS machines fully optimized?

Thanks!

MahmoudAshraf97 · 2024-10-25T13:36:40Z

MahmoudAshraf97
Oct 25, 2024
Maintainer

Batching is mainly beneficial for GPUs, and also regardless of the device, batching is beneficial until you are bounded by compute bottleneck, if you increase the batch size further you will not gain anything and might lose performance in some cases

4 replies

formater Nov 21, 2024

Does batching improves performance? I'm using 24 models loaded into 4xRTX4090, and I hit a limit where all 4 GPUs are occupied in 100%. Would implementing batch processing help me get more performance?

MahmoudAshraf97 Nov 21, 2024
Maintainer

Ofcourse, I'm updating the benchmarks in #1161 , batched inference gives around 4x speedup and there's room for more depending on your hardware

formater Nov 21, 2024

Thank you! One last question on BatchedInferencePipeline. From the readme it seems you are passing only 1 audio, with batch_size=8.. This means, it will segment that audio (probably using VAD) into smaller 8 smaller audios?

Is it possible to submit 8 audios with batch_size=8 (meaning, I do the audio segmentation, and not the internal VAD)?

MahmoudAshraf97 Nov 21, 2024
Maintainer

It's possible, but not through BatchedInferencePipeline, you have to segment the audio into chunks, extract features, and then use CT2 model directly

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Batched inference doesn´t improve time on AWS machines #1089

{{title}}

Replies: 1 comment 4 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Batched inference doesn´t improve time on AWS machines #1089

pablex1912 Oct 25, 2024

Replies: 1 comment · 4 replies

MahmoudAshraf97 Oct 25, 2024 Maintainer

formater Nov 21, 2024

MahmoudAshraf97 Nov 21, 2024 Maintainer

formater Nov 21, 2024

MahmoudAshraf97 Nov 21, 2024 Maintainer

pablex1912
Oct 25, 2024

Replies: 1 comment 4 replies

MahmoudAshraf97
Oct 25, 2024
Maintainer

MahmoudAshraf97 Nov 21, 2024
Maintainer

MahmoudAshraf97 Nov 21, 2024
Maintainer