Optional ffmpeg #72
I currently pass all audio files to FFmpeg for several reasons:
There are complex pipelines of operations here, and currently the processing is done mostly in-memory. Often there are actually multiple copies of the same waveform in memory, at different sample rates or processing stages.

I have a wave encoder/decoder I wrote, which is used extensively internally. In the latest (unreleased) version, I made it more portable so it can run on more runtimes, like Web / Deno / Bun, etc. If I used it for wave files it could save a little bit of time compared to using FFmpeg, though not that much relative to the duration of the whole recognition operation. I'd also need to enhance it with more streaming operations, format conversions, etc., so it would be usable enough for this.

In the future I'll work on reducing memory use by using a streaming approach in more places, but that would only be truly significant, in practice, for very long waveforms, like 3 hours or more. That would be an overall project in itself, not just limited to only …
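As an illustration of what the wave encoder/decoder mentioned above has to deal with before any sample data is touched, here is a hedged sketch of parsing a RIFF/WAVE header to recover the stream's format. The function name and interface are hypothetical, not Echogarden's actual API:

```typescript
// Illustrative sketch: read the format of a RIFF/WAVE file from its header,
// roughly what a minimal wave decoder does before decoding sample data.
interface WaveFormat {
	channelCount: number
	sampleRate: number
	bitDepth: number
}

function parseWaveHeader(bytes: Uint8Array): WaveFormat {
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)

	const ascii = (start: number, length: number) =>
		String.fromCharCode(...bytes.subarray(start, start + length))

	if (ascii(0, 4) !== 'RIFF' || ascii(8, 4) !== 'WAVE') {
		throw new Error('Not a RIFF/WAVE file')
	}

	// Scan chunks until the 'fmt ' chunk is found
	let offset = 12

	while (offset + 8 <= bytes.length) {
		const chunkId = ascii(offset, 4)
		const chunkSize = view.getUint32(offset + 4, true)

		if (chunkId === 'fmt ') {
			return {
				channelCount: view.getUint16(offset + 10, true),
				sampleRate: view.getUint32(offset + 12, true),
				bitDepth: view.getUint16(offset + 22, true),
			}
		}

		offset += 8 + chunkSize + (chunkSize % 2) // chunks are word-aligned
	}

	throw new Error("No 'fmt ' chunk found")
}
```

A full decoder would go on to locate the `data` chunk and convert the interleaved PCM samples; the header scan above is only the entry point.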
I didn't realize all that was going into it! Makes sense to have it loaded in memory for all of that. Unfortunately, on my Raspberry Pi 4 with 8 GB of RAM, a 1.3 GB (11-hour) WAV file crashes the entire system on the FFmpeg step, so I've been breaking files down into pieces and running Echogarden in a loop over all the pieces. Of course the FFmpeg time is nothing compared to the transcription time, but it was just a thought. The streaming approach would be nice to solve this.
Getting everything streaming, either from disk (less likely) or, say, from a single in-memory copy of the audio in a compact form like 16-bit interleaved, would be great, but not very convenient to work with and pretty challenging to actually implement. It's not going to make things faster; most likely it would be a bit slower. But the memory requirement would be reduced significantly for very long audio (for short audio there wouldn't be much difference).

There is some complexity in sample-rate conversion. For example, if I have an input that is 48000 Hz but need 16000 Hz for processing (common for recognition), a streaming approach could look up a particular block of the audio and convert it as needed. The problem is that sample-rate conversion usually needs a few more samples beyond that block (usually before it), so it has to extract some extra samples outside the block, which adds quite a bit of complexity. It would require a lot of effort.

The upcoming version actually takes some steps toward that, both for reducing memory requirements and for improving portability of the code to the Web / Deno / Bun. I also removed reliance on Node.js streams almost completely, so I have custom methods to read and write from disk incrementally, which could help here. There is a lot of other work that needs to be done; it's an ongoing project.

Interesting that you're actually running it on a Raspberry Pi, which uses Linux for ARM - not a platform I've ever tested on or officially supported, but there's no reason why not, really. I wrote a brand new audio I/O addon (precompiled C++ addon via N-API) as well, which is intended to completely remove the dependence on …
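The block-lookup issue described above can be sketched concretely. This is an illustrative example only, using linear interpolation for simplicity (a real resampler, e.g. windowed-sinc, needs an even wider margin of context samples around the block):

```typescript
// Hedged sketch of block-based sample-rate conversion: producing one block of
// output requires reading input samples slightly beyond the block's nominal
// range, which is the complexity mentioned in the comment above.
function resampleBlockLinear(
	input: Float32Array,   // full input signal (in a streaming design: an incremental reader)
	inputRate: number,
	outputRate: number,
	outputStart: number,   // first output sample index of this block
	outputCount: number,
): Float32Array {
	const ratio = inputRate / outputRate
	const output = new Float32Array(outputCount)

	for (let i = 0; i < outputCount; i++) {
		const position = (outputStart + i) * ratio
		const index = Math.floor(position)
		const fraction = position - index

		const sample0 = input[Math.min(index, input.length - 1)]
		// Reads one sample past the block's nominal end - the "extra samples" problem:
		const sample1 = input[Math.min(index + 1, input.length - 1)]

		output[i] = sample0 + (sample1 - sample0) * fraction
	}

	return output
}
```

For a 48000 Hz to 16000 Hz conversion the ratio is 3, so a block of N output samples touches roughly 3N + 1 input samples; a streaming reader has to account for that overhang at every block boundary.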
For transcription, I prefer to convert my source audio to a Whisper-compatible .wav first and then run Echogarden. However, when transcribing with Echogarden, FFmpeg is called regardless of the source audio format (i.e., even when the source audio is already in a compatible format).
Could you implement a check that first looks to see whether the source audio is in a compatible format before performing a transcode?
That, or include a flag that can disable FFmpeg altogether.
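The requested check could be sketched roughly like this. This is a hypothetical illustration, not Echogarden's actual code; the target values assume the common Whisper requirement of 16 kHz mono 16-bit PCM:

```typescript
// Illustrative sketch: decide whether FFmpeg transcoding can be skipped
// because the input already matches the format the recognizer needs.
interface AudioFormat {
	codec: string        // e.g. 'pcm_s16le'
	sampleRate: number
	channelCount: number
}

function needsTranscode(source: AudioFormat, target: AudioFormat): boolean {
	return source.codec !== target.codec ||
		source.sampleRate !== target.sampleRate ||
		source.channelCount !== target.channelCount
}

// Assumed target for Whisper-style recognition (16 kHz mono 16-bit PCM)
const whisperTarget: AudioFormat = { codec: 'pcm_s16le', sampleRate: 16000, channelCount: 1 }
```

In practice, the source format could be probed by parsing the container header directly (as in the wave-header example earlier in the thread) or with a tool like ffprobe, and the transcode step invoked only when `needsTranscode` returns true.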