feat: Replace torchaudio with pydub #381

h3110Fr13nd · 2024-08-24T14:25:10Z

This PR closes #370
This PR closes #380

feat: Replace torchaudio with pydub
refactor: Removed unnecessary dependencies

Removed Requirements

python-dateutil
tiktoken
torchaudio
scipy
tokenizers
huggingface-hub
sentence-transformers
optimum[onnxruntime]

Major Changes in This Commit

torchaudio to pydub
- bolna/helpers/utils.py
  - save_audio_file_to_s3
  - resample
  - pcm_to_wav_bytes
  - wav_bytes_to_pcm
- bolna/synthesizer/basesynthesizer
  - resample
sklearn to np
- bolna/memory/cache/vector_cache
  - __get_top_cosine_similarity_doc

This closes issue #370, since both scipy and torchaudio are being replaced with pydub.
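For reviewers, the sklearn-to-np change comes down to computing cosine similarity directly with numpy. The sketch below is a minimal illustration, not the code in bolna/memory/cache/vector_cache: the function name `top_cosine_match` and its signature are hypothetical.

```python
import numpy as np


def top_cosine_match(query, docs):
    """Return (index, score) of the row in `docs` most similar to `query`.

    Hypothetical helper for illustration; not the PR's actual function.
    """
    query = np.asarray(query, dtype=np.float64)
    docs = np.asarray(docs, dtype=np.float64)
    # Cosine similarity is the dot product divided by the product of the norms.
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    # Replace zero norms with 1.0 so zero vectors score 0 instead of dividing by zero.
    scores = (docs @ query) / np.where(norms == 0, 1.0, norms)
    best = int(np.argmax(scores))
    return best, float(scores[best])
```

Calling `top_cosine_match(query_embedding, cached_embeddings)` with a 1-D query and a 2-D array of cached vectors gives the best-matching row without importing scikit-learn.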


h3110Fr13nd

The tricky part for me was

def save_audio_file_to_s3
...

First I had to understand what it does.
I also spent hours trying to decode the input message, only to find that its file format is webm, not wav.
Testing this function was very hard because I can't run the "default" handler, as I lack some client code. I tried it with daily and by passing custom data to the function.

h3110Fr13nd

I've tried and tested it.
If you want to test it before merging the PR, you'll have to fork the repo and change bolna_server.Dockerfile so that it installs from your fork:

FROM python:3.10.13-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgomp1 \
    git \
    ffmpeg
# NOTE: Change the username and repo name to your fork
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install git+https://github.com/<your-username>/<your-fork-bolna>@master

COPY quickstart_server.py /app/
EXPOSE 5001
CMD ["uvicorn", "quickstart_server:app", "--host", "0.0.0.0", "--port", "5001"]

This step is required to rebuild the image.

h3110Fr13nd · 2024-08-24T15:18:57Z

After building the image, the new Docker image size is 1.21 GB, quite an improvement over the previous 6 GB.

h3110Fr13nd · 2024-08-25T20:49:05Z

As mentioned in #380 by @marmikcfc
(https://github.com/bolna-ai/bolna/issues/380#issuecomment-2308490495)

Hey @h3110Fr13nd, can you ensure the changing pydub doesn't affect the latency in calls? Because I do remember pydub degrading quality of calls a bit because of time consuming operation.

Updated resampling to use soxr, ensuring no increased latency, good quality, and a reduced Docker image size (1.22 GB).

Of course, your concern was correct.

Although I didn't notice any difference when taking calls, writing test cases to compare the old and new functions showed similar or better execution times, with the exception of the resample function, which pydub made roughly 12 times slower (about 0.099 s versus 0.008 s for torchaudio).

I tried various libraries for resampling, such as torchaudio, soxr, pydub, scipy, numpy, soundfile, and librosa. Plain numpy may be faster in some cases, but it uses linear interpolation, so the resampled audio is not good quality. soxr was the best: it resampled fastest, returned high-quality output, and is a lightweight library.
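To make the numpy trade-off concrete, here is a minimal sketch of linear-interpolation resampling. It is an illustration only, not the code benchmarked in this PR, and the name `resample_linear` is hypothetical. It applies no anti-aliasing filter, which is why downsampled audio can sound worse than with a dedicated resampler; as far as I know, python-soxr covers the same step with `soxr.resample(x, in_rate, out_rate)`.

```python
import numpy as np


def resample_linear(samples, in_rate, out_rate):
    """Resample a 1-D signal by linear interpolation (no anti-alias filter).

    Hypothetical helper for illustration; not the PR's resample function.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if in_rate == out_rate or samples.size == 0:
        return samples
    out_len = int(round(samples.size * out_rate / in_rate))
    # Positions of the output samples, expressed on the input sample grid.
    positions = np.arange(out_len) * (in_rate / out_rate)
    return np.interp(positions, np.arange(samples.size), samples)
```

For the 24000 to 8000 Hz case measured below, this simply picks every third input sample, so any content above 4 kHz folds back into the output as aliasing.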
python test.py

Old pcm_to_wav_bytes function took 1.1920928955078125e-06 seconds
New pcm_to_wav_bytes function took 4.76837158203125e-07 seconds
.

Resampling from 24000 to 8000
Torchaudio resample function took 0.00834965705871582 seconds
pydub audiosegment resample function took 0.0986635684967041 seconds
Soxr resample function took 0.00360107421875 seconds
Numpy resample function took 0.004979848861694336 seconds
Scipy resample function took 0.0089263916015625 seconds
.

Old wav_bytes_to_pcm function took 5.340576171875e-05 seconds
New wav_bytes_to_pcm function took 4.2438507080078125e-05 seconds
.

----------------------------------------------------------------------
Ran 4 tests in 0.417s

OK
I've committed the changes to use soxr for resampling, confirming that replacing torchaudio adds no latency.
(https://github.com/bolna-ai/bolna/issues/380#issuecomment-2308986072)
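For anyone who wants to reproduce timings like the ones above, the sketch below shows one way to time a PCM/WAV round trip using only the standard library. It is an assumption-laden illustration, not the contents of test.py or of bolna/helpers/utils.py: the function names are borrowed from this PR, but the signatures, the default parameters, and the `timed` helper are hypothetical.

```python
import io
import time
import wave


def pcm_to_wav_bytes(pcm_data, sample_rate=16000, num_channels=1, sample_width=2):
    """Wrap raw PCM samples in an in-memory WAV container (assumed signature)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_data)
    return buffer.getvalue()


def wav_bytes_to_pcm(wav_bytes):
    """Strip the WAV header and return the raw PCM frames."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.readframes(wav_file.getnframes())


def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    pcm = b"\x00\x01" * 8000  # one second of dummy 16-bit mono audio at 8 kHz
    wav, elapsed = timed(pcm_to_wav_bytes, pcm, sample_rate=8000)
    print(f"pcm_to_wav_bytes took {elapsed} seconds")
    assert wav_bytes_to_pcm(wav) == pcm
```

A single call is a noisy measurement at the microsecond scale, so repeating each call many times (for example with `timeit`) would give steadier numbers.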

h3110Fr13nd marked this pull request as ready for review August 24, 2024 14:36

Refactor audio resampling logic using soxr library

73002ce
