Hi, thanks for bringing this up.
According to the model page, it is distributed through ESPnet's own package and can be used like this:
import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Download and load the OWSM v3 model from Hugging Face
model = Speech2Text.from_pretrained("espnet/owsm_v3")

# Read the waveform and decode; the first result is the best hypothesis
speech, rate = soundfile.read("speech.wav")
text, *_ = model(speech)[0]
So it might not be difficult to integrate with this web UI, but I'd like to run benchmarks and do some tests when I have time, to find out how it compares to other implementations in terms of speed and WER.
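For the WER part of such a comparison, a self-contained metric is enough; here is a minimal sketch of word error rate via Levenshtein distance (the reference/hypothesis strings are placeholders, not real model output):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution out of four reference words -> WER 0.25
print(wer("the quick brown fox", "the quick brown box"))
```

Running each candidate model's transcript of the same test set through this (or a library like jiwer) would give comparable numbers.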
Any opinions/experience on this model would be welcome.
In my tests, the Distil-Whisper models are inferior to the OpenAI Whisper large-v2/v3 models and not worth using. Maybe the OWSM models are better? Could they be added, or is there a way to add them manually for testing?
https://arxiv.org/abs/2401.16658
https://www.wavlab.org/activities/2024/owsm/
https://huggingface.co/espnet