Hi, thanks for bringing this up.
According to the model page, it is distributed through ESPnet's own package and can be used like this:
import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Download and load the OWSM v3 model from Hugging Face
model = Speech2Text.from_pretrained("espnet/owsm_v3")

# Read the waveform and decode; the first result is the best hypothesis
speech, rate = soundfile.read("speech.wav")
text, *_ = model(speech)[0]
So it might not be difficult to integrate with this web UI, but I'd like to run benchmarks and do some tests when I have time, to find out how it compares to other implementations in terms of speed and WER.
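For the WER part of such a comparison, a self-contained metric is enough; here is a minimal sketch of word error rate via Levenshtein distance (the reference/hypothesis strings are placeholders, not real model output):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution out of four reference words -> WER 0.25
print(wer("the quick brown fox", "the quick brown box"))
```

Running each candidate model's transcript of the same test set through this (or a library like jiwer) would give comparable numbers.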
Any opinions/experience on this model would be welcome.
In my tests, the Distil-Whisper models are inferior to the OpenAI Whisper large-v2/v3 models and not worth using. Maybe the OWSM models are better? Could they be added, or is there a way to add them manually for testing?
https://arxiv.org/abs/2401.16658
https://www.wavlab.org/activities/2024/owsm/
https://huggingface.co/espnet