Add Open Whisper-style Speech Models (OWSM) #391

Open
montvid opened this issue Nov 12, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments


montvid commented Nov 12, 2024

In my tests, Distil-Whisper models are clearly inferior to the OpenAI Whisper large-v2/v3 models and not something I would use. Maybe OWSM models could be better? Could they be added? Or how can I add them manually to test them?
https://arxiv.org/abs/2401.16658
https://www.wavlab.org/activities/2024/owsm/
https://huggingface.co/espnet

montvid added the enhancement (New feature or request) label Nov 12, 2024
Owner

jhj0517 commented Nov 12, 2024

Hi, thanks for bringing this up.
According to the model page, the model depends on its own package and can be used like this:

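import soundfile  # needed for soundfile.read below
from espnet2.bin.asr_inference import Speech2Text

# Load the pretrained OWSM v3 checkpoint
model = Speech2Text.from_pretrained(
  "espnet/owsm_v3"
)

# Read the waveform and its sample rate from disk
speech, rate = soundfile.read("speech.wav")
# Take the top hypothesis; its first element is the recognized text
text, *_ = model(speech)[0]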

So it might not be difficult to integrate with this web UI, but I'd like to run benchmarks and do some tests when I have time, to find out which is better in terms of speed and WER compared to other implementations.
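
For the WER side of such a comparison, a minimal sketch of the metric could look like the following. It is only an illustration, not the benchmark used by this project: the function name `word_error_rate` is hypothetical, words are split on whitespace, and no text normalization (casing, punctuation) is applied, which a real benchmark would probably want.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; assumes whitespace tokenization
    and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Running the same reference transcripts through each model and averaging this value would give one number per model to compare; speed would have to be measured separately.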

Any opinions/experience on this model would be welcome.


2 participants