
Differences in large-v1, v2, v3 models? #338

Open
ItsNoted opened this issue Oct 16, 2024 · 1 comment
Labels: question (Further information is requested)

Comments

@ItsNoted

What are the main differences between the large-v1, v2, and v3 models? They all seem to be nearly the same size, so I'm curious how I can tell what the differences are.

ItsNoted added the bug (Something isn't working) label on Oct 16, 2024
jhj0517 (Owner) commented Oct 16, 2024

Hi. They're simply trained differently.

Details are here:

They give you different results for the same input.

I haven't run any proper benchmarks in terms of WER (Word Error Rate; lower is better) comparing large-v1 vs large-v2 and so on, so here's my personal experience (there's a quick WER-scoring sketch at the end of this comment if you want to measure it on your own audio):

large-v1 is the first large model, and in most cases it gives worse results than large-v2.

large-v2 vs large-v3 is a bit more contentious.
In my personal experience, large-v3 often produces really bad hallucinations if the audio has even a little noise.
(See #152 (comment) and openai/whisper#2378 for more info.)

But if the audio is really clean without much noise, like an ASR benchmark dataset, large-v3 will give you more accurate timestamps in my experience.

And if you don't need the most accurate result, consider large-v3-turbo: it's lighter and faster, with only a minor quality downgrade compared to large-v3.

You can see how to use it in the Web UI at #309 (comment).

TL;DR:
If your audio is clean (no noise), use large-v3.
If not, use large-v2.
If you're OK with a slight loss of quality compared to large-v3, use large-v3-turbo for faster transcription.
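
If you want to compare them empirically on your own audio, here's a minimal sketch using the standalone openai-whisper package (not this Web UI); the file path is a placeholder, and the model names are the checkpoint names recent openai-whisper releases accept:

```python
# Minimal sketch: transcribe the same file with each large model and compare the outputs.
# Assumes `pip install -U openai-whisper`; "sample.wav" is a placeholder for your own test file.
import whisper

AUDIO = "sample.wav"  # placeholder path

for name in ["large-v1", "large-v2", "large-v3", "large-v3-turbo"]:
    model = whisper.load_model(name)   # downloads the checkpoint on first use
    result = model.transcribe(AUDIO)
    print(f"--- {name} ---")
    print(result["text"])
```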
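And since WER came up above: if you have a ground-truth transcript for your test audio, a rough way to score each model's output is the jiwer package. Both strings below are placeholders for your own reference transcript and a model's output:

```python
# Rough WER check with the jiwer package (`pip install jiwer`).
import jiwer

reference  = "the quick brown fox jumps over the lazy dog"   # placeholder ground truth
hypothesis = "the quick brown box jumps over the lazy dog"   # placeholder model output

print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")  # lower is better
```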

jhj0517 added the question (Further information is requested) label and removed the bug (Something isn't working) label on Oct 16, 2024