dataset #174

Open
snmahsa opened this issue Apr 16, 2024 · 2 comments

snmahsa commented Apr 16, 2024

Hello. I want to train this model on a new language. I want to know what structure the dataset should have for this model.

@RafaelJCruz

I'm wondering about this too. Maybe plain .wav files are enough, but I haven't confirmed that yet.

realamirhe commented Sep 8, 2024

@RafaelJCruz
Whisper (especially the default medium model) is not fully reliable for transcription.

@snmahsa A standard dataset layout such as LJSpeech might be sufficient.
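
For reference, the LJSpeech layout is a `wavs/` folder of clips plus a pipe-delimited `metadata.csv` with one line per clip (`id|raw text|normalized text`). Below is a minimal sketch for reading such a folder and checking that every listed clip exists. The helper names `load_ljspeech_metadata` and `missing_audio` are made up for illustration, and whether this repo's training code accepts this layout directly has not been confirmed in this thread.

```python
import csv
from pathlib import Path


def load_ljspeech_metadata(root):
    """Return (wav_path, text) pairs from an LJSpeech-style dataset folder.

    Assumed layout (LJSpeech convention, not confirmed for this project):
        root/metadata.csv   one line per clip: id|raw text|normalized text
        root/wavs/<id>.wav  the audio for that clip
    """
    root = Path(root)
    pairs = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # LJSpeech separates fields with "|" and does not quote them.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            clip_id = row[0]
            # Prefer the normalized transcript when a third column is present.
            text = row[2] if len(row) > 2 and row[2] else row[1]
            pairs.append((root / "wavs" / f"{clip_id}.wav", text))
    return pairs


def missing_audio(pairs):
    """List wav paths referenced in the metadata that are not on disk."""
    return [wav for wav, _ in pairs if not wav.exists()]
```

Running `missing_audio(load_ljspeech_metadata("my_dataset"))` on your own folder (the path here is a placeholder) is a quick sanity check before any training attempt.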

In case you, like me, were looking for the dataset sizes they used for non-English languages (e.g. Japanese), see:
#96 (comment)

| Data used for training | English | Chinese | Japanese |
| --- | --- | --- | --- |
| Microsoft's | LibriLight (70k+ hours) | Wenet Speech (10k+ hours) | - |
| Ours (reproduced) | LibriTTS + self-gathered (704 hours) | Aishell 1, 3, Aidatatang + self-gathered (598 hours) | JP commonvoice + self-gathered (437 hours) |

table reference
