dataset #174

snmahsa · 2024-04-16T12:13:08Z

Hello. I want to train this model on a new language. I want to know what structure the dataset should have for this model.

RafaelJCruz · 2024-04-17T09:31:54Z

I'm wondering the same thing. Maybe plain .wav files are enough, but I haven't confirmed that yet.

realamirhe · 2024-09-08T09:25:55Z

@RafaelJCruz
Whisper (especially the default medium model) is not fully reliable for transcription.

@snmahsa A standard dataset layout such as LJSpeech might be sufficient.
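
For reference, an LJSpeech-style dataset is a `wavs/` folder of audio clips plus a `metadata.csv` whose lines look like `id|raw text|normalized text`. Below is a minimal sketch for sanity-checking such a folder before training. The function names are hypothetical, and whether this repo's training scripts accept this layout directly has not been confirmed in this thread.

```python
import csv
from pathlib import Path


def read_ljspeech_metadata(root):
    """Return (wav_path, text) pairs from an LJSpeech-style folder.

    Assumed layout (not confirmed for this repo):
        root/metadata.csv   lines of  id|raw text|normalized text
        root/wavs/<id>.wav
    """
    root = Path(root)
    pairs = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # LJSpeech uses '|' as the separator and no quoting.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            utt_id = row[0]
            text = row[-1]  # last column: normalized text if present, else raw
            pairs.append((root / "wavs" / f"{utt_id}.wav", text))
    return pairs


def missing_audio(pairs):
    """List the wav paths referenced in metadata.csv that do not exist."""
    return [path for path, _ in pairs if not path.exists()]
```

Running `missing_audio(read_ljspeech_metadata("my_dataset"))` on your own folder (here the hypothetical `my_dataset`) should return an empty list if every transcript line has a matching clip.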

In case you, like me, were looking for the dataset sizes they used for non-English languages (e.g. Japanese), see:
#96 (comment)

| Data used for training | English | Chinese | Japanese |
| --- | --- | --- | --- |
| Microsoft's | LibriLight (70k+ hours) | Wenet Speech (10k+ hours) | - |
| Ours (reproduced) | LibriTTS + self-gathered (704 hours) | Aishell 1, 3, Aidatatang + self-gathered (598 hours) | JP commonvoice + self-gathered (437 hours) |
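
To compare your own data against the hour counts in the table above, here is a rough sketch that sums the duration of the clips in a folder. It assumes uncompressed PCM `.wav` files, which is what Python's standard `wave` module can read; the function name is hypothetical.

```python
import wave
from pathlib import Path


def total_hours(wav_dir):
    """Sum the duration of all PCM .wav files directly inside wav_dir, in hours."""
    seconds = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            # duration = number of frames / frames per second
            seconds += w.getnframes() / w.getframerate()
    return seconds / 3600.0
```

For example, `total_hours("my_dataset/wavs")` gives a number you can set against the 437 hours listed for Japanese.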
