This is an effort to collect a speech recognition corpus for the Catalan language from podcasts, which can be used to train speech recognition models.
The files are transcribed and then reviewed by humans.
This is the Quinze glaçons d'hidrogen Softcatalà podcast.
Characteristics of the Podcast:
- Several voices (3 to 6) with different Catalan accents per episode
- Content is about new technologies and the Catalan language
Human reviewers: Aleix Vidal, Xavier Dengra, Carles Canellas, Joan Claverol, Assumpta Anglada, Laura Humet.
License: CC-BY-SA, the same as the original podcast
This is the EMpodcat podcast.
Characteristics of the Podcast:
- Several voices (2 to 4), mostly male, with a Central Catalan accent
- Content is about medical emergencies
Human reviewers: Albert Homs
License: CC-BY-SA, the same as the original podcast