Softcatala/softcatala-podcast-corpus

softcatala-podcast-corpus

This project collects a speech recognition corpus for the Catalan language from podcasts, for use in training speech recognition models.

The files are transcribed and then reviewed by humans.

Softcatalà Podcast

This is the Quinze glaçons d'hidrogen Softcatalà podcast.

Characteristics of the Podcast:

  • Several voices per episode (3 to 6), with different Catalan accents
  • Content covers new technologies and the Catalan language

Human reviewers: Aleix Vidal, Xavier Dengra, Carles Canellas, Joan Claverol, Assumpta Anglada, Laura Humet.

License: CC-BY-SA, the same as the original podcast

EMpodcat Podcast

This is the EMpodcat podcast.

Characteristics of the Podcast:

  • Several voices (2 to 4), mostly male, with a Central Catalan accent
  • Content covers medical emergencies

Human reviewers: Albert Homs

License: CC-BY-SA, the same as the original podcast
