MuPe-Diversidades is a diverse selection of Brazilian Portuguese audio samples extracted from CORAA MuPe, together with their transcriptions annotated with prosodic segmentation. It includes approximately 10 minutes of speech for each state, from speakers who are diverse in gender and age.


nilc-nlp/MuPe-Diversidades


MuPe-Diversidades

MuPe-Diversidades is a diverse selection of Brazilian Portuguese audio samples extracted from CORAA MuPe, together with their transcriptions annotated with prosodic segmentation. It includes approximately 10 minutes of speech for each state, roughly 5 minutes per speaker from a given state, with speakers who are diverse in gender and age.

Version-0

Already available

Includes anonymized audios and transcriptions with automatic prosodic segmentation annotation

Note: during pre-processing, the audios were automatically cut every 30 seconds and then joined back together. This was found to degrade the quality of the prosodic segmentation annotation. Version 1.0, which corrects this problem, will be available soon.

Version 1.0

In progress

Audios without cuts

With manual review of the prosodic segmentation annotation

Sponsors / Funding

This work was carried out at the Center for Artificial Intelligence (C4AI-USP), with support from the São Paulo Research Foundation (FAPESP grant #2019/07665-4) and from the IBM Corporation. This project was also supported by the Ministry of Science, Technology and Innovation, with resources from Law No. 8.248, of October 23, 1991, within the scope of PPI-SOFTEX, coordinated by Softex and published as Residence in TIC 13, DOU 01245.010222/2022-44.

CORAA MuPe

Available soon at: (link will be inserted here)
