{
    "audio": {
        "path": item["audio_file"]
    },
    "sentence": item["label"][0].replace(" ", ""),
    "language": "Chinese",
    "duration": 7.37
}
What information should the duration field be derived from?
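I'm not sure whether you are asking how to obtain duration or why duration is needed, so I'll answer both, even though this may be obvious to the experts here XD
How to obtain duration
Why duration is needed
Many tools can report the length of an audio file (for example librosa or ffmpeg). Here is an example that uses the Python librosa module to get the duration: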
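import librosa

# Returns the length of the audio file in seconds.
# The path= keyword is available in librosa 0.10 and later; older versions use filename=.
duration = librosa.get_duration(path='dataset/audio0.wav')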
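For large datasets, a lighter alternative may be worth considering. The sketch below is not from the repository: it swaps librosa for Python's standard-library wave module and assumes the files are uncompressed PCM WAV, so the duration can be read from the header without decoding the audio. The helper names wav_duration and make_entry are hypothetical, and the entry layout simply mirrors the JSON shown in the question.

```python
import wave


def wav_duration(path):
    """Return the duration in seconds of a PCM WAV file, using only its header."""
    with wave.open(path, "rb") as wf:
        # Number of frames divided by frames per second gives seconds.
        return wf.getnframes() / wf.getframerate()


def make_entry(audio_path, sentence, language="Chinese"):
    """Build one dataset entry (hypothetical helper) with a measured duration."""
    return {
        "audio": {"path": audio_path},
        "sentence": sentence,
        "language": language,
        "duration": round(wav_duration(audio_path), 2),
    }
```

For other formats (mp3, flac and so on) this approach does not apply, and librosa or ffmpeg would still be needed.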
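The duration field is needed so that audio clips that are too long or too short can be removed, since such clips may make training results worse. You can refer to this part of the source code: https://github.com/yeyupiaoling/Whisper-Finetune/blob/dd3653a3103fb53323ff95a6ebe875bed3c7a47d/utils/reader.py#L89C23-L89C25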
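As a rough illustration of that kind of filtering (a minimal sketch, not the repository's actual code), the function below drops entries whose duration falls outside a range. The name filter_by_duration and the threshold values 0.5 and 30.0 seconds are assumptions chosen for the example.

```python
def filter_by_duration(entries, min_duration=0.5, max_duration=30.0):
    """Keep only entries whose 'duration' lies within [min_duration, max_duration].

    The thresholds are illustrative assumptions, not values confirmed by the source.
    """
    kept = []
    for entry in entries:
        if min_duration <= entry["duration"] <= max_duration:
            kept.append(entry)
    return kept


# Usage: only the 7.37-second clip survives the filter.
samples = [
    {"sentence": "too short", "duration": 0.2},
    {"sentence": "ok", "duration": 7.37},
    {"sentence": "too long", "duration": 45.0},
]
filtered = filter_by_duration(samples)
```

A filter like this only sees the stored value, so it is only as reliable as the duration field itself.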
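Thank you for the patient and detailed answer 🙏
A follow-up question: if duration is only used here to remove audio that is too long or too short, and I have a very large dataset of paired speech and text where measuring the length of every clip takes too long, can I simply assign a safe value to every duration field (for example the 7.37 from the sample in the README) instead of making each duration match the real length of its audio?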