【Hackathon 7th】Fundable Projects No.7 #3870

Open
Liyulingyue opened this issue Nov 3, 2024 · 4 comments
@Liyulingyue
Contributor

Liyulingyue commented Nov 3, 2024

Description

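PaddleSpeech is an open-source speech toolkit built on PaddlePaddle. It supports common speech tasks such as speech recognition, speech synthesis, keyword spotting (wake-word detection), and speaker (voiceprint) verification. Recent PaddlePaddle releases introduced incompatible changes (for example, the complete removal of the paddle.fluid API, the PIR + predictor upgrade, 0-d tensors, and changes to view behavior), so the models in PaddleSpeech need to be re-adapted and regression-tested to keep the toolkit working.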
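One concrete symptom of the 0-d tensor change appears in the style_fs2 demo: a checkpoint saved with an older Paddle stores a scalar parameter with shape [1], while the current model expects shape []. A minimal sketch of a workaround, assuming the checkpoint has already been loaded into a dict of array-like values and that the expected shapes are taken from the model's own state_dict(): reshape only the entries whose shapes differ in exactly this way. The helper name squeeze_scalar_params is hypothetical, the second checkpoint entry is made up, and the example uses NumPy arrays so it runs without Paddle installed.

```python
import numpy as np


def squeeze_scalar_params(state_dict, expected_shapes):
    """Reshape checkpoint entries saved as shape [1] to 0-d (shape []).

    Hypothetical helper: `state_dict` maps parameter names to arrays (or
    tensors) from an old checkpoint; `expected_shapes` maps the same names
    to the shapes the current model expects.
    """
    fixed = {}
    for name, value in state_dict.items():
        expected = expected_shapes.get(name)
        # Only touch the exact mismatch reported by the warning: [1] -> [].
        if expected is not None and list(expected) == [] and list(value.shape) == [1]:
            value = value.reshape([])
        fixed[name] = value
    return fixed


# Illustrative checkpoint; only the first key comes from the warning text.
old_ckpt = {
    "encoder.embed.1.alpha": np.array([1.0], dtype=np.float32),
    "encoder.embed.0.weight": np.ones((4, 8), dtype=np.float32),
}
model_shapes = {"encoder.embed.1.alpha": [], "encoder.embed.0.weight": [4, 8]}

fixed = squeeze_scalar_params(old_ckpt, model_shapes)
print(fixed["encoder.embed.1.alpha"].shape)  # ()
```

With Paddle, the assumed (untested) wiring would be roughly model.set_state_dict(squeeze_scalar_params(paddle.load(path), {k: v.shape for k, v in model.state_dict().items()})).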

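This issue documents the changes made to PaddleSpeech and the verification and support status of the existing tutorials, documentation, and models.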

Docker improvements

Upgraded the Docker image to support the latest PaddlePaddle release (3.0.0) #3871

Demos

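This section records the results of running and verifying the demos. Status codes: N = no issues, E = errors, W = warnings, U = not run.

Test method

  1. In an Aistudio V100 32G environment with paddlepaddle-gpu 3.0 installed, clone this repository.
  2. Manually remove the paddlepaddle-gpu dependency from setup.py.
  3. Install PaddleSpeech with pip install . --user
  4. Run the relevant commands for each demo.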
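Step 2 is done by hand. As a hedged sketch, assuming the dependencies in setup.py are kept as a list of requirement strings (the list below is made up for illustration, not copied from PaddleSpeech), the same filtering could be written as a small function. It compares normalized project names (PEP 503 style: case-insensitive, with runs of -, _ and . treated as equal), so paddlepaddle-gpu>=2.6 is dropped while the CPU package paddlepaddle or an unrelated package such as paddlenlp is kept. The function names are hypothetical.

```python
import re

# Leading project name of a requirement string, before any extras,
# version specifier, or environment marker.
_NAME_RE = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """PEP 503 name normalization: lowercase, collapse -, _ and . to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def drop_requirement(requirements, name):
    """Return `requirements` without entries for the project `name`."""
    target = normalize(name)
    kept = []
    for req in requirements:
        match = _NAME_RE.match(req)
        if match and normalize(match.group(1)) == target:
            continue  # skip e.g. "paddlepaddle-gpu>=2.6"
        kept.append(req)
    return kept


# Hypothetical dependency list, for illustration only.
base = ["numpy", "paddlepaddle-gpu>=2.6", "paddlenlp>=2.4", "PaddlePaddle_GPU"]
print(drop_requirement(base, "paddlepaddle-gpu"))
# ['numpy', 'paddlenlp>=2.4']
```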

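Test conclusions and records

Most Python API calls work correctly. The issues found are:

  1. Running the speech_ssl demo fails with TypeError: Wav2vec2ASR.forward() missing 3 required positional arguments: 'wavs_lens_rate', 'target', and 'target_lens'
  2. Running the style_fs2 demo emits a 0-D tensor warning.
  3. Running the whisper demo with an input file whose sample rate is not 16000 Hz fails because a Paddle operator does not support the resulting data type.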
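For item 3, one workaround is to resample the input to 16000 Hz before running the demo. A minimal sketch, assuming the audio is already a mono NumPy array: it uses plain linear interpolation (np.interp), which is cheap but applies no anti-aliasing filter, so a proper resampler (for example librosa.resample) is preferable for real use. It also casts the result to float32; that the failing operator rejects float64 input is an assumption, since the issue does not name the data type. The helper name to_16k_float32 is hypothetical.

```python
import numpy as np

TARGET_SR = 16000


def to_16k_float32(wav, sr):
    """Linearly resample a mono signal to 16 kHz and cast to float32.

    Hypothetical helper; linear interpolation has no low-pass filter,
    so downsampling can alias.
    """
    wav = np.asarray(wav, dtype=np.float64)
    if sr == TARGET_SR:
        return wav.astype(np.float32)
    n_out = int(round(len(wav) * TARGET_SR / sr))
    t_in = np.arange(len(wav)) / sr       # input sample times (s)
    t_out = np.arange(n_out) / TARGET_SR  # output sample times (s)
    return np.interp(t_out, t_in, wav).astype(np.float32)


# One second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
out = to_16k_float32(tone, sr)
print(out.shape, out.dtype)  # (16000,) float32
```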
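
| Name | Description | Status | PR |
|---|---|---|---|
| TTSAndroid | No suitable environment; not run | U | |
| TTSArmLinux | Aistudio environment unsuitable; CMake build failed | U | |
| TTSCppFrontend | Aistudio environment unsuitable; CMake build failed | U | |
| asr_deployment | Based on SpeechX; not verified for now | U | |
| audio_content_search | Not run | U | |
| audio_searching | Not run | U | |
| audio_tagging | Python runs successfully | N | |
| automatic_video_subtitiles | Python runs successfully | N | |
| custom_streaming_asr | Not run | U | |
| keyword_spotting | Python runs successfully | N | |
| metaverse | Not run; the script is tied to PaddleGAN and may conflict with it | U | |
| punctuation_restoration | Python runs successfully | N | |
| speaker_verification | Python runs successfully | N | |
| speech_recognition | Python runs successfully | N | |
| speech_server | Not run | U | |
| speech_ssl | TypeError: Wav2vec2ASR.forward() missing 3 required positional arguments: 'wavs_lens_rate', 'target', and 'target_lens' | E | #3872 |
| speech_translation | Python runs successfully | N | |
| speech_web | Not run | U | |
| story_talker | Error caused by the NumPy version: AttributeError: module 'numpy' has no attribute 'complex'. | E | |
| streaming_asr_server | Not run | U | |
| streaming_tts_server | Not run | U | |
| streaming_tts_serving_fastdeploy | Not run | U | |
| style_fs2 | Runs successfully, with a warning: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py:2082: UserWarning: Skip loading for encoder.embed.1.alpha. encoder.embed.1.alpha receives a shape [1], but the expected shape is []. | W | |
| text_to_speech | Python runs successfully | N | |
| whisper | Did not run successfully because no 16000 Hz wav was prepared; in addition, after converting the audio, the code fails with a Paddle operator error | E | |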
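The story_talker failure comes from np.complex, an alias of the builtin complex that NumPy deprecated in 1.20 and removed in 1.24. Where the alias is referenced (in the demo or in one of its dependencies; the issue does not say which), it can be replaced with the builtin complex or an explicit np.complex128; pinning numpy<1.24 is the alternative if that code cannot be edited. A minimal illustration:

```python
import numpy as np

# Old code, fails on NumPy >= 1.24:
#   buf = np.zeros(4, dtype=np.complex)
# Equivalent replacements (np.complex was only an alias of `complex`):
buf = np.zeros(4, dtype=complex)
buf_explicit = np.zeros(4, dtype=np.complex128)

print(buf.dtype, buf_explicit.dtype)  # complex128 complex128
```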

Examples

To be added

Models

To be added

@yinfan98

yinfan98 commented Nov 3, 2024

Models:

Reference: https://doc.weixin.qq.com/sheet/e3_AakAbwboADEPP0wsOQWQQGQSlvz4D?scode=AHAA0Qc9AFoUxqpjLx

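| Priority | Model | Data | Sign-up |
|---|---|---|---|
| P0 (ensure first) | conformer | aishell | @GreatV |
| P0 | whisper (inference only) | aishell | |
| P0 | pwgan | baker | |
| P0 | fastspeech | aishell3 | |
| P1 (must ensure) | conformer (streaming) | TAL_CS | |
| P1 | wav2vec2 | aishell | |
| P1 | hifigan | csmsc | |
| P1 | vits | aishell3 | |
| P1 | Tacotron2 | csmsc | |
| P2 (optional) | panns-cnn14 | esc-50 | |
| P2 | wav2vec2 | librispeech | |
| P2 | transformer | librispeech | |
| P2 | DeepSpeech2 | aishell | |
| P2 | hubert | librispeech | |
| P2 | wavlm | librispeech | |
| P2 | ECAPA-TDNN | | |
| P2 | Style MelGAN | | |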

@yinfan98

yinfan98 commented Nov 3, 2024

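It looks like converting to static graphs is almost certainly unavoidable; training and inference might be somewhat easier to verify orz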

@GreatV
Contributor

GreatV commented Nov 4, 2024

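Do we really need to verify all of these datasets? Downloading them is quite a hassle. I think we could split up the models listed at https://github.com/PaddlePaddle/PaddleSpeech#model-list so that each person claims a few, and then discuss and summarize the common problems that come up during the fixes.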

@Liyulingyue
Contributor Author

Sign-up: pwgan

Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

5 participants