Automated deployment: Tue Sep 3 14:44:32 UTC 2024 7ef894a
JiJiJiang committed Sep 3, 2024
1 parent 7acdd99 commit 89f1574
Showing 58 changed files with 821 additions and 15 deletions.
11 changes: 11 additions & 0 deletions _modules/wespeaker/cli/utils.html
```diff
@@ -126,6 +126,17 @@ Source code for wespeaker.cli.utils
         action='store_true',
         help='whether to use the damo/speech_eres2net_sv_zh-cn_16k-common model'
     )
+    parser.add_argument(
+        '--vblinkp',
+        action='store_true',
+        help='whether to use the samresnet34 model pretrained on voxblink2'
+    )
+    parser.add_argument(
+        '--vblinkf',
+        action='store_true',
+        help='whether to use the samresnet34 model pretrained on voxblink2 '
+             'and finetuned on voxceleb2'
+    )
     parser.add_argument('-p',
                         '--pretrain',
                         type=str,
```
213 changes: 213 additions & 0 deletions _sources/papers_using_wespeaker.md.txt
@@ -0,0 +1,213 @@
# Papers Implemented in WeSpeaker

[TOC]

## Stay Tuned! (Need to add an introduction for each paper)

## Introduction

After the release of the WeSpeaker project, many users from both academia and industry have actively engaged with it in their research. We appreciate all the feedback and contributions from the community and would like to highlight these interesting works.

Besides citing WeSpeaker itself, we highly recommend reading and citing the corresponding papers listed below.

```bibtex
@article{wang2024advancing,
title={Advancing speaker embedding learning: Wespeaker toolkit for research and production},
author={Wang, Shuai and Chen, Zhengyang and Han, Bing and Wang, Hongji and Liang, Chengdong and Zhang, Binbin and Xiang, Xu and Ding, Wen and Rohdin, Johan and Silnova, Anna and others},
journal={Speech Communication},
volume={162},
pages={103104},
year={2024},
publisher={Elsevier}
}

@inproceedings{wang2023wespeaker,
title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2023},
organization={IEEE}
}
```

## Architecture

### TDNN

```bibtex
@inproceedings{snyder2018x,
title={X-vectors: Robust dnn embeddings for speaker recognition},
author={Snyder, David and Garcia-Romero, Daniel and Sell, Gregory and Povey, Daniel and Khudanpur, Sanjeev},
booktitle={2018 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
pages={5329--5333},
year={2018},
organization={IEEE}
}
```

### ECAPA-TDNN

```bibtex
@article{desplanques2020ecapa,
title={Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification},
author={Desplanques, Brecht and Thienpondt, Jenthe and Demuynck, Kris},
journal={arXiv preprint arXiv:2005.07143},
year={2020}
}
```

### ResNet

The current ResNet implementation is based on our system for VoxSRC 2019; it is also the default speaker model in the pyannote.audio diarization pipeline (https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM).

```bibtex
@article{zeinali2019but,
title={But system description to voxceleb speaker recognition challenge 2019},
author={Zeinali, Hossein and Wang, Shuai and Silnova, Anna and Mat{\v{e}}jka, Pavel and Plchot, Old{\v{r}}ich},
journal={arXiv preprint arXiv:1910.12592},
year={2019}
}
```

### ReDimNet


```bibtex
@article{yakovlev2024reshape,
title={Reshape Dimensions Network for Speaker Recognition},
author={Yakovlev, Ivan and Makarov, Rostislav and Balykin, Andrei and Malov, Pavel and Okhotnikov, Anton and Torgashov, Nikita},
journal={arXiv preprint arXiv:2407.18223},
year={2024}
}
```

### Golden Gemini DF-ResNet

```bibtex
@article{liu2024golden,
title={Golden gemini is all you need: Finding the sweet spots for speaker verification},
author={Liu, Tianchi and Lee, Kong Aik and Wang, Qiongqiong and Li, Haizhou},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
year={2024},
publisher={IEEE}
}
```

### SimAM-ResNet

```bibtex
@inproceedings{qin2022simple,
title={Simple attention module based speaker verification with iterative noisy label detection},
author={Qin, Xiaoyi and Li, Na and Weng, Chao and Su, Dan and Li, Ming},
booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={6722--6726},
year={2022},
organization={IEEE}
}
```

### Whisper-based Speaker Verification

```bibtex
@article{zhao2024whisper,
title={Whisper-PMFA: Partial multi-scale feature aggregation for speaker verification using Whisper models},
author={Zhao, Yiyang and Wang, Shuai and Sun, Guangzhi and Chen, Zehua and Zhang, Chao and Xu, Mingxing and Zheng, Thomas Fang},
journal={arXiv preprint arXiv:2408.15585},
year={2024}
}
```

### CAM++

```bibtex
@article{wang2023cam++,
title={Cam++: A fast and efficient network for speaker verification using context-aware masking},
author={Wang, Hui and Zheng, Siqi and Chen, Yafeng and Cheng, Luyao and Chen, Qian},
journal={arXiv preprint arXiv:2303.00332},
year={2023}
}
```

### ERes2Net

```bibtex
@article{chen2023enhanced,
title={An enhanced res2net with local and global feature fusion for speaker verification},
author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and Chen, Qian and Qi, Jiajun},
journal={arXiv preprint arXiv:2305.12838},
year={2023}
}
```

## Pipelines

### DINO Pretraining with Large-scale Data

```bibtex
@inproceedings{wang2024leveraging,
title={Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition},
author={Wang, Shuai and Bai, Qibing and Liu, Qi and Yu, Jianwei and Chen, Zhengyang and Han, Bing and Qian, Yanmin and Li, Haizhou},
booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={10901--10905},
year={2024},
organization={IEEE}
}
```

## Dataset

### VoxBlink

```bibtex
@inproceedings{lin2024voxblink,
title={Voxblink: A large scale speaker verification dataset on camera},
author={Lin, Yuke and Qin, Xiaoyi and Zhao, Guoqing and Cheng, Ming and Jiang, Ning and Wu, Haiying and Li, Ming},
booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={10271--10275},
year={2024},
organization={IEEE}
}

@article{lin2024voxblink2,
title={VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark},
author={Lin, Yuke and Cheng, Ming and Zhang, Fulin and Gao, Yingying and Zhang, Shilei and Li, Ming},
journal={arXiv preprint arXiv:2407.11510},
year={2024}
}
```

### VoxCeleb

```bibtex
@article{nagrani2017voxceleb,
title={Voxceleb: a large-scale speaker identification dataset},
author={Nagrani, Arsha and Chung, Joon Son and Zisserman, Andrew},
journal={arXiv preprint arXiv:1706.08612},
year={2017}
}

@article{chung2018voxceleb2,
title={Voxceleb2: Deep speaker recognition},
author={Chung, Joon Son and Nagrani, Arsha and Zisserman, Andrew},
journal={arXiv preprint arXiv:1806.05622},
year={2018}
}
```

### CNCeleb

```bibtex
@inproceedings{fan2020cn,
title={Cn-celeb: a challenging chinese speaker recognition dataset},
author={Fan, Yue and Kang, JW and Li, LT and Li, KC and Chen, HL and Cheng, ST and Zhang, PY and Zhou, ZY and Cai, YQ and Wang, Dong},
booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={7604--7608},
year={2020},
organization={IEEE}
}
```
5 changes: 4 additions & 1 deletion _sources/pretrained.md.txt
@@ -54,7 +54,10 @@ The model with suffix **LM** means that it is further fine-tuned using large-margin fine-tuning (sketched briefly after the table below).
| [VoxCeleb](../examples/voxceleb/v2/README.md) | EN | [ECAPA1024](https://wenet.org.cn/downloads?models=wespeaker&version=voxceleb_ECAPA1024.zip) / [ECAPA1024_LM](https://wenet.org.cn/downloads?models=wespeaker&version=voxceleb_ECAPA1024_LM.zip) | [ECAPA1024](https://wenet.org.cn/downloads?models=wespeaker&version=voxceleb_ECAPA1024.onnx) / [ECAPA1024_LM](https://wenet.org.cn/downloads?models=wespeaker&version=voxceleb_ECAPA1024_LM.onnx) |
| [VoxCeleb](../examples/voxceleb/v2/README.md) | EN | [Gemini_DFResnet114_LM](https://wenet.org.cn/downloads?models=wespeaker&version=voxceleb_gemini_dfresnet114_LM.zip)| [Gemini_DFResnet114_LM](https://wenet.org.cn/downloads?models=wespeaker&version=voxceleb_gemini_dfresnet114_LM.onnx) |
| [CNCeleb](../examples/cnceleb/v2/README.md) | CN | [ResNet34](https://wenet.org.cn/downloads?models=wespeaker&version=cnceleb_resnet34.zip) / [ResNet34_LM](https://wenet.org.cn/downloads?models=wespeaker&version=cnceleb_resnet34_LM.zip) | [ResNet34](https://wenet.org.cn/downloads?models=wespeaker&version=cnceleb_resnet34.onnx) / [ResNet34_LM](https://wenet.org.cn/downloads?models=wespeaker&version=cnceleb_resnet34_LM.onnx) |
| [VoxBlink2](../examples/voxceleb/v2/README.md) | Multilingual | [SimAMResNet34](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet34.zip) | [SimAMResNet34](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet34.onnx) |
| [VoxBlink2 (pretrain) + VoxCeleb2 (finetune)](../examples/voxceleb/v2/README.md) | Multilingual | [SimAMResNet34](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet34_ft.zip) | [SimAMResNet34](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet34_ft.onnx) |
| [VoxBlink2](../examples/voxceleb/v2/README.md) | Multilingual | [SimAMResNet100](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet100.zip) | [SimAMResNet100](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet100.onnx) |
| [VoxBlink2 (pretrain) + VoxCeleb2 (finetune)](../examples/voxceleb/v2/README.md) | Multilingual | [SimAMResNet100](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet100_ft.zip) | [SimAMResNet100](https://wenet.org.cn/downloads?models=wespeaker&version=voxblink2_samresnet100_ft.onnx) |
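As background for the **LM** suffix: large-margin fine-tuning typically re-trains the model for a few final epochs with a larger additive angular margin in the AAM-softmax loss, usually on longer training segments. A minimal sketch, assuming a standard AAM-softmax formulation; the margin and scale values are illustrative, not wespeaker's exact recipe:

```python
import torch
import torch.nn.functional as F

def aam_softmax_loss(emb, weight, labels, margin=0.2, scale=32.0):
    # Cosine similarities between L2-normalized embeddings and class weights.
    cos = F.linear(F.normalize(emb), F.normalize(weight))
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # Add the angular margin only at each sample's target class.
    target = F.one_hot(labels, num_classes=cos.size(1)).bool()
    cos_m = torch.where(target, torch.cos(theta + margin), cos)
    return F.cross_entropy(scale * cos_m, labels)

emb = torch.randn(4, 256)                # batch of speaker embeddings
weight = torch.randn(1000, 256)          # one weight vector per training speaker
labels = torch.randint(0, 1000, (4,))
loss = aam_softmax_loss(emb, weight, labels, margin=0.2)      # regular training
loss_lm = aam_softmax_loss(emb, weight, labels, margin=0.5)   # large-margin fine-tuning
```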

### huggingface

| Datasets | Languages | Checkpoint (pt) | Runtime Model (onnx) |
1 change: 1 addition & 0 deletions _sources/python_api/wespeaker.models.rst.txt
@@ -23,6 +23,7 @@ Submodules
wespeaker.models.repvgg
wespeaker.models.res2net
wespeaker.models.resnet
wespeaker.models.samresnet
wespeaker.models.speaker_model
wespeaker.models.tdnn
wespeaker.models.whisper_PMFA
7 changes: 7 additions & 0 deletions _sources/python_api/wespeaker.models.samresnet.rst.txt
@@ -0,0 +1,7 @@
wespeaker.models.samresnet module
=================================

.. automodule:: wespeaker.models.samresnet
:members:
:undoc-members:
:show-inheritance:
2 changes: 2 additions & 0 deletions _sources/python_package.md.txt
@@ -40,6 +40,8 @@ You can specify the following parameters (use `-h` for details). A Python usage sketch follows the list.
use [`campplus_cn_common_200k` of damo](https://www.modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary)
* `--eres2net`:
use [`eres2net_cn_common_200k` of damo](https://www.modelscope.cn/models/iic/speech_eres2net_sv_zh-cn_16k-common/summary)
* `--vblinkp`: use the SimAM-ResNet34 model pretrained on VoxBlink2
* `--vblinkf`: use the SimAM-ResNet34 model pretrained on VoxBlink2 and finetuned on VoxCeleb2
* `--audio_file`: input audio file path
* `--audio_file2`: second input audio file path, used for the similarity task
* `--wav_scp`: input wav.scp file in kaldi format (each line: key wav_path)
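The same functionality is reachable from Python. A short usage sketch based on the package's documented quick-start (`load_model`, `extract_embedding`, `compute_similarity`); `'english'` selects a documented default English model:

```python
import wespeaker

# Quick-start style usage; 'english' is one of the documented model shortcuts.
model = wespeaker.load_model('english')
embedding = model.extract_embedding('audio.wav')              # one utterance -> embedding
score = model.compute_similarity('audio1.wav', 'audio2.wav')  # similarity task
print(score)
```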
1 change: 1 addition & 0 deletions _sources/reference.rst.txt
@@ -7,4 +7,5 @@ Reference

./paper.md
./speaker_recognition_papers.md
./papers_using_wespeaker.md
./python_api/modules.rst
1 change: 1 addition & 0 deletions index.html
@@ -111,6 +111,7 @@ <h1>Welcome to Wespeaker’s documentation!
<li class="toctree-l1"><a class="reference internal" href="reference.html">Reference</a><ul>
<li class="toctree-l2"><a class="reference internal" href="paper.html">Wespeaker Papers</a></li>
<li class="toctree-l2"><a class="reference internal" href="speaker_recognition_papers.html">Speaker Recognition Papers</a></li>
<li class="toctree-l2"><a class="reference internal" href="papers_using_wespeaker.html">Papers Implemented in WeSpeaker</a></li>
<li class="toctree-l2"><a class="reference internal" href="python_api/modules.html">Python API Reference</a></li>
</ul>
</li>
Binary file modified objects.inv
Binary file not shown.
1 change: 1 addition & 0 deletions paper.html
@@ -54,6 +54,7 @@
<li class="toctree-l1 current"><a class="reference internal" href="reference.html">Reference</a><ul class="current">
<li class="toctree-l2 current"><a class="current reference internal" href="#">Wespeaker Papers</a></li>
<li class="toctree-l2"><a class="reference internal" href="speaker_recognition_papers.html">Speaker Recognition Papers</a></li>
<li class="toctree-l2"><a class="reference internal" href="papers_using_wespeaker.html">Papers Implemented in WeSpeaker</a></li>
<li class="toctree-l2"><a class="reference internal" href="python_api/modules.html">Python API Reference</a></li>
</ul>
</li>
