Source code for the paper "Self-Supervised Speech Denoising Using Only Noisy Audio Signals". In this paper, we propose a self-supervised speech denoising strategy named Only-Noisy Training (ONT), which, for the first time, solves the speech denoising problem using only noisy audio signals in the audio space.
- Related audio examples of the signals $x$, $s_{1}(x)$, $s_{2}(x)$, $f(x)$, $f(s_{1}(x))$, and $f(s_{2}(x))$ can be downloaded from the Audio_example folder.
The code is developed under the following configurations.
- Hardware: 1-4 GPUs (adjust the number of GPUs accordingly)
- Software: CUDA=11.4.48, Python=3.9.5, PyTorch=1.10.0
The datasets are generated by overlaying two categories of noise on clean audio clips. The clean audio is selected from the Voice Bank + DEMAND dataset (https://datashare.ed.ac.uk/handle/10283/2791). The first noise source is white Gaussian noise, used to generate the synthetic noisy dataset; the second is the UrbanSound8K dataset (https://urbansounddataset.weebly.com/urbansound8k.html), used to generate the real-world noisy dataset. You can generate the two kinds of noisy audio by running the following scripts:
```
python whitenoise_dataset_generator.py
python urbansound_dataset_generator.py
```
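For reference, the synthetic set is built by adding white Gaussian noise to clean clips. Below is a minimal sketch of that step, assuming a target-SNR mixing scheme; the function name, SNR value, and placeholder signal are illustrative assumptions, not the repo's actual implementation.

```python
import numpy as np

def add_white_gaussian_noise(clean: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix white Gaussian noise into a clean waveform at a target SNR (dB).

    Hypothetical helper; whitenoise_dataset_generator.py may choose SNRs
    and handle file I/O differently.
    """
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise

# Example: corrupt a one-second 48 kHz stand-in clip at 10 dB SNR.
clean = 0.1 * np.random.randn(48000)
noisy = add_white_gaussian_noise(clean, snr_db=10.0)
```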
The proposed ONT constructs training pairs from each single noisy audio sample alone, and it contains two modules: a training-pair generation module and a speech denoising module.
- The first module applies a random audio sub-sampler to each noisy audio sample to generate training audio pairs (see the sketch after this list).
- The sub-sampled training pairs are then fed into the denoising module, which employs a deep complex U-Net incorporating a complex two-stage transformer module (cTSTM) to extract both magnitude and phase information.
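To make the first module concrete, here is a minimal sketch of one plausible sub-sampler, assuming a Neighbor2Neighbor-style scheme in which each adjacent pair of samples is split at random between the two outputs, yielding two half-length waveforms (consistent with the half-length training pairs noted below). The paper's exact sampler may differ.

```python
import torch

def random_subsample_pair(noisy: torch.Tensor):
    """Split a 1-D noisy waveform into two half-length sub-sampled signals.

    For every adjacent pair of samples, one sample is routed to g1 and the
    other to g2, chosen at random. Hypothetical illustration, not the
    repo's exact pair-generation module.
    """
    length = noisy.shape[-1] - noisy.shape[-1] % 2  # drop a trailing odd sample
    pairs = noisy[:length].reshape(-1, 2)           # (length // 2, 2)
    pick = torch.randint(0, 2, (pairs.shape[0],))   # random 0/1 per pair
    idx = torch.arange(pairs.shape[0])
    g1 = pairs[idx, pick]
    g2 = pairs[idx, 1 - pick]
    return g1, g2

noisy = torch.randn(48000)  # stand-in for a noisy clip
g1, g2 = random_subsample_pair(noisy)
assert g1.shape[-1] == noisy.shape[-1] // 2
```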
You can choose among three models for training: ONT, ONT-rTSTM, and ONT-cTSTM. ONT denotes our speech denoising strategy, which trains on single noisy audio samples with a complex U-Net and no TSTM; ONT-rTSTM adds a real-valued TSTM (rTSTM) inside the complex U-Net; ONT-cTSTM adds a complex-valued TSTM (cTSTM) inside the complex U-Net. You can train a model by running:
```
python train.py
```
The original raw audio waveforms are sampled at 48 kHz and then transformed by the STFT with a 64 ms Hamming window and a 16 ms hop size to obtain complex-valued spectrograms. Each training pair is half the length of the original noisy sample. The number of two-stage transformer blocks (TSTBs) is 6. In the loss function, we empirically set α = 0.8, β = 1/200, and γ = 2 in the synthetic experiments (γ = 1 in the real-world experiments). We use the Adam optimizer with a learning rate of 0.001. You can adjust these parameters to suit different training needs.
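With those settings, at 48 kHz the 64 ms window corresponds to 3072 samples and the 16 ms hop to 768 samples. Below is a minimal PyTorch sketch of the corresponding STFT; the variable names and the random stand-in waveform are illustrative.

```python
import torch

SAMPLE_RATE = 48_000
WIN_LENGTH = int(0.064 * SAMPLE_RATE)  # 64 ms Hamming window -> 3072 samples
HOP_LENGTH = int(0.016 * SAMPLE_RATE)  # 16 ms hop -> 768 samples

waveform = torch.randn(SAMPLE_RATE)    # stand-in for one second of audio

# Complex-valued spectrogram of shape (freq_bins, frames)
spec = torch.stft(
    waveform,
    n_fft=WIN_LENGTH,
    hop_length=HOP_LENGTH,
    win_length=WIN_LENGTH,
    window=torch.hamming_window(WIN_LENGTH),
    return_complex=True,
)
```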
The implementation refers to the following repositories:
- https://github.com/madhavmk/Noise2Noise-audio_denoising_without_clean_training_data
- https://github.com/neeraj3029/Ne2Ne-Image-Denoising
- https://github.com/pheepa/DCUnet
- https://github.com/key2miao/CAUNet
If this work helps your research, please cite:
Jiasong Wu, Qingchun Li, Guanyu Yang, Lei Li, Lotfi Senhadji and Huazhong Shu, Self-Supervised Speech Denoising Using Only Noisy Audio Signals
For questions, please contact:
- Qingchun Li: [email protected]
- Jiasong Wu: [email protected]