Choose a PyTorch container from NVIDIA PyTorch Container Versions and create a Dockerfile at /text2speech/docker/Dockerfile
- Build the Docker image and run the container
$ docker build --no-cache -t torcht2s .
$ docker run -it --rm --gpus all -p 2222:8888 -v /your/working/directory/text-to-speech/text2speech:/your/working/directory/text-to-speech/text2speech torcht2s
- Register the environment as a Jupyter kernel and launch Jupyter Notebook
$ python -m ipykernel install --user --name=torcht2s
$ jupyter notebook --ip= --port=8888 --no-browser --allow-root
- Open a browser on your local machine and navigate to${TOKEN}
and enter the token printed in your terminal. Note that the docker run command above publishes the container's port 8888 on host port 2222.
Follow these steps to use a custom dataset.
- Prepare a directory with .wav files and filelists (a training/validation split of the data) containing transcripts and paths to the .wav files under the
location. Each filelist should list a single utterance per line as:
<audio file path>|<transcript>
- Preprocess the data
- Run the preprocessing script to compute pitch and mel spectrograms with
python \
--wav-text-filelists dataset/tts_data_train.txt \
dataset/tts_data_val.txt \
--n-workers 0 \
--batch-size 1 \
--dataset-path dataset \
--extract-pitch \
--f0-method pyin \
--extract-mels \
- Prepare filelists with paths to the pre-calculated pitch files by running
Each of these filelists should list a single utterance per line as:
<audio file path>|<audio pitch .pt file path>|<transcript>
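The two filelist formats above can be produced with a few lines of Python. The following is a minimal sketch, not the repository's own tooling: the helper names (`split_entries`, `format_line`, `add_pitch_column`) are hypothetical, and it assumes that the pitch tensor for `wavs/<name>.wav` is stored as `pitch/<name>.pt` and that paths are written relative to the dataset directory. Adjust both assumptions to match what the preprocessing script actually writes.

```python
import random
from pathlib import Path


def split_entries(entries, val_fraction=0.1, seed=0):
    """Shuffle (wav_path, transcript) pairs and split them into (train, val)."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def format_line(wav_path, transcript, with_pitch=False):
    """Format one utterance as a single filelist line."""
    if "|" in transcript or "\n" in transcript:
        raise ValueError("transcript must not contain '|' or a newline")
    if with_pitch:
        # Assumption: pitch for wavs/<name>.wav is saved as pitch/<name>.pt
        pitch_path = f"pitch/{Path(wav_path).stem}.pt"
        return f"{wav_path}|{pitch_path}|{transcript}"
    return f"{wav_path}|{transcript}"


def add_pitch_column(src, dst):
    """Rewrite an <audio>|<transcript> filelist as <audio>|<pitch>|<transcript>."""
    lines = []
    for raw in Path(src).read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue  # skip blank lines
        wav_path, transcript = raw.split("|", 1)
        lines.append(format_line(wav_path, transcript, with_pitch=True))
    Path(dst).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `add_pitch_column("dataset/tts_data_train.txt", "dataset/tts_data_pitch_train.txt")` would derive the pitch filelist from the training filelist under these assumptions.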
The complete dataset has the following structure:
├── mels
├── pitch
├── wavs
├── tts_data_train.txt
├── tts_data_val.txt
├── tts_data_pitch_train.txt
├── tts_data_pitch_val.txt
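
Before training, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch of such a check; `check_dataset` is a hypothetical helper, and the field-count test assumes that transcripts contain no `|` characters.

```python
from pathlib import Path

EXPECTED_DIRS = ("mels", "pitch", "wavs")
EXPECTED_FILELISTS = (
    "tts_data_train.txt",
    "tts_data_val.txt",
    "tts_data_pitch_train.txt",
    "tts_data_pitch_val.txt",
)


def check_dataset(root):
    """Return a list of problems found; an empty list means the layout matches."""
    root = Path(root)
    problems = []
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}")
    for name in EXPECTED_FILELISTS:
        path = root / name
        if not path.is_file():
            problems.append(f"missing filelist: {name}")
            continue
        # Pitch filelists carry three fields per line, the others two.
        expected_fields = 3 if "_pitch_" in name else 2
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if line.strip() and line.count("|") != expected_fields - 1:
                problems.append(
                    f"{name}:{lineno}: expected {expected_fields} '|'-separated fields"
                )
    return problems
```

Calling `check_dataset("dataset")` returns an empty list when every directory and filelist is present and each non-blank line has the expected number of fields.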