Implementation with PyTorch.
- Base model
- LSTM using MFCC audio features
- CNN (simplified version of the reference model) with LPC features
- Python3
- PyTorch v0.3.0
- numpy
- librosa & audiolazy
- scipy
- etc.
- Scripts to run
  - main.py: change the net name and set the checkpoints folder to train different models
  - test_model.py: generate blendshape sequences from extracted audio features (requires audio features as input)
  - synthesis.py: generate blendshapes directly from an input wav (requires the input audio path as an argument)
- Classes
  - models.py: classes for the LSTM and CNN (simplified NvidiaNet) models
  - models_testae.py: advanced models with an autoencoder design
  - dataset.py: class for loading the dataset
- Input preprocessing
  - misc/audio_mfcc.py: extract MFCC features from input wav files
  - misc/audio_lpc.py: extract LPC features
  - misc/combine.py: combine certain audio feature/blendshape files to obtain a single file for data loading
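As a rough illustration of what the MFCC step may look like, the sketch below uses librosa to turn one wav file into a per-frame feature matrix. The function name extract_mfcc, the 16 kHz sample rate, and the 13 coefficients are assumptions for illustration only, not the settings used by misc/audio_mfcc.py.

```python
def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Return an array of shape (num_frames, n_mfcc) for one wav file.

    Hypothetical helper: sr and n_mfcc are illustrative defaults,
    not values taken from misc/audio_mfcc.py.
    """
    import librosa  # imported here so the dependency is only needed when called

    # librosa.load resamples to `sr` and returns a mono float waveform.
    y, sr = librosa.load(wav_path, sr=sr)
    # librosa.feature.mfcc returns (n_mfcc, num_frames); transpose to frame-major.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T
```

Each row is one analysis frame, which is the layout a sequence model such as an LSTM usually expects. The frame rate still has to be matched to the blendshape frame rate, and that step is not shown here.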
To build your own dataset, preprocess your wav/blendshape pairs with misc/audio_mfcc.py or misc/audio_lpc.py. Then use misc/combine.py to combine the resulting feature/blendshape files into a single feature/blendshape file.
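The combining step can be pictured as concatenating per-clip arrays along the time axis. The sketch below assumes each clip is stored as a pair of .npy arrays with one row per frame. The function name combine_clips and this file layout are hypothetical and may differ from what misc/combine.py actually does.

```python
import numpy as np


def combine_clips(feature_paths, blendshape_paths):
    """Concatenate per-clip arrays into one feature array and one blendshape array.

    Assumes (hypothetically) that each file holds a 2-D array whose first
    axis is the frame index, and that the two lists are in matching order.
    """
    if len(feature_paths) != len(blendshape_paths):
        raise ValueError("need one blendshape file per feature file")

    features, blendshapes = [], []
    for f_path, b_path in zip(feature_paths, blendshape_paths):
        feat = np.load(f_path)
        bs = np.load(b_path)
        # Trim both to the shorter length so every feature frame has a target.
        n = min(len(feat), len(bs))
        features.append(feat[:n])
        blendshapes.append(bs[:n])

    return np.concatenate(features, axis=0), np.concatenate(blendshapes, axis=0)
```

Trimming to the shorter of the two arrays is one simple way to keep inputs and targets aligned. If your clips differ by more than a frame or two, it is probably worth checking the frame rates instead.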
Modify main.py: set the model to the one you need and specify the checkpoint folder.
- Both test_model.py and synthesis.py can be used to generate blendshape sequences. test_model.py accepts extracted audio features (MFCC/LPC); synthesis.py takes a raw wav file as input.
  - Pass the required arguments and the script will produce a blendshape test file.
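The exact arguments are not listed here. A script of this kind would typically read them with argparse, roughly as sketched below. The flag names (--wav, --checkpoint, --output) and the default output file name are hypothetical placeholders, so check them against the actual synthesis.py before relying on them.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a wav-to-blendshape run (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Generate a blendshape sequence from a wav file.")
    parser.add_argument("--wav", required=True,
                        help="path to the input wav file")
    parser.add_argument("--checkpoint", required=True,
                        help="path to a trained model checkpoint")
    parser.add_argument("--output", default="blendshape_test.txt",
                        help="where to write the predicted blendshape sequence")
    return parser.parse_args(argv)
```

With flags like these, a run might look like `python synthesis.py --wav input.wav --checkpoint <path to checkpoint>`, again assuming the script accepts them.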