The goal of this project is to generate long and coherent sequences of data using Transformer architectures based on the following papers:
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- Stabilizing Transformers for Reinforcement Learning
- Music Transformer
- Character-Level Language Modeling with Deeper Self-Attention
The neural networks are tested on two separate tasks: music generation and text generation. All models are implemented from scratch in TensorFlow 2.
*Architecture diagrams: Music Model (left) | Text Model (right).*
The structure of the GTrXL (Gated Transformer-XL) block is illustrated in detail below:
The architecture used for text generation is the one proposed in the paper Stabilizing Transformers for Reinforcement Learning. Music generation requires a modified model in which the input features are split into MIDI events (note_on, note_off and control_change) and MIDI deltas (the time intervals between consecutive MIDI events).
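The distinguishing feature of the GTrXL block is that the usual residual connections around the attention and feed-forward sublayers are replaced by GRU-style gates, with the gate bias initialised so that each block starts out close to an identity map. Below is a minimal sketch of such a gating layer in TensorFlow 2; the class name `GRUGate` and the default bias value are illustrative and not taken verbatim from this repository.

```python
import tensorflow as tf


class GRUGate(tf.keras.layers.Layer):
    """GRU-style gate used in place of a residual connection, as described in
    "Stabilizing Transformers for Reinforcement Learning".

    Both inputs have shape (batch, seq_len, d_model): `x` is the stream
    entering the sublayer, `y` is the sublayer's output.
    """

    def __init__(self, d_model, gate_bias=2.0, **kwargs):
        super().__init__(**kwargs)
        # Linear projections for the reset gate r, update gate z and candidate h.
        self.w_r = tf.keras.layers.Dense(d_model, use_bias=False)
        self.u_r = tf.keras.layers.Dense(d_model, use_bias=False)
        self.w_z = tf.keras.layers.Dense(d_model, use_bias=False)
        self.u_z = tf.keras.layers.Dense(d_model, use_bias=False)
        self.w_h = tf.keras.layers.Dense(d_model, use_bias=False)
        self.u_h = tf.keras.layers.Dense(d_model, use_bias=False)
        # A positive bias pushes z towards 0 at initialisation, so the gate
        # initially passes x through almost unchanged (identity map), which is
        # what the paper credits for stable training.
        self.gate_bias = tf.constant(gate_bias, dtype=tf.float32)

    def call(self, x, y):
        r = tf.sigmoid(self.w_r(y) + self.u_r(x))
        z = tf.sigmoid(self.w_z(y) + self.u_z(x) - self.gate_bias)
        h = tf.tanh(self.w_h(y) + self.u_h(r * x))
        return (1.0 - z) * x + z * h
```

Inside a block this gate is applied twice: once after the multi-head attention sublayer and once after the position-wise feed-forward sublayer, each preceded by layer normalisation.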
For the task of music generation the union of the following datasets is used:
- The MAESTRO Dataset
- SMD MIDI-Audio Piano Music
- Stanford University Piano Roll Archive
- Classical Music ML Format
All of the above contain classical piano music in MIDI format. The MIDI files are preprocessed with the mido library.
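As an illustration of how the event/delta split described above can be extracted with mido, here is a minimal sketch; the function name `midi_to_events` is hypothetical, and the actual preprocessing lives in `preprocess_music.py`.

```python
import mido


def midi_to_events(path):
    """Read a MIDI file and return two parallel lists: the note_on / note_off /
    control_change messages, and the time (in seconds) elapsed before each one."""
    events, deltas = [], []
    elapsed = 0.0
    # Iterating over a MidiFile merges all tracks and reports msg.time as
    # seconds since the previous message.
    for msg in mido.MidiFile(path):
        elapsed += msg.time
        if msg.type in ("note_on", "note_off", "control_change"):
            events.append(msg)
            deltas.append(elapsed)
            elapsed = 0.0  # the next delta is measured from this event
    return events, deltas
```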
For text generation, the CLAIR collection of "Nigerian" fraud emails is used.
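The bs4 and re dependencies listed below suggest that the raw corpus is stripped of markup and normalised before training. The sketch below shows one way this could be done; the helper name `clean_corpus` is illustrative and the exact steps in `preprocess_text.py` may differ.

```python
import re
from bs4 import BeautifulSoup


def clean_corpus(raw_text):
    """Strip HTML/email markup and normalise whitespace in the raw corpus."""
    # Remove any HTML tags left over in the scraped emails.
    text = BeautifulSoup(raw_text, "html.parser").get_text()
    # Drop non-printable characters and collapse runs of spaces/tabs.
    text = re.sub(r"[^\x20-\x7E\n]", "", text)
    text = re.sub(r"[ \t]+", " ", text)
    return text
```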
Generated samples for both tasks can be found here.
The following Python packages are required:
- NumPy
- TensorFlow
- argparse
- pathlib
- tqdm
- pickle
- re
- joblib
- mido
- glob
- bs4
- dload
Music generation:
- `python preprocess_music.py -d`
- `python train_music.py`
- `python generate_music.py <n_songs> <checkpoint path>`

Text generation:
- `python preprocess_text.py <corpus path>`
- `python train_text.py`
- `python generate_text.py <n_samples> <checkpoint path>`