- (OpenAI Blog 2019) Language Models are Unsupervised Multitask Learners
- (NIPS 2017) Attention Is All You Need
- (ICLR 2015) Neural Machine Translation by Jointly Learning to Align and Translate
- (NIPS 2014) Sequence to Sequence Learning with Neural Networks
- (Neural Computation 1997) Long Short-Term Memory
- Paper / Article / Code / PyTorch Documentation
- (1986) Recurrent Neural Network