A list of suggested papers for NLP and computational biology beginners.
- Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. Preprint. [pdf] [project] (GPT)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. NAACL 2019. [pdf] [code & model]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer and Veselin Stoyanov. Preprint. [pdf] [code & model]
- Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Preprint. [pdf] [code] (GPT-2)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu. Preprint. [pdf] [code & model] (T5)
- Language Models are Few-Shot Learners. Tom B. Brown, et al. Preprint. [pdf] (GPT-3)
- Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma and Rob Fergus. Preprint. [pdf] [code & model] (ESM-1b)
- Language Models Enable Zero-shot Prediction of the Effects of Mutations on Protein Function. Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. Preprint. [pdf] [code & model] (ESM-1v)
- MSA Transformer. Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu and Alexander Rives. ICML 2021. [pdf] [code & model]
- Language Models of Protein Sequences at the Scale of Evolution Enable Accurate Structure Prediction. Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido and Alexander Rives. Preprint. [pdf] [code & model] (ESM-2)
- Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions. Jiayang Chen, Zhihang Hu, Siqi Sun, Qingxiong Tan, Yixuan Wang, Qinze Yu, Licheng Zong, Liang Hong, Jin Xiao, Tao Shen, Irwin King and Yu Li. Preprint. [pdf] [code & model] (RNA-FM)
- Highly Accurate Protein Structure Prediction with AlphaFold. John Jumper, et al. Nature 2021. [pdf] [code & model] (AlphaFold2)
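
Several of the protein papers above (ESM-1b, ESM-1v, ESM-2) release pretrained weights, so a beginner can try the "embed a sequence, then build a downstream predictor" workflow they describe in a few lines. Below is a minimal sketch, assuming the fair-esm package (`pip install fair-esm`) and the small `esm2_t6_8M_UR50D` checkpoint; the example sequence and the mean-pooling step are illustrative choices, not any paper's exact protocol.

```python
# Minimal sketch: per-residue embeddings from a small pretrained ESM-2 model,
# assuming the fair-esm package is installed (pip install fair-esm).
import torch
import esm

# Load the pretrained model and its tokenizer ("alphabet").
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()  # inference only; disables dropout

# One (label, sequence) pair; real use would batch many sequences.
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
batch_labels, batch_strs, batch_tokens = batch_converter(data)

# Request the final layer's hidden states (layer 6 for this checkpoint).
with torch.no_grad():
    out = model(batch_tokens, repr_layers=[6])
token_embeddings = out["representations"][6]  # (batch, seq_len + 2, dim)

# Mean-pool over residues, skipping the BOS/EOS tokens, to get one vector
# per sequence -- a common starting point for downstream predictors.
seq_embedding = token_embeddings[0, 1 : len(data[0][1]) + 1].mean(dim=0)
print(seq_embedding.shape)  # torch.Size([320]) for this 8M-parameter model
```

Larger checkpoints (e.g. the 650M-parameter ESM-2) follow the same pattern with more layers and wider embeddings; swapping them in only changes the loader call and the `repr_layers` index.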