This is a list of permissively licensed language models released under MIT, Apache 2.0, or other similar licenses. We use the term language model broadly here to include not only autoregressive models but also models trained with different objectives, such as masked language modeling (MLM).
This work was mostly inspired by Stella Biderman's Directory of Generative AI and The Foundation Model Development Cheatsheet. Unlike those two very comprehensive resources, this list is meant to be a quick and more focused reference.
- 👑: Model + Data + Code
- ⭐: Model + Data
- ⚡: Model + Code
> [!IMPORTANT]
> This is still a work in progress. Contributions, corrections, and feedback are very welcome!
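The Encoder and Decoder columns in the table below roughly track a model's training objective: encoder-only models are typically trained with masked language modeling, while decoder-only models are autoregressive. As a minimal sketch of what that difference looks like in practice (assuming the `transformers` library and `torch` are installed, and using `bert-base-uncased` and `gpt2` purely as representative entries from the table):

```python
# pip install transformers torch  -- assumed dependencies, not pinned by this list
from transformers import pipeline

# Encoder-only model trained with a masked language modeling (MLM) objective:
# it predicts the token hidden behind [MASK].
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Open language models are [MASK].")[0]["token_str"])

# Decoder-only autoregressive model trained with next-token prediction:
# it continues the prompt one token at a time.
generate = pipeline("text-generation", model="gpt2")
print(generate("Open language models are", max_new_tokens=10)[0]["generated_text"])
```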
Model | Parameters | Architecture | Encoder | Decoder | MoE | Year | Hugging Face | License |
---|---|---|---|---|---|---|---|---|
GPT-1 | 120M | Transformer | - | ✅ | - | 2018 | 🤗 | MIT |
BERT-Base-Cased | 110M | Transformer | ✅ | - | - | 2018 | 🤗 | Apache 2.0 |
BERT-Base-Uncased | 110M | Transformer | ✅ | - | - | 2018 | 🤗 | Apache 2.0 |
BERT-Large-Cased | 340M | Transformer | ✅ | - | - | 2018 | 🤗 | Apache 2.0 |
BERT-Large-Uncased | 340M | Transformer | ✅ | - | - | 2018 | 🤗 | Apache 2.0 |
GPT-2-Small | 124M | Transformer | - | ✅ | - | 2019 | 🤗 | MIT |
GPT-2-Medium | 355M | Transformer | - | ✅ | - | 2019 | 🤗 | MIT |
GPT-2-Large | 774M | Transformer | - | ✅ | - | 2019 | 🤗 | MIT |
GPT-2-XL | 1.5B | Transformer | - | ✅ | - | 2019 | 🤗 | MIT |
T5-Small👑 | 60M | Transformer | ✅ | ✅ | - | 2019 | 🤗 | Apache 2.0 |
T5-Base👑 | 220M | Transformer | ✅ | ✅ | - | 2019 | 🤗 | Apache 2.0 |
T5-Large👑 | 770M | Transformer | ✅ | ✅ | - | 2019 | 🤗 | Apache 2.0 |
T5-3B👑 | 3B | Transformer | ✅ | ✅ | - | 2019 | 🤗 | Apache 2.0 |
T5-11B👑 | 11B | Transformer | ✅ | ✅ | - | 2019 | 🤗 | Apache 2.0 |
XLM-RoBERTa-Large | 560M | Transformer | ✅ | - | - | 2019 | 🤗 | MIT |
XLM-RoBERTa-Base | 250M | Transformer | ✅ | - | - | 2019 | 🤗 | MIT |
RoBERTa-Base | 125M | Transformer | ✅ | - | - | 2019 | 🤗 | MIT |
RoBERTa-Large | 355M | Transformer | ✅ | - | - | 2019 | 🤗 | MIT |
DistilBERT-Base-Cased | 66M | Transformer | ✅ | - | - | 2019 | 🤗 | Apache 2.0 |
DistilBERT-Base-Uncased | 66M | Transformer | ✅ | - | - | 2019 | 🤗 | Apache 2.0 |
ALBERT-Base | 12M | Transformer | ✅ | - | - | 2019 | 🤗 | Apache 2.0 |
ALBERT-Large | 18M | Transformer | ✅ | - | - | 2019 | 🤗 | Apache 2.0 |
ALBERT-XLarge | 60M | Transformer | ✅ | - | - | 2019 | 🤗 | Apache 2.0 |
ALBERT-XXLarge | 235M | Transformer | ✅ | - | - | 2019 | 🤗 | Apache 2.0 |
DeBERTa-Base | 134M | Transformer | ✅ | - | - | 2020 | 🤗 | MIT |
DeBERTa-Large | 350M | Transformer | ✅ | - | - | 2020 | 🤗 | MIT |
DeBERTa-XLarge | 750M | Transformer | ✅ | - | - | 2020 | 🤗 | MIT |
ELECTRA-Small-Discriminator | 14M | Transformer | ✅ | - | - | 2020 | 🤗 | Apache 2.0 |
ELECTRA-Base-Discriminator | 110M | Transformer | ✅ | - | - | 2020 | 🤗 | Apache 2.0 |
ELECTRA-Large-Discriminator | 335M | Transformer | ✅ | - | - | 2020 | 🤗 | Apache 2.0 |
GPT-Neo-125M👑 | 125M | Transformer | - | ✅ | - | 2021 | 🤗 | MIT |
GPT-Neo-1.3B👑 | 1.3B | Transformer | - | ✅ | - | 2021 | 🤗 | MIT |
GPT-Neo-2.7B👑 | 2.7B | Transformer | - | ✅ | - | 2021 | 🤗 | MIT |
GPT-J👑 | 6B | Transformer | - | ✅ | - | 2021 | 🤗 | Apache 2.0 |
XLM-RoBERTa-XL | 3.5B | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
XLM-RoBERTa-XXL | 10.7B | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
DeBERTa-v2-XLarge | 900M | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
DeBERTa-v2-XXLarge | 1.5B | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
DeBERTa-v3-XSmall | 22M | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
DeBERTa-v3-Small | 44M | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
DeBERTa-v3-Base | 86M | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
DeBERTa-v3-Large | 304M | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
mDeBERTa-v3-Base | 86M | Transformer | ✅ | - | - | 2021 | 🤗 | MIT |
GPT-NeoX👑 | 20B | Transformer | - | ✅ | - | 2022 | 🤗 | Apache 2.0 |
UL2👑 | 20B | Transformer | ✅ | ✅ | - | 2022 | 🤗 | Apache 2.0 |
YaLM⚡ | 100B | Transformer | - | ✅ | - | 2022 | 🤗 | Apache 2.0 |
Pythia-14M👑 | 14M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-70M👑 | 70M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-160M👑 | 160M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-410M👑 | 410M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-1B👑 | 1B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-1.4B👑 | 1.4B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-2.8B👑 | 2.8B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-6.9B👑 | 6.9B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Pythia-12B👑 | 12B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Cerebras-GPT-111M⭐ | 111M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Cerebras-GPT-256M⭐ | 256M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Cerebras-GPT-590M⭐ | 590M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Cerebras-GPT-1.3B⭐ | 1.3B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Cerebras-GPT-2.7B⭐ | 2.7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Cerebras-GPT-6.7B⭐ | 6.7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Cerebras-GPT-13B⭐ | 13B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
BTLM👑 | 3B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Phi-1 | 1.3B | Transformer | - | ✅ | - | 2023 | 🤗 | MIT |
Phi-1.5 | 1.3B | Transformer | - | ✅ | - | 2023 | 🤗 | MIT |
Phi-2 | 2.7B | Transformer | - | ✅ | - | 2023 | 🤗 | MIT |
RedPajama-INCITE-3B👑 | 2.8B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
RedPajama-INCITE-7B👑 | 6.9B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
FLM | 101B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
MPT-1B | 1.3B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
MPT-7B | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
MPT-7B-8K | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
MPT-30B | 30B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mistral-7B-v0.1 | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mistral-7B-v0.2 | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mistral-7B-v0.3 | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Falcon-1B | 1B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Falcon-7B | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Falcon-40B | 40B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
TinyLlama | 1.1B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
OpenLLaMA-3B-v1👑 | 3B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
OpenLLaMA-7B-v1👑 | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
OpenLLaMA-13B-v1👑 | 13B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
OpenLLaMA-3B-v2👑 | 3B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
OpenLLaMA-7B-v2👑 | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
DeciLM-7B | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Amber👑 | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Solar | 10.7B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mixtral-8x7B | 46.7B | Transformer | - | ✅ | ✅ | 2023 | 🤗 | Apache 2.0 |
OpenMoE-Base-128B | 637M | Transformer | - | ✅ | ✅ | 2023 | 🤗 | Apache 2.0 |
Mamba-130M | 130M | SSM | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mamba-370M | 370M | SSM | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mamba-790M | 790M | SSM | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mamba-1.4B | 1.4B | SSM | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mamba-2.8B | 2.8B | SSM | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Mamba-2.8B-slimpj | 2.8B | SSM | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
OpenBA | 15B | Transformer | ✅ | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Yi-6B | 6B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Yi-6B-200K | 6B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Yi-9B | 9B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Yi-9B-200K | 9B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Yi-34B-200K | 34B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Persimmon-8B | 8B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Palmyra-3B | 3B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Palmyra-Small-128M | 128M | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Palmyra-Base-5B | 5B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
Palmyra-Large-20B | 20B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
SEA-LION-3B | 3B | Transformer | - | ✅ | - | 2023 | 🤗 | MIT |
SEA-LION-7B | 7B | Transformer | - | ✅ | - | 2023 | 🤗 | MIT |
PLaMo-13B | 13B | Transformer | - | ✅ | - | 2023 | 🤗 | Apache 2.0 |
LiteLlama | 460M | Transformer | - | ✅ | - | 2024 | 🤗 | MIT |
H2O-Danube | 1.8B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
H2O-Danube2 | 1.8B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Cosmo | 1.8B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
MobiLlama-0.5B | 0.5B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
MobiLlama-0.8B | 0.8B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
MobiLlama-1B | 1.2B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
OLMo-1B👑 | 1B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
OLMo-7B👑 | 7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
OLMo-7B-Twin-2T👑 | 7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
OLMo-1.7-7B👑 | 7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Poro | 34B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Grok-1 | 314B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
OpenMoE-8B-1.1T | 8B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
OpenMoE-8B-1T | 8B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
OpenMoE-8B-800B | 8B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
OpenMoE-8B-600B | 8B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
OpenMoE-8B-400B | 8B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
OpenMoE-8B-200B | 8B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
OpenMoE-34B-200B | 34B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
Jamba | 52B | SSM-Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
JetMoE | 8B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
Mambaoutai | 1.6B | SSM | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Tele-FLM | 52B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Arctic-Base | 480B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
Zamba-7B | 7B | SSM-Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
Mixtral-8x22B-v0.1 | 141B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
Granite-7b-base | 7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Chuxin-1.6B-Base👑 | 1.6B | Transformer | - | ✅ | - | 2024 | 🤗 | MIT |
Chuxin-1.6B-1M👑 | 1.6B | Transformer | - | ✅ | - | 2024 | 🤗 | MIT |
Neo👑 | 7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Yi-1.5-6B | 6B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Yi-1.5-9B | 9B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Yi-1.5-34B | 34B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
GECKO-7B | 7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Qwen2-0.5B | 0.5B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Qwen2-1.5B | 1.5B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Qwen2-7B | 7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Qwen2-57B-A14B | 57B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
K2👑 | 65B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Pile-T5-Base👑 | 248M | Transformer | ✅ | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Pile-T5-Large👑 | 783M | Transformer | ✅ | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Pile-T5-XL👑 | 2.85B | Transformer | ✅ | ✅ | - | 2024 | 🤗 | Apache 2.0 |
SmolLM-135M👑 | 135M | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
SmolLM-360M👑 | 360M | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
SmolLM-1.7B👑 | 1.7B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
GRIN | 42B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | MIT |
OLMoE-1B-7B👑 | 7B | Transformer | - | ✅ | ✅ | 2024 | 🤗 | Apache 2.0 |
Zamba2-1.2B | 1.2B | SSM-Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Zamba2-2.7B | 2.7B | SSM-Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
Fox-1-1.6B | 1.6B | Transformer | - | ✅ | - | 2024 | 🤗 | Apache 2.0 |
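Licenses can change after release, so it is worth verifying a model's current license on the Hub before depending on it. Below is a minimal sketch using the `huggingface_hub` client (assuming the package is installed and that the Hub continues to expose licenses as `license:` tags; the repo IDs are illustrative examples, not a mapping of this table):

```python
# pip install huggingface_hub  -- assumed dependency
from huggingface_hub import model_info

# Example repo IDs; substitute any model from the table above.
for repo_id in ["gpt2", "EleutherAI/pythia-1b", "mistralai/Mistral-7B-v0.1"]:
    info = model_info(repo_id)
    # The Hub typically surfaces the license as a "license:<id>" tag.
    licenses = [tag.split(":", 1)[1] for tag in info.tags if tag.startswith("license:")]
    print(f"{repo_id}: {', '.join(licenses) or 'no license tag found'}")
```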
- [Blog post] What "Open" Means: A great blog post by John Shaughnessy discussing the many different incarnations of the word "open".
- [Paper] Towards a Framework for Openness in Foundation Models: In this paper, Mozilla and the Columbia Institute of Global Politics brought together over 40 leading scholars and practitioners working on openness and AI to discuss the highly debated definitions and benefits of open-sourcing foundation models. Among them are Victor Storchan, Yann LeCun, Justine Tunney, Nathan Lambert, and many others.
- [Paper] Rethinking open source generative AI: This paper surveys over 45 generative AI models using an evidence-based framework that distinguishes 14 dimensions of openness, from training datasets to scientific and technical documentation and from licensing to access methods.
- [Paper] Risks and Opportunities of Open-Source Generative AI: This paper analyzes the risks and opportunities of open-source generative AI models using a three-stage framework for Gen AI development (near-, mid-, and long-term), and argues that, overall, the benefits of open-source Gen AI outweigh its risks.
@misc{hamdy2024openlmlist,
title = {The Open Language Models List},
author = {Mohammed Hamdy},
url = {https://github.com/mmhamdy/open-language-models},
year = {2024},
}