Quick implementation of nGPT from Nvidia, which learns entirely on the hypersphere: hidden states, embeddings, and weight matrices are all kept at unit norm. The open question is whether there is some loss of expressivity they swept under the rug, but I'll take it in good faith. A minimal sketch of the core update rule follows below.
This type of network should also be studied in the context of continual learning and loss of plasticity.
There is also an adaptation to vision transformers.
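To make the hypersphere claim concrete, here is a minimal sketch of the block update described in the paper: hidden states stay on the unit sphere, and each residual step linearly interpolates toward the (normalized) sublayer output by a learned step size before re-normalizing. This is an illustrative rendering, not the library's internals; `l2norm`, `hypersphere_residual`, and the `alpha` value are assumptions.

import torch
import torch.nn.functional as F

def l2norm(t, dim = -1):
    # project onto the unit hypersphere
    return F.normalize(t, p = 2, dim = dim)

def hypersphere_residual(h, sublayer_out, alpha):
    # nGPT-style update: h <- norm(h + alpha * (norm(sublayer_out) - h))
    # i.e. interpolate toward the sublayer output, then retract to the sphere
    h_a = l2norm(sublayer_out)
    return l2norm(h + alpha * (h_a - h))

h = l2norm(torch.randn(2, 2048, 512))  # hidden states on the sphere
h = hypersphere_residual(h, torch.randn(2, 2048, 512), alpha = 0.05)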
Install

$ pip install nGPT-pytorch
Usage

import torch
from nGPT_pytorch import nGPT

model = nGPT(
    num_tokens = 256,     # vocabulary size (bytes here)
    dim = 512,            # model width
    depth = 4,            # number of transformer blocks
    attn_norm_qk = True   # l2-normalize queries and keys before attention
)

x = torch.randint(0, 256, (2, 2048))  # batch of 2 sequences of 2048 byte tokens

loss = model(x, return_loss = True)   # next-token cross-entropy loss
loss.backward()

logits = model(x)  # (2, 2048, 256)
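Since a plain forward pass returns next-token logits, a simple autoregressive sampling loop can be layered on top. This is a hedged sketch rather than a utility shipped with the repository; the `sample` helper, temperature, and prompt length are all illustrative.

@torch.no_grad()
def sample(model, prompt, length = 64, temperature = 1.0):
    # prompt: (batch, seq) of token ids; extend it one token at a time
    out = prompt
    for _ in range(length):
        logits = model(out)[:, -1]                        # logits for the next position
        probs = (logits / temperature).softmax(dim = -1)
        next_token = torch.multinomial(probs, 1)
        out = torch.cat((out, next_token), dim = -1)
    return out

sampled = sample(model, x[:, :32], length = 16)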
Enwik8

To train a character-level language model on enwik8:

$ python train.py
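The script above handles the full run; as a rough picture of what character-level training on enwik8 looks like, here is a hedged sketch (the data path, batch size, learning rate, and step count are assumptions, not values taken from train.py):

import gzip
import numpy as np
import torch
from nGPT_pytorch import nGPT

# load enwik8 as raw bytes (path is illustrative)
with gzip.open('./data/enwik8.gz') as f:
    data = torch.from_numpy(np.frombuffer(f.read(), dtype = np.uint8).copy()).long()

model = nGPT(num_tokens = 256, dim = 512, depth = 4, attn_norm_qk = True)
optim = torch.optim.Adam(model.parameters(), lr = 1e-3)

for step in range(1000):
    # sample random contiguous chunks of 2048 bytes
    starts = torch.randint(0, data.numel() - 2048, (2,))
    batch = torch.stack([data[s:s + 2048] for s in starts])

    loss = model(batch, return_loss = True)
    loss.backward()
    optim.step()
    optim.zero_grad()
    # note: nGPT also restores its weight matrices to unit norm after each
    # update; see the repository for the hook it provides for this step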
Citations

@inproceedings{Loshchilov2024nGPTNT,
title = {nGPT: Normalized Transformer with Representation Learning on the Hypersphere},
author = {Ilya Loshchilov and Cheng-Ping Hsieh and Simeng Sun and Boris Ginsburg},
year = {2024},
url = {https://api.semanticscholar.org/CorpusID:273026160}
}
@article{Luo2017CosineNU,
title = {Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks},
author = {Chunjie Luo and Jianfeng Zhan and Lei Wang and Qiang Yang},
journal = {ArXiv},
year = {2017},
volume = {abs/1702.05870},
url = {https://api.semanticscholar.org/CorpusID:1505432}
}
@inproceedings{Zhou2024ValueRL,
title = {Value Residual Learning For Alleviating Attention Concentration In Transformers},
author = {Zhanchao Zhou and Tianyi Wu and Zhiyun Jiang and Zhenzhong Lan},
year = {2024},
url = {https://api.semanticscholar.org/CorpusID:273532030}
}