
Spectraformer

Features · Installation · Usage · Algorithms · License

Spectraformer is a unified random feature framework for approximating and learning the kernel function in the linearized attention of the Transformer. It allows any weight matrix to be combined with any component function. This repository is the official implementation of Spectraformer.

[Figure: the Spectraformer framework]

Features

Spectraformer evaluates combinations of weight matrices and component functions in the Transformer on three textual tasks from the LRA benchmark.

The component functions currently covered are marked with green ticks:

[Figure: component functions covered by Spectraformer]

The weight matrices currently covered are marked with green ticks:

[Figure: weight matrices covered by Spectraformer]
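
To make the weight matrix / component function split concrete, here is a minimal, self-contained PyTorch sketch of linearized attention using the PosRF component function (Choromanski et al. 2021) with a plain Gaussian weight matrix. It illustrates the mechanism only and is not the repository's implementation; the function names are ours.

import torch

def posrf_features(x, W):
    # PosRF component function (Choromanski et al. 2021):
    # phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m), with m random features.
    m = W.shape[0]
    proj = x @ W.T                               # (n, m) random projections
    sq = (x ** 2).sum(dim=-1, keepdim=True) / 2  # (n, 1) squared norms
    return torch.exp(proj - sq) / m ** 0.5

def linear_attention(q, k, v, W):
    # O(n) attention: row-normalized phi(Q) @ (phi(K)^T @ V).
    qp, kp = posrf_features(q, W), posrf_features(k, W)
    kv = kp.T @ v                                # (m, d) key/value summary
    z = qp @ kp.sum(dim=0, keepdim=True).T       # (n, 1) softmax normalizer
    return (qp @ kv) / (z + 1e-6)

n, d, m = 128, 64, 256
q, k, v = torch.randn(3, n, d).unbind(0)
q, k = q / d ** 0.25, k / d ** 0.25              # match softmax's 1/sqrt(d) scaling
W = torch.randn(m, d)  # Gaussian weight matrix; ORF, SORF, QMC, MM, ... plug in here
print(linear_attention(q, k, v, W).shape)        # torch.Size([128, 64])

Swapping the weight matrix (how W is drawn) or the component function (how phi is computed) independently is exactly the combination space the framework explores.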

Installation

Preparing the Code

To install requirements in a conda environment:

conda create -y -n spectraformer python=3.12
conda activate spectraformer
conda install torchquad -c conda-forge
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
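
Optionally (not part of the original setup steps), verify that the CUDA build of PyTorch is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"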

Note: requirements specific to data preprocessing are not included in requirements.txt; see the steps below.

Preparing the Dataset

Processed files can be downloaded here, or processed with the following steps:

  1. Install the preprocessing requirements:

tensorboard>=2.3.0
tensorflow>=2.3.1
tensorflow-datasets>=4.0.1

  2. Download the TFDS files for Pathfinder and set _PATHFINER_TFDS_PATH to the unzipped directory (following google-research/long-range-arena#11).
  3. Download lra_release.gz (7.7 GB).
  4. Unzip lra_release.gz and put it under ./data/:

cd data
wget https://storage.googleapis.com/long-range-arena/lra_release.gz
tar zxvf lra_release.gz

  5. Create a directory lra_processed under ./data/:

mkdir lra_processed
cd ..

  6. The directory structure should be (assuming the root dir is code):

./data/lra_processed
./data/long-range-arena-main
./data/lra_release

  7. Create train, dev, and test dataset pickle files for each task:

cd preprocess
python create_pathfinder.py
python create_listops.py
python create_retrieval.py
python create_text.py
python create_cifar10.py

Note: most of the preprocessing source code comes from the LRA repo.

Usage

Modify the configuration in config.py and run (an example combining these flags is shown after the list):

python main.py --mode train --attn skyformer --task lra-text

  • mode: train, eval
  • attn: softmax, nystrom, linformer, reformer, performer, informer, bigbird, kernelized, skyformer
  • feat: trigrf, posrf, oprf, gerf, saderf
  • kernel_type: gaus, orf, scrf, sorf, rom, sgq, qmc, mm, fastfood_fixed, fastfood_learnable
  • task: lra-listops, lra-pathfinder, lra-retrieval, lra-text, lra-image
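
For example, to train the OPRF component function paired with an ORF weight matrix on the LRA text task (an illustrative pairing of the flag values above; any other feat/kernel_type combination works the same way):

python main.py --mode train --attn performer --feat oprf --kernel_type orf --task lra-text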

To run experiments on GCP:

pip install --upgrade google-cloud-storage

python main.py --mode eval --attn skyformer --task lra-text --bucket_name kernelized-transformer-code --blob_path kernelized-transformer/data/lra_processed

python main.py --mode eval --random 792 --attn performer --feat orf --kernel_type geomrf --task glue-cola
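
The --bucket_name and --blob_path flags point at processed data stored in Google Cloud Storage. As a hypothetical sketch (not code from this repository) of what fetching that data amounts to with the google-cloud-storage client, using the bucket and path from the command above:

from pathlib import Path
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()  # requires GCP credentials to be configured
prefix = "kernelized-transformer/data/lra_processed"
for blob in client.list_blobs("kernelized-transformer-code", prefix=prefix):
    # Download each processed pickle into ./data/lra_processed
    dest = Path("data/lra_processed") / Path(blob.name).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    blob.download_to_filename(str(dest))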

Algorithms

Accuracy (%), time (hours), and memory (GB) on three LRA tasks (L = ListOps, T = Text, R = Retrieval; Mu = mean over the three):

| Paper | Model | Acc L | Acc T | Acc R | Acc Mu | Time L | Time T | Time R | Time Mu | Mem L | Mem T | Mem R | Mem Mu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Choromanski et al. 2021 | PosRF-ORF | 37.35 | 61.60 | 80.53 | 59.83 | 1.30 | 2.84 | 2.89 | 2.34 | 1.17 | 2.31 | 2.10 | 1.86 |
| | PosRF-SORF | 22.98 | 63.31 | 65.52 | 50.60 | 1.29 | 2.81 | 2.83 | 2.31 | 1.17 | 2.31 | 2.10 | 1.86 |
| | PosRF-QMC | 37.50 | 60.41 | 80.71 | 59.54 | 1.30 | 2.84 | 2.89 | 2.34 | 1.17 | 2.31 | 2.10 | 1.86 |
| | PosRF-SGQ | 37.45 | 62.68 | 78.37 | 59.50 | 1.30 | 2.83 | 2.89 | 2.34 | 1.17 | 2.31 | 2.10 | 1.86 |
| This paper | PosRF-MM | 38.05 | 61.85 | 80.67 | 60.19 | 1.31 | 2.84 | 2.89 | 2.35 | 1.17 | 2.31 | 2.10 | 1.86 |
| Chowdhury et al. 2022 | PosRF-FastFoodL | 24.95 | 64.76 | 76.37 | 55.36 | 2.69 | 5.59 | 5.61 | 4.63 | 0.78 | 1.56 | 1.53 | 1.29 |
| Likhosherstov et al. 2022 | OPRF-ORF | 37.50 | 59.35 | 80.90 | 59.25 | 1.61 | 3.45 | 3.50 | 2.86 | 1.36 | 2.71 | 2.56 | 2.21 |
| | OPRF-SORF | 32.11 | 64.34 | 77.47 | 57.97 | 1.57 | 3.37 | 3.41 | 2.78 | 1.36 | 2.71 | 2.56 | 2.21 |
| | OPRF-QMC | 38.41 | 60.32 | 80.80 | 59.84 | 1.61 | 3.46 | 3.51 | 2.86 | 1.36 | 2.71 | 2.56 | 2.21 |
| | OPRF-MM | 38.71 | 60.39 | 80.45 | 59.85 | 1.61 | 3.46 | 3.51 | 2.86 | 1.36 | 2.71 | 2.56 | 2.21 |
| | OPRF-SGQ | 22.53 | 61.34 | 79.29 | 54.39 | 1.60 | 3.45 | 3.47 | 2.84 | 1.36 | 2.71 | 2.56 | 2.21 |
| | OPRF-FastFoodL | 37.40 | 64.04 | 78.32 | 59.92 | 2.81 | 5.84 | 5.86 | 4.84 | 0.85 | 1.69 | 1.67 | 1.40 |
| Likhosherstov et al. 2023 | SADERF-ORF | 37.35 | 61.37 | 80.87 | 59.86 | 1.61 | 3.52 | 3.58 | 2.90 | 1.44 | 2.86 | 2.69 | 2.33 |
| | SADERF-SORF | 32.31 | 64.49 | 76.43 | 57.74 | 1.58 | 3.44 | 3.49 | 2.84 | 1.44 | 2.86 | 2.69 | 2.33 |
| | SADERF-QMC | 37.70 | 59.89 | 80.65 | 59.41 | 1.61 | 3.52 | 3.58 | 2.90 | 1.44 | 2.86 | 2.69 | 2.33 |
| | SADERF-MM | 38.00 | 60.80 | 80.48 | 59.76 | 1.61 | 3.52 | 3.58 | 2.90 | 1.44 | 2.86 | 2.69 | 2.33 |
| | SADERF-SGQ | 36.79 | 63.55 | 77.31 | 59.22 | 1.61 | 3.52 | 3.57 | 2.90 | 1.44 | 2.86 | 2.69 | 2.33 |
| | SADERF-FastFoodL | 28.63 | 64.64 | 77.61 | 56.96 | 2.81 | 5.89 | 5.92 | 4.87 | 0.92 | 1.83 | 1.79 | 1.51 |

References

Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger, Lucy J Colwell, and Adrian Weller. Rethinking attention with performers. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=Ua6zuk0WRH.

Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, and Mrinmaya Sachan. Learning the transformer kernel. Transactions on Machine Learning Research, 2022. ISSN 2835-8856. URL https://openreview.net/forum?id=tLIBAEYjcv.

Valerii Likhosherstov, Krzysztof M Choromanski, Kumar Avinava Dubey, Frederick Liu, Tamas Sarlos, and Adrian Weller. Chefs' random tables: Non-trigonometric random features. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 34559–34573. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/df2d62b96a4003203450cf89cd338bb7-Paper-Conference.pdf.

Valerii Likhosherstov, Krzysztof Marcin Choromanski, Kumar Avinava Dubey, Frederick Liu, Tamas Sarlos, and Adrian Weller. Dense-exponential random features: Sharp positive estimators of the gaussian kernel. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=S0xrBMFihS.

License

Distributed under the MIT License. See LICENSE.txt for more information.

