What is the min GPU memory required to fine-tune the model? #22

Ozawa333 · 2024-05-10T07:36:24Z

First of all, thank you very much for your work.

I am trying to fine-tune Gemma-2B with a 32K sequence length and a 2K segment size on a single A6000 Ada (48 GB) GPU.
Even after adjusting the parameters in train.gemma.infini.noclm.sh as shown below, training still runs out of GPU memory.
Is this normal?

accelerate launch --mixed_precision='bf16' \
    train.gemma.infini.noclm.py \
    --model_name_or_path='google/gemma-2b' \
    --segment_length=2048 \
    --block_size=32768 \
    --dataset_name='wikitext' \
    --dataset_config_name='wikitext-2-raw-v1' \
    --per_device_train_batch_size=1 \
    --per_device_eval_batch_size=1 \
    --weight_decay=1.0 \
    --output_dir='./models/gemma-2b-infini-noclm-wikitext' \
    --checkpointing_steps=10 \
    --num_train_epochs=1 \
    --learning_rate=5e-5 \
    --seed=42 \
    --low_cpu_mem_usage \
    --report_to='wandb' \
    --preprocessing_num_workers=64 \
    --with_tracking
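
As a rough sanity check on the command above, the static part of the training footprint (weights, gradients, optimizer state) can be estimated from the parameter count alone. The sketch below is a back-of-envelope calculation, not a measurement. It assumes roughly 2.5 billion parameters for gemma-2b, fp32 weights and gradients (with bf16 mixed precision the weights are typically kept in fp32 and only the forward/backward compute is cast), and an Adam-style optimizer holding two fp32 states per parameter. None of these are confirmed from the training script, and `static_training_memory_gib` is a hypothetical helper name.

```python
# Back-of-envelope estimate of static training memory.
# Covers weights, gradients, and optimizer state only; activations, the CUDA
# context, and allocator fragmentation are NOT included.

GIB = 1024 ** 3


def static_training_memory_gib(num_params, weight_bytes=4, grad_bytes=4, optim_bytes=8):
    """Return GiB needed for weights + gradients + optimizer state.

    Defaults assume fp32 weights (4 B), fp32 gradients (4 B), and an
    Adam-style optimizer with two fp32 states per parameter (8 B).
    """
    bytes_per_param = weight_bytes + grad_bytes + optim_bytes
    return num_params * bytes_per_param / GIB


# ASSUMPTION: approximate parameter count for gemma-2b, embeddings included.
gemma_2b_params = 2.5e9

estimate = static_training_memory_gib(gemma_2b_params)
print(f"{estimate:.1f} GiB before activations")
```

Under these assumptions the static footprint is already around 37 GiB, which would leave limited headroom on a 48 GB card for activations at a 32768-token block size. If the script uses a different optimizer or keeps weights in bf16, the numbers change accordingly.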


Ozawa333 changed the title ~~What is the min GPU memory required to train the model?~~ What is the min GPU memory required to fine-tune the model? May 10, 2024
