# LoRA Fine-tuning of pruned LLMs

Here we provide the script for the LoRA fine-tuning experiments in the paper. The commands for reproducing our experiments are in `script.sh`.

This codebase is based on `run_clm.py`, which we adapt for LoRA fine-tuning on the C4 training dataset. The main custom changes we make in the code (sketched right after the list below) are:

- loc 1: set up LLaMA-7B for LoRA fine-tuning;
- loc 2: set up the training arguments for `Trainer`;
- loc 3: load the tokenizer from vicuna, which is the same as the original LLaMA tokenizer but also fixes issues with some special tokens;
- loc 4: load the C4 training dataset.
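The sketch below illustrates roughly what these four pieces look like with the Hugging Face `transformers`, `peft`, and `datasets` libraries. The exact code lives in `finetune_lm.py`; the checkpoint paths, LoRA hyperparameters, and C4 shard here are illustrative assumptions, not the repository's settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

# loc 1: wrap the (pruned) LLaMA-7B checkpoint with LoRA adapters
model = AutoModelForCausalLM.from_pretrained(
    "pruned_llama_7b",                       # hypothetical path to the pruned model
    torch_dtype=torch.float16,
)
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,   # illustrative hyperparameters
    target_modules=["q_proj", "v_proj"],     # illustrative choice of projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# loc 2: training arguments consumed by Trainer
training_args = TrainingArguments(
    output_dir="lora_out",
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

# loc 3: tokenizer taken from a vicuna checkpoint (same vocabulary as LLaMA,
# with the special-token fixes mentioned above); checkpoint name is illustrative
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")

# loc 4: one shard of the C4 training split (kept small for the sketch)
train_data = load_dataset(
    "allenai/c4",
    data_files={"train": "en/c4-train.00000-of-01024.json.gz"},
    split="train",
)
```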

To train a LoRA adapter, run the command:

```sh
CUDA_VISIBLE_DEVICES=0 python finetune_lm.py \
    --model_name_or_path [PATH to load sparse pruned LLaMA-7B] \
    --config_name "decapoda-research/llama-7b-hf" \
    --dataset_name c4 \
    --num_train_epochs 1 \
    --block_size 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --max_train_samples 30000 \
    --max_eval_samples 128 \
    --learning_rate 1e-4 \
    --overwrite_output_dir \
    --output_dir [PATH to save the LoRA weights]
```

We provide a quick overview of the arguments:

- `--model_name_or_path`: the path/directory where the pruned LLaMA-7B model was saved with `model.save_pretrained(PATH)` (a minimal example follows this list);
- `--block_size`: the context length; on an 80GB GPU you can increase it to 2048;
- `--max_train_samples`: the number of training sequences; 30000 leads to roughly 12 hours of training on one GPU;
- `--learning_rate`: the learning rate for LoRA fine-tuning.
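For reference, the directory passed to `--model_name_or_path` is just a standard Hugging Face checkpoint produced by `save_pretrained`. The sketch below assumes the pruning itself is done elsewhere (e.g. by the pruning code in this repository), and the output directory name is hypothetical.

```python
from transformers import AutoModelForCausalLM

# Load the dense model, prune it with your method of choice (elided here),
# then save it in the layout expected by --model_name_or_path.
model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
# ... apply sparse pruning to model ...
model.save_pretrained("pruned_llama_7b")   # hypothetical checkpoint directory
```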

We provide code to evaluate the LoRA adapter on the WikiText validation set in `evaluate_ppl.py`. For zero-shot evaluation, additionally pass the `--eval_zero_shot` argument.
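As a rough illustration of what that evaluation does, the sketch below loads the pruned base model together with the trained adapter via `peft` and computes perplexity on WikiText-2. The directory names, tokenizer source, and block size are assumptions; the authoritative implementation is `evaluate_ppl.py`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from datasets import load_dataset

# Base (pruned) model plus the LoRA adapter saved by finetune_lm.py;
# both directory names are hypothetical.
base = AutoModelForCausalLM.from_pretrained("pruned_llama_7b", torch_dtype=torch.float16).cuda()
model = PeftModel.from_pretrained(base, "lora_out").eval()
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")  # illustrative tokenizer source

# Concatenate the WikiText-2 validation text and score it in fixed-size blocks.
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
enc = tokenizer("\n\n".join(data["text"]), return_tensors="pt")

block, nlls = 1024, []
for i in range(0, enc.input_ids.size(1) - block, block):
    ids = enc.input_ids[:, i : i + block].cuda()
    with torch.no_grad():
        out = model(input_ids=ids, labels=ids)
    nlls.append(out.loss.float() * block)

ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * block))
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```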