README.md

Alpaca Finetuning with BigDL-LLM

This example ports Alpaca-LoRA to BigDL-LLM on Intel GPUs, using the QLoRA, QA-LoRA, or LoRA algorithm.

0. Requirements

To run this example with BigDL-LLM on Intel GPUs, your machine should meet some recommended requirements; please refer to here for more information.

1. Install

conda create -n llm python=3.9
conda activate llm
# the command below installs intel_extension_for_pytorch==2.0.110+xpu by default
# you can install a specific ipex/torch version if you need one
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install datasets transformers==4.34.0
pip install fire peft==0.5.0
pip install oneccl_bind_pt==2.0.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning
pip install accelerate==0.23.0
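
Because several packages above are pinned to exact versions, it can help to confirm what actually ended up in the environment before starting a long run. The following is a minimal sketch, not part of this example's scripts: `check_pins` and `PINNED` are hypothetical names, and the check only compares the public part of each version string (anything after a `+` is ignored).

```python
from importlib import metadata

# Versions pinned by the install commands in this README.
PINNED = {
    "transformers": "4.34.0",
    "peft": "0.5.0",
    "accelerate": "0.23.0",
}


def check_pins(pins=PINNED, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore a local version suffix such as "+xpu" when comparing.
        public = installed.split("+")[0] if installed else None
        if public != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means the three pinned packages match this README; any entry printed points at a package worth reinstalling.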

2. Configure oneAPI environment variables

source /opt/intel/oneapi/setvars.sh
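
A quick way to tell whether the current shell has already sourced the script is to look for variables it exports. The sketch below assumes that `setvars.sh` sets `ONEAPI_ROOT` and `SETVARS_COMPLETED=1`; that matches common oneAPI installations but is an assumption here, and `oneapi_env_loaded` is a hypothetical helper name.

```python
import os


def oneapi_env_loaded(env=None):
    """Heuristically report whether oneAPI variables are present.

    Assumption: setvars.sh exports ONEAPI_ROOT and SETVARS_COMPLETED=1.
    """
    env = os.environ if env is None else env
    return bool(env.get("ONEAPI_ROOT")) or env.get("SETVARS_COMPLETED") == "1"


if __name__ == "__main__":
    if not oneapi_env_loaded():
        print("oneAPI variables not found; run: source /opt/intel/oneapi/setvars.sh")
```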

3. Finetune

Three training modes are currently supported: QLoRA, QA-LoRA, and LoRA. To switch modes, set training_mode to qlora, qalora, or lora in the scripts below.

Here, we provide example usages on different hardware. Please refer to the appropriate script based on your device:

QLoRA

Finetuning LLaMA2-7B on single Arc A770

bash finetune_llama2_7b_arc_1_card.sh

Finetuning LLaMA2-7B on two Arc A770

bash finetune_llama2_7b_arc_2_card.sh

Finetuning LLaMA2-7B on single Data Center GPU Flex 170

bash finetune_llama2_7b_flex_170_1_card.sh

Finetuning LLaMA2-7B on three Data Center GPU Flex 170

bash finetune_llama2_7b_flex_170_3_card.sh

Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1100

bash finetune_llama2_7b_pvc_1100_1_card.sh

Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1100

bash finetune_llama2_7b_pvc_1100_4_card.sh

Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1550

bash finetune_llama2_7b_pvc_1550_1_card.sh

Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550

bash finetune_llama2_7b_pvc_1550_4_card.sh
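
The QLoRA script names above follow a regular pattern (device tag, then card count), so picking one can be automated. This is only an illustrative sketch: `QLORA_SCRIPTS` and `qlora_script` are hypothetical names, and the table simply mirrors the QLoRA combinations listed in this README.

```python
# Card counts for which this README lists a QLoRA script, keyed by device tag.
QLORA_SCRIPTS = {
    "arc": (1, 2),
    "flex_170": (1, 3),
    "pvc_1100": (1, 4),
    "pvc_1550": (1, 4),
}


def qlora_script(device, cards):
    """Return the QLoRA script name for a device tag and card count."""
    if cards not in QLORA_SCRIPTS.get(device, ()):
        raise ValueError(f"no QLoRA script listed for {device!r} with {cards} card(s)")
    return f"finetune_llama2_7b_{device}_{cards}_card.sh"


if __name__ == "__main__":
    print("bash", qlora_script("arc", 2))
```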

QA-LoRA

Finetuning LLaMA2-7B on single Arc A770

bash qalora_finetune_llama2_7b_arc_1_card.sh

Finetuning LLaMA2-7B on two Arc A770

bash qalora_finetune_llama2_7b_arc_2_card.sh

Finetuning LLaMA2-7B on single Tile Intel Data Center GPU Max 1550

bash qalora_finetune_llama2_7b_pvc_1550_1_tile.sh

LoRA

Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1100

bash lora_finetune_llama2_7b_pvc_1110_4_card.sh

Finetuning LLaMA2-7B on single Tile Intel Data Center GPU Max 1550

bash lora_finetune_llama2_7b_pvc_1550_1_tile.sh

Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550

bash lora_finetune_llama2_7b_pvc_1550_4_card.sh

4. (Optional) Resume Training

If finetuning is interrupted before it completes, you can resume from a previously saved checkpoint by setting resume_from_checkpoint to the local checkpoint folder, as follows:

python ./alpaca_qlora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-qlora-alpaca" \
    --resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100"
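
If you do not remember which checkpoint was saved last, the folder can be found by scanning the output directory. The sketch below assumes checkpoints are saved as `checkpoint-<step>` subfolders, as in the path above; `latest_checkpoint` is a hypothetical helper and not part of the finetuning script.

```python
import re
from pathlib import Path

# Matches folder names such as "checkpoint-1100".
_CHECKPOINT = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint folder with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best = None  # (step, path) of the newest checkpoint seen so far
    for child in root.iterdir():
        match = _CHECKPOINT.match(child.name)
        if match and child.is_dir():
            step = int(match.group(1))
            if best is None or step > best[0]:
                best = (step, child)
    return None if best is None else best[1]


if __name__ == "__main__":
    print(latest_checkpoint("./bigdl-qlora-alpaca"))
```

The returned path can then be passed as the value of `--resume_from_checkpoint`.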

5. Sample Output

{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}                                                                                                                            
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}                                                                                                                           
{'loss': 1.9043, 'learning_rate': 2.9999508305687345e-05, 'epoch': 0.01}                                                                                                                           
{'loss': 1.8967, 'learning_rate': 2.999912588049185e-05, 'epoch': 0.01}                                                                                                                            
{'loss': 1.9658, 'learning_rate': 2.9998634195730358e-05, 'epoch': 0.01}                                                                                                                           
{'loss': 1.8386, 'learning_rate': 2.9998033254984483e-05, 'epoch': 0.02}                                                                                                                           
{'loss': 1.809, 'learning_rate': 2.999732306263172e-05, 'epoch': 0.02}                                                                                                                             
{'loss': 1.8552, 'learning_rate': 2.9996503623845395e-05, 'epoch': 0.02}                                                                                                                           
  1%|█                                                                                                                                                         | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
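
Each metrics line in the output above is printed as a Python dict literal, so it can be parsed to track the loss over time. This is a minimal sketch under that assumption; `parse_log_lines` is a hypothetical helper, and lines that are not dict literals (such as the progress bar) are skipped.

```python
import ast


def parse_log_lines(lines):
    """Extract metric dicts that contain a 'loss' key from training output."""
    records = []
    for line in lines:
        text = line.strip()
        if not (text.startswith("{") and text.endswith("}")):
            continue  # progress bars and other output
        try:
            record = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            continue  # looks like a dict but is not a valid literal
        if isinstance(record, dict) and "loss" in record:
            records.append(record)
    return records


if __name__ == "__main__":
    sample = [
        "{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}",
        "{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}",
        "  1%|#   | 8/1164 [xx:xx<xx:xx:xx, xx s/it]",
    ]
    for record in parse_log_lines(sample):
        print(record["epoch"], record["loss"])
```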

6. Merge the adapter into the original model

python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged

You can then use ./outputs/checkpoint-200-merged as a regular Hugging Face Transformers model for inference.
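
As a rough illustration of that last step, the sketch below loads the merged folder with the standard Transformers API and generates a short completion. It is an assumption-laden example rather than this project's inference path: it uses plain `transformers` on CPU without any BigDL-LLM optimization, `generate` and its parameters are hypothetical names, and it assumes the merged folder also contains the tokenizer files (otherwise pass the base model as `tokenizer_path`).

```python
def generate(prompt, model_path="./outputs/checkpoint-200-merged",
             tokenizer_path=None, max_new_tokens=64):
    """Generate a completion from the merged model (plain Transformers, CPU)."""
    # Imported lazily so this module can be imported without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: tokenizer files live next to the merged weights.
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path or model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Tell me about alpacas."))
```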

7. Troubleshooting

If multi-card finetuning fails with the following error message:
```
RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
```
try running sudo apt install level-zero-dev to fix it.
