I have two GPUs with 24GB of VRAM each. By manually configuring the device_map, I can enable naive model parallelism to fine-tune a quantized 72B model with QLoRA on a short-text dataset.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

import torch
from transformers import AutoModelForCausalLM

# Split the 80 transformer layers across the two GPUs.
device_map = {}
device_map['model.embed_tokens'] = 0
for layer_idx in range(41):
    device_map[f'model.layers.{layer_idx}'] = 0
for layer_idx in range(41, 80):
    device_map[f'model.layers.{layer_idx}'] = 1
device_map['lm_head.weight'] = 1
device_map['model.norm.weight'] = 1
device_map['model.rotary_emb'] = 1

model = AutoModelForCausalLM.from_pretrained('./Qwen2-72B-Instruct-bnb-4bit',
                                             trust_remote_code=True,
                                             torch_dtype=torch.bfloat16,
                                             device_map=device_map)
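For context, the QLoRA adapters are attached on top of this model with peft roughly as follows. This is a minimal sketch; the rank, alpha, dropout, and target modules below are illustrative placeholders rather than the exact values from my run.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit model for training (casts norms, enables input grads, etc.).
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters; the real run may use different values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM',
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()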
However, when dealing with slightly longer texts, I encounter OOM issues.
I tried Unsloth, but it currently doesn't support multi-GPU setups. It would be great if Unsloth could support naive model parallelism!