Is it possible for Unsloth to support naive model parallelism? #1305

Open
Songjw133 opened this issue Nov 18, 2024 · 0 comments
Songjw133 commented Nov 18, 2024

I have two GPUs with 24 GB of VRAM each. By manually configuring the device_map, I can use naive model parallelism to fine-tune a 72B quantized model with QLoRA on a dataset of short texts.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

import torch
from transformers import AutoModelForCausalLM

# Place the embeddings and the first 41 decoder layers on GPU 0,
# the remaining 39 layers plus the final norm and LM head on GPU 1.
device_map = {}
device_map['model.embed_tokens'] = 0
for layer_idx in range(41):
    device_map[f'model.layers.{layer_idx}'] = 0
for layer_idx in range(41, 80):
    device_map[f'model.layers.{layer_idx}'] = 1
device_map['lm_head.weight'] = 1
device_map['model.norm.weight'] = 1
device_map['model.rotary_emb'] = 1

model = AutoModelForCausalLM.from_pretrained('./Qwen2-72B-Instruct-bnb-4bit',
                                             trust_remote_code=True,
                                             torch_dtype=torch.bfloat16,
                                             device_map=device_map)
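For completeness, the fine-tuning side is just the usual peft QLoRA recipe on top of this split model. A minimal sketch; the rank, alpha, dropout, and target modules below are illustrative values, not necessarily the exact ones I used:

# QLoRA sketch with peft on the already-quantized, device_map-split model.
# Hyperparameters here are placeholders for illustration.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable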

However, with slightly longer texts I run into out-of-memory (OOM) errors.
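Since activation memory grows with sequence length (the weights themselves fit fine), the one partial mitigation I'm aware of is gradient checkpointing. A sketch using the standard transformers toggles:

# Trade compute for memory: recompute activations during the backward
# pass instead of storing them. Helps with longer sequences at some
# speed cost, but is not enough on its own here.
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is not needed during training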

I tried using Unsloth, but it currently doesn't support multi-GPU setups. It would be great if Unsloth could support naive model parallelism!
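For reference, this is roughly how I'd expect to load the same model through Unsloth's current single-GPU path (a sketch; the model name is the 4-bit repo I'd assume, and as far as I can tell there is no way to express the per-layer device_map split above):

# Unsloth single-GPU loading sketch; no equivalent of the manual
# per-layer device_map split, which is what this issue is asking for.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='unsloth/Qwen2-72B-Instruct-bnb-4bit',
    max_seq_length=2048,
    load_in_4bit=True,
)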
