Is it possible for Unsloth to support naive model parallelism? #1305

Open
Songjw133 opened this issue Nov 18, 2024 · 0 comments
Songjw133 commented Nov 18, 2024

I have two GPUs with 24 GB of VRAM each. By manually configuring the device_map, I can use naive model parallelism to fine-tune a 72B quantized model with QLoRA on a dataset of short texts.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

import torch
from transformers import AutoModelForCausalLM

# Place the embeddings and the first 41 decoder layers on GPU 0,
# the remaining 39 layers plus the final norm and LM head on GPU 1.
device_map = {}
device_map['model.embed_tokens'] = 0
for layer_idx in range(41):
    device_map[f'model.layers.{layer_idx}'] = 0
for layer_idx in range(41, 80):
    device_map[f'model.layers.{layer_idx}'] = 1
device_map['lm_head.weight'] = 1
device_map['model.norm.weight'] = 1
device_map['model.rotary_emb'] = 1

model = AutoModelForCausalLM.from_pretrained('./Qwen2-72B-Instruct-bnb-4bit',
                                             trust_remote_code=True,
                                             torch_dtype=torch.bfloat16,
                                             device_map=device_map)
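For completeness, the fine-tuning side is just the usual peft QLoRA recipe on top of this split model. A minimal sketch; the rank, alpha, dropout, and target modules below are illustrative values, not necessarily the exact ones I used:

# QLoRA sketch with peft on the already-quantized, device_map-split model.
# Hyperparameters here are placeholders for illustration.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable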

However, with slightly longer texts I run into out-of-memory (OOM) errors.
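Since activation memory grows with sequence length (the weights themselves fit fine), the one partial mitigation I'm aware of is gradient checkpointing. A sketch using the standard transformers toggles:

# Trade compute for memory: recompute activations during the backward
# pass instead of storing them. Helps with longer sequences at some
# speed cost, but is not enough on its own here.
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is not needed during training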

I tried using Unsloth, but it currently doesn't support multi-GPU setups. It would be great if Unsloth could support naive model parallelism!
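For reference, this is roughly how I'd expect to load the same model through Unsloth's current single-GPU path (a sketch; the model name is the 4-bit repo I'd assume, and as far as I can tell there is no way to express the per-layer device_map split above):

# Unsloth single-GPU loading sketch; no equivalent of the manual
# per-layer device_map split, which is what this issue is asking for.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='unsloth/Qwen2-72B-Instruct-bnb-4bit',
    max_seq_length=2048,
    load_in_4bit=True,
)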
