Hi all! TL;DR, the question is basically the title - are there any flags for targeting as many projectors as possible and setting a super high rank?

I'm a recent onboardee to MLX - having previously been doing a bunch of fine-tuning on Google Colab using this really great software, Unsloth. I'm currently researching whether knowledge injection is possible without in-context learning on consumer-grade hardware (so far I've achieved some really promising results!!).

Disclaimer: I'm out of my depth in that I don't know what a lot of these things mean. But I was told by someone in the community that for knowledge injection you've got to train all layers, with all projectors, and with a really high rank if you're using QLoRA (I was doing r = 256). I asked the resident MLX Guru if there were any flags, but alas, the Guru did not know (it didn't even think that MLX supported QLoRA... here's the link to my chat if the curator of that GPT wants to debug it).

In any case, I hope that I'm not posting this to the wrong place - I'm not really a GitHub native, and I only picked up fine-tuning last weekend, so really sorry if any of these are stupid questions! (Like I'm thinking that this whole approach might be really dumb - if I'm trying to crank the rank up really high to 256... should I even be using low-rank adaptation in the first place?)

P.S. Super appreciate what the community is doing with this; being able to fine-tune on my Mac means I can do this research without worrying about spending money on compute that I could've spent on taking my partner out for a nice lunch 😂
-
Lol yea I guess the Guru needs an update (CC @sck-at-ucy).
Sort of but not really. Right now you can make all the layers LoRA layers with
--lora-layers 32
(if your model has 32 blocks). The other flags you have to change manually, though it's on our TODO list to make them configurable with a config file. If you prefer not to wait, you can change the code manually:
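For reference, here is a rough sketch of the kind of manual edit being described: widening the layer-conversion loop in `lora.py` so it wraps more projections and uses a higher rank. It assumes a Llama-style model with `q_proj`/`k_proj`/`v_proj`/`o_proj` attention projections and `gate_proj`/`up_proj`/`down_proj` MLP projections, and that `LoRALinear.from_linear` accepts a rank argument (the keyword may be `rank` or `r` depending on your version) - check the actual code in your checkout before copying anything.

```python
# Sketch only -- adapt the loop that already exists in your copy of
# mlx-examples/lora/lora.py. The module names (q_proj, k_proj, v_proj, o_proj,
# gate_proj, up_proj, down_proj) assume a Llama-style model, and the `rank`
# keyword is an assumption -- check LoRALinear.from_linear's signature.
RANK = 256

model.freeze()
for layer in model.model.layers[-args.lora_layers:]:
    attn, mlp = layer.self_attn, layer.mlp
    # The stock loop typically only wraps q_proj and v_proj at the default rank;
    # wrapping the remaining projections targets "as many projectors as possible".
    attn.q_proj = LoRALinear.from_linear(attn.q_proj, rank=RANK)
    attn.k_proj = LoRALinear.from_linear(attn.k_proj, rank=RANK)
    attn.v_proj = LoRALinear.from_linear(attn.v_proj, rank=RANK)
    attn.o_proj = LoRALinear.from_linear(attn.o_proj, rank=RANK)
    mlp.gate_proj = LoRALinear.from_linear(mlp.gate_proj, rank=RANK)
    mlp.up_proj = LoRALinear.from_linear(mlp.up_proj, rank=RANK)
    mlp.down_proj = LoRALinear.from_linear(mlp.down_proj, rank=RANK)
```

Keep in mind that wrapping every projection in every block at rank 256 increases the number of trainable parameters and memory use quite a lot, so you may need to lower the batch size to keep it within your Mac's memory.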