
About alpha/rank in lora #1304

Open
Vital1162 opened this issue Nov 18, 2024 · 3 comments

Comments

@Vital1162

How does $\alpha$ in LoRA affect training performance?
I usually see everyone set it to $2r$. But why?
As for the rank, I always set it to 128-256 if the dataset is large enough.
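For context on where these two hyperparameters enter, here is a minimal pure-Python sketch of a LoRA linear layer, assuming the standard formulation from the original LoRA paper, $y = Wx + \frac{\alpha}{r} BAx$ (note this differs from the $\alpha/\sqrt{r}$ scaling discussed below). Shapes and values are illustrative only.

```python
# Minimal LoRA forward pass sketch (no framework), assuming the standard
# LoRA scaling alpha / r. W is the frozen base weight; A (r x in) and
# B (out x r) form the trainable low-rank update.

def matvec(M, x):
    # Multiply matrix M (list of rows) by vector x.
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, r, alpha):
    base = matvec(W, x)                # frozen base projection
    update = matvec(B, matvec(A, x))   # B @ (A @ x), rank-r bottleneck
    scale = alpha / r                  # standard LoRA scaling factor
    return [b + scale * u for b, u in zip(base, update)]

# With the common alpha = 2r convention, the standard scale alpha / r
# is a constant 2.0 no matter which rank you pick:
for r in (8, 128, 256):
    print(r, (2 * r) / r)
```

One way to read the $\alpha = 2r$ convention: under standard $\alpha/r$ scaling, it keeps the adapter's contribution at a fixed multiplier (2) as you sweep the rank, so rank and effective step size can be tuned independently.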

@Erland366
Contributor

I think it's because in LoRA, alpha effectively scales the learning rate of the adapter, via

$$ LR_{LoRA} = \frac{\alpha}{\sqrt{r}} \times LR $$

But in finetuning, you might want to update the adapter aggressively, since your dataset is usually much smaller than the pretraining data.

My intuition is that as long as $\frac{\alpha}{\sqrt{r}}$ is greater than one, you're good to go.
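A quick numeric check of that rule of thumb: the snippet below tabulates $\alpha/\sqrt{r}$ for a few common ranks under the $\alpha = 2r$ convention. (The $\alpha/\sqrt{r}$ form matches rsLoRA-style scaling; the original LoRA paper uses $\alpha/r$ instead, so treat the threshold-of-one heuristic as an intuition, not an exact rule.)

```python
# Tabulate the effective LR multiplier alpha / sqrt(r) described above,
# for the common alpha = 2r convention. With alpha = 2r this multiplier
# equals 2 * sqrt(r), so it grows with rank and is always above one.
import math

def lr_multiplier(alpha, r):
    return alpha / math.sqrt(r)

for r in (16, 64, 128, 256):
    alpha = 2 * r
    print(f"r={r:3d}, alpha={alpha:3d} -> alpha/sqrt(r) = {lr_multiplier(alpha, r):.2f}")
```

So under $\alpha = 2r$, the $\alpha/\sqrt{r}$ condition above is comfortably satisfied at any rank; it only becomes a real constraint if you set alpha well below the rank.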

@Vital1162
Author

Vital1162 commented Nov 19, 2024

Thank you for your response @Erland366. But does dataset size affect these parameters?

@Erland366
Contributor

I've heard in the Discord that if you have a smaller dataset, you should use a smaller rank and alpha. But I haven't tested this much myself.
