Reasonable learning rate range for adahessian? #21

Open
sjscotti opened this issue Sep 5, 2021 · 0 comments

sjscotti commented Sep 5, 2021

Hi,
For training a chatbot, I want to switch from Adam to AdaHessian for the final step of fine-tuning my model, and I have a question about what a reasonable learning rate for AdaHessian would be. With Adam I used fairly small learning rates - starting at 2e-5 and reducing from there - which worked pretty well. However, as I understand it, AdaHessian preconditions the parameter update the way an inverse Hessian does in a Newton step, and for a Newton step on a quadratic model the ideal learning rate is 1.0 (the full step x - H^{-1} ∇f(x) lands exactly at the minimum). So I assume I should be using a much larger learning rate for AdaHessian than I have been using for Adam. Do you have any suggestions based on your experience?
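For concreteness, here is roughly the training loop I have in mind; this is a minimal sketch assuming the `torch-optimizer` package's `Adahessian` class, with a toy model standing in for my chatbot and `lr=1.0` as my guess from the Newton-step analogy:

```python
import torch
import torch.nn as nn
import torch_optimizer as optim  # pip install torch-optimizer

# Toy stand-in for the real model; the shape of the loop is the point.
model = nn.Linear(10, 1)
data = torch.randn(32, 10)
target = torch.randn(32, 1)

optimizer = optim.Adahessian(
    model.parameters(),
    lr=1.0,  # much larger than the 2e-5 I used with Adam
    betas=(0.9, 0.999),
    eps=1e-4,
    weight_decay=0.0,
    hessian_power=1.0,
)

for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data), target)
    # AdaHessian estimates the Hessian diagonal with Hutchinson's method,
    # so the backward pass must build the graph for second derivatives:
    loss.backward(create_graph=True)
    optimizer.step()
```

If I understand correctly, the only structural change from my Adam loop is `create_graph=True` in the backward pass; the open question is just how large `lr` should be.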
Thanks!
