Hi
For training a chatbot, I want to switch from Adam to AdaHessian as the final step in fine-tuning my model, and I have a question about what a reasonable learning rate for AdaHessian would be. With Adam I used fairly small learning rates (starting at 2e-5 and decaying from there), which worked pretty well. However, as I understand it, AdaHessian preconditions the parameter update the way an inverse Hessian does in a Newton step, and for a Newton step on a quadratic model the ideal learning rate is 1.0. So I assume I should be using a much larger learning rate for AdaHessian than I have been using for Adam. Do you have any suggestions based on your experience?
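To make the reasoning behind "lr = 1.0" concrete, here is a minimal sketch (hypothetical numbers, not from AdaHessian itself) of a 1-D quadratic where the update is divided by the Hessian, as a Newton-style preconditioner would do. With full preconditioning, lr = 1.0 reaches the minimizer in one step, while an Adam-scale lr like 2e-5 barely moves:

```python
# Quadratic f(x) = 0.5*h*x**2 - b*x, gradient g = h*x - b, minimizer x* = b/h.
# The update divides the gradient by the (here scalar) Hessian h.

def newton_step(x, lr, h, b):
    g = h * x - b              # gradient of the quadratic
    return x - lr * (g / h)    # Hessian-preconditioned update

h, b = 4.0, 2.0                # minimizer is x* = b/h = 0.5

x = newton_step(0.0, lr=1.0, h=h, b=b)
print(x)                       # 0.5: lr = 1.0 solves the quadratic in one step

x = 0.0
for _ in range(10):
    x = newton_step(x, lr=2e-5, h=h, b=b)
print(x)                       # ~0.0001: an Adam-scale lr has barely moved
```

Of course, a neural network loss is not quadratic and AdaHessian's diagonal, smoothed Hessian estimate is only an approximation of the true curvature, so in practice the workable learning rate may well sit below 1.0; this just illustrates why the natural scale is much larger than Adam's.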
Thanks!