Hi
For training a chatbot, I want to switch from Adam to AdaHessian as the final step in fine-tuning my model, and I have a question about what a reasonable learning rate for AdaHessian would be. With Adam I used fairly small learning rates (starting at 2e-5 and decaying from there), which worked pretty well. However, as I understand it, AdaHessian preconditions the parameter update the way an inverse Hessian does in a Newton step, and for a Newton step on a quadratic model the ideal learning rate is 1.0. So I assume I should be using a much larger learning rate for AdaHessian than I have been using for Adam. Do you have any suggestions based on your experience?
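To make the reasoning behind "lr = 1.0" concrete, here is a minimal sketch (hypothetical numbers, not from AdaHessian itself) of a 1-D quadratic where the update is divided by the Hessian, as a Newton-style preconditioner would do. With full preconditioning, lr = 1.0 reaches the minimizer in one step, while an Adam-scale lr like 2e-5 barely moves:

```python
# Quadratic f(x) = 0.5*h*x**2 - b*x, gradient g = h*x - b, minimizer x* = b/h.
# The update divides the gradient by the (here scalar) Hessian h.

def newton_step(x, lr, h, b):
    g = h * x - b              # gradient of the quadratic
    return x - lr * (g / h)    # Hessian-preconditioned update

h, b = 4.0, 2.0                # minimizer is x* = b/h = 0.5

x = newton_step(0.0, lr=1.0, h=h, b=b)
print(x)                       # 0.5: lr = 1.0 solves the quadratic in one step

x = 0.0
for _ in range(10):
    x = newton_step(x, lr=2e-5, h=h, b=b)
print(x)                       # ~0.0001: an Adam-scale lr has barely moved
```

Of course, a neural network loss is not quadratic and AdaHessian's diagonal, smoothed Hessian estimate is only an approximation of the true curvature, so in practice the workable learning rate may well sit below 1.0; this just illustrates why the natural scale is much larger than Adam's.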
Thanks!