I'm planning to add the Adafactor optimizer used in the official implementation. The main benefit over Adam/AdamW is memory: instead of roughly 3x the VRAM of the model, I think it needs only a bit above 2x. The code is currently up at https://github.com/isamu-isozaki/adafactor-pytorch, and after adding a Triton version I will open a PR here!
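For a rough sense of where the 3x and "a bit above 2x" figures could come from, here is a back-of-the-envelope sketch that counts stored elements (parameters plus optimizer state) for a single layer. It assumes Adafactor is run with a full-size first-moment buffer (momentum) and a factored second moment; whether the linked repo does this is an assumption, and the layer shapes and function names are hypothetical. Gradients and activations are ignored.

```python
from math import prod

def adam_state_elems(shapes):
    # Adam/AdamW keep two full-size buffers per parameter:
    # the first moment and the second moment.
    return sum(2 * prod(s) for s in shapes)

def adafactor_state_elems(shapes, momentum=True):
    # Assumption: tensors with 2+ dims use a factored second moment
    # (row statistics + column statistics over the last two dims);
    # 1-D tensors fall back to a full second-moment buffer.
    total = 0
    for s in shapes:
        n = prod(s)
        if len(s) >= 2:
            second = n // s[-1] + n // s[-2]  # row stats + column stats
        else:
            second = n
        first = n if momentum else 0  # optional full-size momentum buffer
        total += first + second
    return total

# Hypothetical layer: a 4096x4096 weight matrix plus a bias vector.
shapes = [(4096, 4096), (4096,)]
params = sum(prod(s) for s in shapes)

adam_ratio = (params + adam_state_elems(shapes)) / params
adafactor_ratio = (params + adafactor_state_elems(shapes)) / params
print(f"Adam: {adam_ratio:.3f}x params, Adafactor: {adafactor_ratio:.3f}x params")
```

Under these assumptions, passing `momentum=False` drops the full-size first-moment buffer, so the total would sit only slightly above 1x the parameter count.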