On long training runs it's important to checkpoint models in case something goes wrong part way through. If only the model is saved and the optimizer state is reset on restart, training takes much longer to resume, since the optimizer's accumulated buffers (e.g. momentum/moment estimates) have to be rebuilt from scratch. It would therefore make sense to save and load the optimizer state as well. I believe this could be implemented with the same npz architecture currently used for TensorCollection modules.
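For illustration only, here is a minimal, self-contained sketch of the idea in generic NumPy: the optimizer's per-parameter buffers are written to an .npz archive alongside the model weights and restored on resume. The toy model, the SGD-with-momentum optimizer, and all function names below are placeholders to show the checkpointing pattern, not the library's actual API.

```python
import numpy as np

# Toy "model" and SGD-with-momentum optimizer, purely to illustrate the idea:
# the optimizer's per-parameter buffers (here, the velocity) are saved to the
# same kind of .npz archive already used for model weights, then restored.

def init_state(shape, seed=0):
    rng = np.random.default_rng(seed)
    params = {"w": rng.normal(size=shape)}
    opt_state = {"velocity_w": np.zeros(shape)}   # momentum buffer
    return params, opt_state

def sgd_momentum_step(params, opt_state, grads, lr=0.1, beta=0.9):
    for name, g in grads.items():
        v = opt_state["velocity_" + name]
        v[:] = beta * v + g                       # update momentum in place
        params[name] -= lr * v
    return params, opt_state

def save_checkpoint(path_prefix, params, opt_state):
    np.savez(path_prefix + "_model.npz", **params)
    np.savez(path_prefix + "_opt.npz", **opt_state)

def load_checkpoint(path_prefix):
    params = dict(np.load(path_prefix + "_model.npz"))
    opt_state = dict(np.load(path_prefix + "_opt.npz"))
    return params, opt_state

# Usage: train a few steps, checkpoint, then resume with momentum intact.
params, opt_state = init_state((4,))
for _ in range(5):
    grads = {"w": 2 * params["w"]}                # gradient of ||w||^2
    params, opt_state = sgd_momentum_step(params, opt_state, grads)
save_checkpoint("ckpt", params, opt_state)
params, opt_state = load_checkpoint("ckpt")       # velocity is not reset to zero
```

The key point is simply that the optimizer state is a flat collection of tensors, so it can be serialized with the same npz machinery as the model itself.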