Description
When the dataset is very large, we may want the model to walk through it without replacement, possibly for only one or a few epochs.
We can't do the training in one shot because each job hits a wall-clock time limit. We need to add support for the dataloader to resume from a given iteration (within one epoch).
Solution
open_clip offers a solution: slice all shards into many subsets, and for each "sub_epoch" walk through one subset. Record the sub_epoch number and use it when training starts to restore the data checkpoint. See mlfoundations/open_clip#535.
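A minimal sketch of the idea, not open_clip's actual implementation: shuffle the shard list with a fixed seed so every job reconstructs the same global order, then slice out the chunk belonging to the current sub_epoch. The function name and shard naming below are hypothetical.

```python
import math
import random

def shards_for_sub_epoch(shards, num_sub_epochs, sub_epoch, seed=0):
    """Return the subset of shards to visit during one sub_epoch.

    Deterministic: the same seed always yields the same global shard
    order, so a resumed job only needs the sub_epoch counter from its
    checkpoint to pick up exactly where the previous job stopped.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible order across jobs
    order = list(shards)
    rng.shuffle(order)
    per = math.ceil(len(order) / num_sub_epochs)  # shards per sub_epoch
    return order[sub_epoch * per:(sub_epoch + 1) * per]

# Resuming: the training checkpoint only needs to store `sub_epoch`.
all_shards = [f"shard-{i:05d}.tar" for i in range(10)]
resumed_sub_epoch = 2  # hypothetical value loaded from the checkpoint
subset = shards_for_sub_epoch(all_shards, num_sub_epochs=4,
                              sub_epoch=resumed_sub_epoch)
```

Because the slices are disjoint and together cover every shard, walking sub_epochs 0..N-1 visits the whole dataset exactly once, i.e. sampling without replacement across job restarts.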