Hello! I was running codebook training on VAS, but for some reason I see the loss turning into nan after the first epoch. I was wondering if I may be doing something incorrectly? I used this command:
Hi, @jhyau. Thanks a lot for letting me know about it!
I think I could replicate the same problem that you and jwliu-cc have. I added a post to that issue and reset the changes at the cost of having this nasty bug in the code base.
I will close this one because it is merely the consequence of that issue.
Hello! I was running codebook training on VAS, but for some reason I see the loss turning into nan after the first epoch. I was wondering if I may be doing something incorrectly? I used this command:
Here are the nans I see:
Epoch 0: 51%|██████████████████████████████▉ | 78/154 [01:55<01:52, 1.48s/it, loss=nan, v_num=0, val/rec_loss_epoch=1.100, val/aeloss_epoch=1.140, train/aeloss_step=nan.0]
Previous Epoch counts: [530, 0, 1, 0, 0, 0, 11, 45, 212, 1, 0, 49, 5, 0, 1, 0, 0, 0, 0, 4, 1, 48, 0, 17, 5, 201, 13, 5, 38, 0, 1, 287, 1370, 6, 3, 0, 0, 1, 0, 1, 58, 1, 3, 4, 228, 123, 0, 0, 15, 0, 0, 6
, 0, 0, 36, 39, 36, 1, 7, 0, 0, 4, 38, 3, 0, 1, 62, 147, 5, 0, 3, 9, 8, 0, 13, 80, 33, 40, 0, 20, 0, 104, 26, 0, 4, 14, 1, 0, 0, 129, 0, 0, 2, 4, 7, 0, 1, 1, 0, 0, 28, 33, 2, 83, 0, 0, 43, 4, 4, 0, 59,
11, 22, 17, 6, 0, 30, 219, 0, 6, 15, 4, 2, 0, 0, 2, 0, 8]
Epoch 1: 51%|▌| 78/154 [01:08<01:06, 1.15it/s, loss=nan, v_num=0, val/rec_loss_epoch=nan.0, val/aeloss_epoch=nan.0, train/aeloss_step=nan.0, train/aeloss_epoch=nan.0, val/rec_loss_step=nan.0, val/aelo
Previous Epoch counts: [41870, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 0]
Thank you very much!
The loss goes to NaN when I load the correct ckpt. Do you have this problem? I trained on the VAS dataset.
Originally posted by @jwliu-cc in #13 (comment)
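The "Previous Epoch counts" lines in the log appear to be a per-code usage histogram for the codebook (128 entries in the Epoch 0 line). In the Epoch 1 line every assignment lands on code 0, which looks like the codebook collapsing at the same time the loss becomes NaN. The sketch below summarises such a histogram with two common VQ diagnostics: the number of codes used at least once and the perplexity of the code distribution. `codebook_usage` is a hypothetical helper written for illustration, not a function from the repository, and the collapsed example assumes a 128-entry codebook.

```python
import math
from typing import Dict, Sequence


def codebook_usage(counts: Sequence[int]) -> Dict[str, float]:
    """Summarise a per-code usage histogram (hypothetical helper).

    Returns the codebook size, the number of codes used at least once,
    and the perplexity exp(H), where H is the entropy (in nats) of the
    empirical code distribution. Perplexity ranges from 1 (every vector
    maps to one code) up to the codebook size (perfectly uniform usage).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one assignment")
    # Empirical probabilities of the codes that were actually used.
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return {
        "codebook_size": len(counts),
        "active_codes": len(probs),
        "perplexity": math.exp(entropy),
    }


# A collapsed histogram like the Epoch 1 line, assuming 128 codes:
# all assignments go to code 0, so only one code is active.
collapsed = [41870] + [0] * 127
print(codebook_usage(collapsed))  # active_codes 1, perplexity 1.0
```

Tracking `active_codes` or perplexity per epoch would show the collapse as a drop to 1, which may help pinpoint whether it happens before or after the first NaN appears.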