About memory missing location information #23
Comments
From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it on 12 GB with 8b quantization and a segment size of 400.
I can also get training to run end to end. However, the current training results are not very good, so I'm trying to train further.
I have the same question. Were you able to solve it?
You can try moving the memory retrieval step to after 'apply_rotary_pos_emb' and comparing the training performance. However, I have not tried this further.
Thanks for your response.
Can this retain location information?
I noticed that the memory retrieval and update happen before 'apply_rotary_pos_emb'. I wonder whether a memory that lacks location information would confuse the model's perception of the order of historical information.
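To make the concern concrete, here is a minimal toy sketch in plain Python. It is not the repository's code. It assumes the usual Infini-attention style memory update M ← M + σ(K)ᵀV with σ = ELU + 1, and the helper names (`rope`, `build_memory`, `max_abs_diff`) and the toy keys and values are hypothetical. It builds the memory from the same two key/value pairs in two different orders, once with unrotated keys (update before 'apply_rotary_pos_emb') and once with RoPE-rotated keys (update after it).

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoPE-style)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def elu_plus_one(vec):
    """sigma(x) = ELU(x) + 1, which keeps the feature map positive."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]

def build_memory(keys, values, use_rope):
    """Accumulate M += sigma(k)^T v over a segment (assumed update rule)."""
    d_k, d_v = len(keys[0]), len(values[0])
    mem = [[0.0] * d_v for _ in range(d_k)]
    for pos, (k, v) in enumerate(zip(keys, values)):
        if use_rope:
            k = rope(k, pos)  # key now depends on its position in the segment
        sk = elu_plus_one(k)
        for i in range(d_k):
            for j in range(d_v):
                mem[i][j] += sk[i] * v[j]
    return mem

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical toy data: two tokens, key dim 4, value dim 2.
keys = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -1.0, 0.25]]
values = [[1.0, 2.0], [3.0, -1.0]]

# Same tokens, original order vs. reversed order.
diff_no_rope = max_abs_diff(
    build_memory(keys, values, use_rope=False),
    build_memory(keys[::-1], values[::-1], use_rope=False),
)
diff_rope = max_abs_diff(
    build_memory(keys, values, use_rope=True),
    build_memory(keys[::-1], values[::-1], use_rope=True),
)

print("memory difference without RoPE:", diff_no_rope)
print("memory difference with RoPE:", diff_rope)
```

In this toy setting the memory built from unrotated keys is a plain sum over tokens, so it cannot distinguish the two orderings, while the memory built from rotated keys can. Whether that difference actually helps training in the real model is not something this sketch shows; it would need the comparison suggested in the thread.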