About memory missing location information #23

Open
LzhinFdu opened this issue May 10, 2024 · 6 comments

Comments

@LzhinFdu

LzhinFdu commented May 10, 2024

I noticed that the memory retrieval and update happen before 'apply_rotary_pos_emb'. Since the memory then carries no positional information, could that confuse the model about the order of the historical information?
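To make the concern concrete, here is a minimal sketch in plain Python. It assumes the compressive memory is updated additively as M += sigma(K)^T V with sigma = ELU + 1, as described in the Infini-attention paper; whether this repository does exactly that is an assumption. The helpers `build_memory` and `rope_2d` are hypothetical toy functions, not code from the repo. The sketch suggests that a memory built from un-rotated keys is the same regardless of token order, while rotating the keys first makes the memory depend on order.

```python
import math

def elu_plus_one(x):
    """Feature map sigma(x) = ELU(x) + 1 (assumed, following the paper)."""
    return x + 1.0 if x > 0 else math.exp(x)

def rope_2d(vec, pos, theta=1.0):
    """Toy rotary embedding: rotate a single 2-D pair by pos * theta."""
    x, y = vec
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return [x * c - y * s, x * s + y * c]

def build_memory(keys, values, use_rope):
    """Accumulate M = sum_t sigma(k_t)^T v_t, optionally rotating k_t first."""
    d_k, d_v = len(keys[0]), len(values[0])
    memory = [[0.0] * d_v for _ in range(d_k)]
    for pos, (k, v) in enumerate(zip(keys, values)):
        if use_rope:
            k = rope_2d(k, pos)
        sk = [elu_plus_one(x) for x in k]
        for i in range(d_k):
            for j in range(d_v):
                memory[i][j] += sk[i] * v[j]
    return memory

def same(a, b):
    """Element-wise comparison of two matrices with a small tolerance."""
    return all(math.isclose(x, y, abs_tol=1e-9)
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

keys = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
values = [[1.0], [2.0], [3.0]]
rev_keys, rev_values = keys[::-1], values[::-1]

# Same tokens, opposite order.
plain_equal = same(build_memory(keys, values, False),
                   build_memory(rev_keys, rev_values, False))
rope_equal = same(build_memory(keys, values, True),
                  build_memory(rev_keys, rev_values, True))
print("un-rotated memory is order-invariant:", plain_equal)
print("rotated memory is order-invariant:", rope_equal)
```

In this toy setting the un-rotated memory is a plain sum over tokens, so reordering the tokens cannot change it.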

@Lazy3valuation

From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it on a 12 GB GPU with 8-bit quantization and a segment size of 400.

@LzhinFdu
Author

I can also get training to run. However, the results so far are not very good. I'm trying to train further.

@pengshuang

I have the same question. Were you able to solve it?

@LzhinFdu
Author

You can try moving the memory retrieval step to after 'apply_rotary_pos_emb' and comparing the training performance. I did not pursue it further myself.
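A rough sketch of the two orderings, in plain Python rather than the repo's actual attention code. It assumes retrieval of the form sigma(q) M / (sigma(q) . z) with sigma = ELU + 1, following the Infini-attention paper; `write_memory`, `read_memory`, `rope_2d`, and the `after_rope` flag are hypothetical names made up for illustration.

```python
import math

def sigma(vec):
    """ELU(x) + 1, the feature map assumed for the compressive memory."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]

def rope_2d(vec, pos, theta=0.5):
    """Toy rotary embedding on a single 2-D pair."""
    x, y = vec
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return [x * c - y * s, x * s + y * c]

def write_memory(keys, values, start_pos, after_rope):
    """Build memory M (d_k x d_v) and normaliser z (d_k) from one segment."""
    d_k, d_v = len(keys[0]), len(values[0])
    memory = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    for t, (k, v) in enumerate(zip(keys, values)):
        if after_rope:
            k = rope_2d(k, start_pos + t)  # store position-rotated keys
        sk = sigma(k)
        for i in range(d_k):
            z[i] += sk[i]
            for j in range(d_v):
                memory[i][j] += sk[i] * v[j]
    return memory, z

def read_memory(query, pos, memory, z, after_rope):
    """Retrieve sigma(q) M / (sigma(q) . z) for one query token."""
    if after_rope:
        query = rope_2d(query, pos)  # query carries its own position
    sq = sigma(query)
    den = sum(a * b for a, b in zip(sq, z))
    return [sum(sq[i] * memory[i][j] for i in range(len(sq))) / den
            for j in range(len(memory[0]))]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [5.0]]
query = [1.0, 0.2]

# Read the same query content at two different positions, in both variants.
results = {}
for after_rope in (False, True):
    memory, z = write_memory(keys, values, start_pos=0, after_rope=after_rope)
    results[after_rope] = [read_memory(query, pos, memory, z, after_rope)[0]
                           for pos in (2, 5)]
print(results)
```

With retrieval before the rotation, the retrieved value is identical at both query positions; with retrieval after it, the value changes with position. One caveat: because the ELU + 1 map is applied after the rotation here, the exact relative-position property of plain RoPE dot products does not carry over, so whether this ordering actually helps would still need to be checked by training.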

@pengshuang

Thanks for your response.

@lihua8848

Would this retain the positional information?
