About memory missing location information #23

LzhinFdu · 2024-05-10T09:44:45Z

I noticed that the memory retrieval and update happen before 'apply_rotary_pos_emb'. Since the memory therefore carries no positional information, could this confuse the model's perception of the order of historical information?
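To make the concern concrete, here is a minimal sketch of an Infini-attention-style compressive memory. It assumes the linear (non-delta) update from the Infini-attention paper, M ← M + σ(K)ᵀV and z ← z + Σₜ σ(kₜ) with σ = ELU + 1, and retrieval A_mem = σ(q)M / (σ(q)·z). The function names are hypothetical and the repository's code may differ in details. Because the update is a plain sum of per-token outer products, a memory built from keys that carry no position is the same whatever order the tokens arrive in, so retrieval cannot distinguish earlier tokens from later ones.

```python
import math

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1: keeps memory features strictly positive
    return x + 1.0 if x > 0 else math.exp(x)

def empty_memory(d_k, d_v):
    # M is d_k x d_v, z is a length-d_k normalizer
    return [[0.0] * d_v for _ in range(d_k)], [0.0] * d_k

def update_memory(M, z, keys, values):
    # Linear update: M += sigma(K)^T V, z += sum_t sigma(k_t). Returns new copies.
    M = [row[:] for row in M]
    z = list(z)
    for k, v in zip(keys, values):
        sk = [elu_plus_one(x) for x in k]
        for i, a in enumerate(sk):
            z[i] += a
            for j, b in enumerate(v):
                M[i][j] += a * b
    return M, z

def retrieve(M, z, q):
    # A_mem = sigma(q) M / (sigma(q) . z) for a single query vector
    sq = [elu_plus_one(x) for x in q]
    denom = sum(a * b for a, b in zip(sq, z))
    return [sum(sq[i] * M[i][j] for i in range(len(sq))) / denom
            for j in range(len(M[0]))]

# Toy, position-free keys/values (made-up numbers for illustration)
keys = [[0.5, -1.0, 0.2, 0.8], [2.0, 0.3, -0.4, 0.1], [-0.7, 1.2, 0.9, -0.3]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
q = [0.1, 0.9, -0.2, 0.4]

# Write the same tokens in forward and in reversed order
M_fwd, z_fwd = update_memory(*empty_memory(4, 2), keys, values)
M_rev, z_rev = update_memory(*empty_memory(4, 2), keys[::-1], values[::-1])
out_fwd = retrieve(M_fwd, z_fwd, q)
out_rev = retrieve(M_rev, z_rev, q)
# Equal up to floating-point rounding: token order is invisible to the memory
print(out_fwd, out_rev)
```

Note that this holds across segments too: each segment's contribution is simply added to M, so nothing in the memory records which segment (or which position within it) a key/value pair came from.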

Lazy3valuation · 2024-05-28T08:53:55Z

From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it with 12 GB using 8-bit quantization and a segment size of 400.

LzhinFdu · 2024-05-28T09:02:21Z

I can also get training to run. However, the results so far are not very good. I'm trying to train further.

pengshuang · 2024-07-25T09:32:30Z

I have the same question. Were you able to solve it?

LzhinFdu · 2024-07-25T09:59:36Z

You could try moving the memory retrieval step to after 'apply_rotary_pos_emb' and comparing the training performance. However, I have not tested this further.
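A minimal sketch of that suggestion, under stated assumptions: keys are rotated with interleaved-pair RoPE (rotating (x[i], x[i+1]) by pos · base^(−i/d)) before the linear memory update M = Σₜ σ(kₜ)vₜᵀ, and positions are assumed to continue across segments via a hypothetical `start_pos` argument. Hugging Face's `rotate_half` pairs dimension i with i + d/2 instead of i + 1; this sketch does not reproduce that layout. All function names are hypothetical. The sketch only shows that order information does enter the memory once keys are rotated. RoPE's relative-position property holds for raw dot products q·k, and it no longer holds exactly after the σ = ELU + 1 feature map and summation into M, so this says nothing about whether the model can exploit that information.

```python
import math

def rope(x, pos, base=10000.0):
    # Interleaved-pair RoPE: rotate (x[i], x[i+1]) by pos * base**(-i/d), i even.
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def elu_plus_one(x):
    return x + 1.0 if x > 0 else math.exp(x)

def memory_from(keys, values):
    # Linear memory M = sum_t sigma(k_t) v_t^T, built from whatever keys are passed in
    M = [[0.0] * len(values[0]) for _ in range(len(keys[0]))]
    for k, v in zip(keys, values):
        sk = [elu_plus_one(x) for x in k]
        for i, a in enumerate(sk):
            for j, b in enumerate(v):
                M[i][j] += a * b
    return M

def memory_post_rope(keys, values, start_pos=0):
    # Hypothetical variant: rotate each key by its absolute position before the update
    rotated = [rope(k, start_pos + t) for t, k in enumerate(keys)]
    return memory_from(rotated, values)

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

keys = [[0.5, -1.0, 0.2, 0.8], [2.0, 0.3, -0.4, 0.1], [-0.7, 1.2, 0.9, -0.3]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

pre_fwd = memory_from(keys, values)
pre_rev = memory_from(keys[::-1], values[::-1])
post_fwd = memory_post_rope(keys, values)
post_rev = memory_post_rope(keys[::-1], values[::-1])
print(max_abs_diff(pre_fwd, pre_rev))    # floating-point rounding only
print(max_abs_diff(post_fwd, post_rev))  # clearly nonzero: order now changes M
```

Whether the query used for retrieval should also be rotated, and with which position, is a further design choice this sketch leaves open.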

pengshuang · 2024-07-25T12:19:42Z

Thanks for your response.

lihua8848 · 2024-08-23T09:33:39Z

Would this retain positional information?
