
WIP: Preventing the loss from being computed when the input token is EOS Token #878

Draft
wants to merge 22 commits into main
Conversation

ShashankMosaicML
Contributor

@ShashankMosaicML commented Jan 17, 2024

The model should not be trained to predict the token that follows the eos_token, because that token belongs to a different sequence. This PR implements this logic.

TODO: Experimental verification.
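
A minimal sketch of the masking step (not the actual diff in this PR), assuming targets are already aligned so that `targets[t]` is the token that should follow `input_ids[t]`, and that `-100` is the cross-entropy ignore index; the helper name `mask_loss_after_eos` is hypothetical:

```python
import torch

def mask_loss_after_eos(input_ids: torch.Tensor,
                        targets: torch.Tensor,
                        eos_token_id: int,
                        ignore_index: int = -100) -> torch.Tensor:
    """Set the target to ignore_index wherever the *input* token is EOS,
    so no loss is computed for predicting the token after an EOS boundary
    (i.e. the first token of the next, unrelated sequence)."""
    targets = targets.clone()
    targets[input_ids == eos_token_id] = ignore_index
    return targets
```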

@ShashankMosaicML changed the title Preventing the loss from being computed when the input token is EOS Token WIP: Preventing the loss from being computed when the input token is EOS Token Jan 17, 2024
@samhavens
Contributor

I think having this option is good; some users almost certainly want it.

However, I think this should be optional, as I am not convinced the model shouldn't learn to predict the token after EOS. I'd expect the model to learn that, after EOS (if sequences are joined randomly), it can disregard all context and pick from the distribution of tokens that begin sequences. This is a different distribution from raw unigram frequencies, which are the probabilities it should use when picking a token not conditioned on EOS.
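
As a toy illustration of that distinction (hypothetical corpus, not from this repo): the distribution of sequence-initial tokens is what P(t|EOS) should approximate when sequences are concatenated randomly, and it differs from the raw unigram distribution.

```python
from collections import Counter

# Hypothetical, already-tokenized corpus of three short sequences.
sequences = [["The", "cat", "sat"], ["The", "dog", "ran"], ["A", "bird", "flew"]]

# What P(t | EOS) should approximate when sequences are joined randomly:
start_counts = Counter(seq[0] for seq in sequences)
# What a raw unigram model P(t) would use:
unigram_counts = Counter(tok for seq in sequences for tok in seq)

print(start_counts)    # Counter({'The': 2, 'A': 1})
print(unigram_counts)  # Counter({'The': 2, 'cat': 1, 'sat': 1, 'dog': 1, 'ran': 1, 'A': 1, 'bird': 1, 'flew': 1})
```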

Then, if sequences are not joined randomly, as in that TSP NN method, we definitely want to compute loss.

@ShashankMosaicML
Contributor Author

ShashankMosaicML commented Jan 18, 2024

> Then, if sequences are not joined randomly, as in that TSP NN method, we definitely want to compute loss.

Thanks for your comment! Yes, what you said makes sense. This is still very much a work in progress, and I just wanted to run some experimental tests initially as a sanity check.
Also, this is mainly for the case where we do sequence-id-based masking. In that case, the eos token is still part of the previous sequence, but its target is the first token of the next sequence.
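
To illustrate with a toy packed batch (hypothetical token ids, with `eos_id = 0`): the EOS position carries the previous sequence's id, but its next-token target comes from the following sequence.

```python
import torch

eos_id = 0
# Two packed sequences: [5, 6, eos] and [7, 8, eos] (hypothetical token ids).
input_ids = torch.tensor([5, 6, eos_id, 7, 8, eos_id])
sequence_id = torch.tensor([0, 0, 0, 1, 1, 1])

# Next-token targets for each position (the last position has no target).
inputs = input_ids[:-1]    # [5, 6, eos, 7, 8]
targets = input_ids[1:]    # [6, eos, 7, 8, eos]

# The EOS at position 2 belongs to sequence 0 (per sequence_id), yet its
# target is 7, the first token of sequence 1: a cross-sequence prediction.
crosses_boundary = inputs == eos_id
print(crosses_boundary)            # tensor([False, False,  True, False, False])
print(targets[crosses_boundary])   # tensor([7])
```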

@vchiley
Contributor

vchiley commented Jan 18, 2024

@samhavens should we also add the option to not predict BOS (assuming the previous token is the end of the previous sequence)?

@samhavens
Contributor

@vchiley for models which have both EOS and BOS, are you saying we shouldn't learn that BOS comes after EOS? It isn't worth learning, true, but also... we'll always stop generating at EOS, so it wouldn't matter... or am I misunderstanding?

@samhavens
Contributor

As discussed on Slack, I think that:

  • EOS is effectively a BOS token, so we want P(t|EOS) to be different from P(t), and therefore we don't want to mask this loss.
  • However, when doing seq-id masking, we currently mask EOS for every token other than the first, so we learn P(t_0|EOS), P(t_1|t_0), P(t_2|t_0, t_1), ...
  • So @ShashankMosaicML will confirm this and, if it is happening, shift the mask so that EOS is visible after t_0 (see the sketch below).
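
A sketch of that shift, assuming 1-D `input_ids` and `sequence_id` tensors (the helper names are hypothetical; this is not the actual llm-foundry code): reassign each separator EOS to the sequence that follows it, so the mask built from sequence ids lets t_0, t_1, ... attend to it.

```python
import torch

def shift_eos_to_next_sequence(input_ids: torch.Tensor,
                               sequence_id: torch.Tensor,
                               eos_token_id: int) -> torch.Tensor:
    """Give each separator EOS the sequence id of the sequence that follows it,
    so the EOS acts like the next sequence's BOS under sequence-id masking."""
    sequence_id = sequence_id.clone()
    is_eos = input_ids[:-1] == eos_token_id
    at_boundary = sequence_id[1:] != sequence_id[:-1]
    shift = is_eos & at_boundary
    sequence_id[:-1][shift] = sequence_id[1:][shift]
    return sequence_id

def sequence_id_causal_mask(sequence_id: torch.Tensor) -> torch.Tensor:
    """Boolean [seq_len, seq_len] mask: causal AND within the same sequence id."""
    n = sequence_id.shape[0]
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    same_seq = sequence_id.unsqueeze(0) == sequence_id.unsqueeze(1)
    return causal & same_seq
```

With the shifted ids, `sequence_id_causal_mask` still blocks attention into the rest of the previous sequence, but the separator EOS becomes visible to every token of the next sequence.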
