Currently, generation recomputes every activation each time a token is appended to the prompt. Normally, one would cache the intermediate key/value activations so that each new token requires only a single forward step over that token. Caching doesn't compose as cleanly with the plain forward function, but that's precisely why a clean and simple implementation should be part of minGPT. It's very surprising that this is not afforded by PyTorch's native TransformerEncoder module either. A minimal sketch of the idea is below.
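For illustration, here is a minimal sketch of key/value caching for a single causal self-attention layer. The class name `CachedSelfAttention` and its signature are hypothetical, not part of minGPT or PyTorch; the point is just that the layer returns its keys/values so the next step can reuse them instead of recomputing the whole prefix.

```python
# Hypothetical sketch of KV caching, not minGPT's actual module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CachedSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x, cache=None):
        # x: (B, T, C); during incremental decoding T == 1
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        if cache is not None:
            # prepend previously computed keys/values; only the new
            # token's q, k, v were computed this step
            past_k, past_v = cache
            k = torch.cat([past_k, k], dim=2)
            v = torch.cat([past_v, v], dim=2)
        new_cache = (k, v)
        S = k.size(2)  # total sequence length including cached tokens
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        # causal mask: query i may attend to keys 0..(S - T + i);
        # with T == 1 (pure decoding) everything is visible
        if T > 1:
            mask = torch.tril(
                torch.ones(T, S, dtype=torch.bool, device=x.device),
                diagonal=S - T,
            )
            att = att.masked_fill(~mask, float('-inf'))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y), new_cache
```

A generate loop would then run the full prompt once with `cache=None`, and afterwards feed only the latest sampled token together with the returned cache, threading one `(k, v)` pair per layer through each step.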