Padding text inputs to the TextTransformer results in incorrect captions #588
Comments
hi @dsikka, can I ask which generation type you are using, and how are you padding? There should be a fixed-length argument that the generator can use; these are the arguments: open_clip/src/open_clip/coca_model.py Lines 169 to 185 in 67e5e5e
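For reference, a minimal usage sketch of that path, assuming the argument names visible at the referenced lines (`generation_type`, `seq_len`, `fixed_output_length`); the model and pretrained tags below are only illustrative, so check the signature on your commit:

```python
import torch
import open_clip

# Model / pretrained names are illustrative; use whatever CoCa checkpoint you have.
model, _, transform = open_clip.create_model_and_transforms(
    "coca_ViT-B-32", pretrained="laion2b_s13b_b90k"
)
model.eval()

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image batch

with torch.no_grad():
    generated = model.generate(
        image,
        generation_type="beam_search",
        seq_len=30,
        fixed_output_length=True,  # name taken from the referenced lines; pads the output to seq_len
    )

print(generated.shape)  # (1, seq_len) token ids; decode with open_clip.decode(generated[0])
```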
Hi, thanks for the quick reply @gpucce. I am currently using beam_search and was referring to the input to the text model in the forward pass: open_clip/src/open_clip/coca_model.py Line 151 in 67e5e5e
The line in question: open_clip/src/open_clip/coca_model.py Line 138 in 67e5e5e
Possibly something like this, if we were to pad all inputs to length 15?
I was wondering how to do this correctly while also correctly updating the attn mask: open_clip/src/open_clip/transformer.py Line 604 in 67e5e5e
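Not open_clip code, just a generic PyTorch sketch of the idea in the question: pad the token ids out to a fixed length and add a key-padding term to the causal mask so attention never reaches the pad positions (the helper name and pad id are hypothetical):

```python
import torch
import torch.nn.functional as F

def pad_and_build_attn_mask(token_ids, fixed_len, pad_token_id):
    """Hypothetical helper: pad (batch, cur_len) ids to fixed_len and return an
    additive attention mask of shape (batch, fixed_len, fixed_len) that is both
    causal and blind to pad keys."""
    batch, cur_len = token_ids.shape
    padded = F.pad(token_ids, (0, fixed_len - cur_len), value=pad_token_id)

    # Causal part: -inf strictly above the diagonal, 0 elsewhere.
    causal = torch.full((fixed_len, fixed_len), float("-inf")).triu_(1)

    # Key-padding part: no query may attend to a pad key.
    key_is_pad = padded == pad_token_id                      # (batch, fixed_len)
    pad_mask = torch.zeros(batch, fixed_len, fixed_len)
    pad_mask.masked_fill_(key_is_pad[:, None, :], float("-inf"))

    # For nn.MultiheadAttention you would repeat_interleave this over the heads.
    return padded, causal.unsqueeze(0) + pad_mask

# Example: a 4-token prefix padded out to length 15, as in the question.
ids = torch.tensor([[49406, 320, 1125, 269]])
padded, attn_mask = pad_and_build_attn_mask(ids, fixed_len=15, pad_token_id=0)
print(padded.shape, attn_mask.shape)  # torch.Size([1, 15]) torch.Size([1, 15, 15])
```

Rows belonging to pad query positions still see the earlier real tokens through the causal part, so no softmax row ends up all -inf.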
Hi, just wanted to follow up on this.
Hello,
I am trying to run the caption generation workflow and was wondering what I have to do if the inputs to the TextTransformer model are always padded to a fixed length. Padding the input with the pad_token_id results in nonsensical captions. How should the attn mask be updated in both the TextTransformer and the MultimodalDecoder? Currently, the input to the TextTransformer grows as the caption is generated, but I'd like to pad the input to a fixed length.
Thanks.
@lucidrains @gpucce @iejMac
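To make the goal concrete, here is a toy greedy loop (not open_clip's generator; `model_step` and all constants are hypothetical stand-ins) where the model always sees a fixed-length, padded input and the logit is read at the last real position:

```python
import torch
import torch.nn.functional as F

PAD_ID, SOT_ID, EOT_ID, FIXED_LEN, VOCAB = 0, 49406, 49407, 15, 49408

def model_step(padded_ids):
    """Hypothetical stand-in for the CoCa text + multimodal forward pass;
    returns logits of shape (batch, FIXED_LEN, VOCAB)."""
    return torch.randn(padded_ids.shape[0], FIXED_LEN, VOCAB)

ids = torch.tensor([[SOT_ID]])
for _ in range(FIXED_LEN - 1):
    cur_len = ids.shape[1]
    # The model always receives a (batch, FIXED_LEN) tensor; the pad positions
    # must be masked out inside the attention (see the mask sketch above).
    padded = F.pad(ids, (0, FIXED_LEN - cur_len), value=PAD_ID)
    logits = model_step(padded)
    # Read the prediction at the last *real* position, not the last padded one.
    next_id = logits[:, cur_len - 1].argmax(dim=-1, keepdim=True)
    ids = torch.cat([ids, next_id], dim=1)
    if (next_id == EOT_ID).all():
        break

print(ids)  # the caption still grows step by step, but every forward pass saw FIXED_LEN tokens
```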