Questions about using CoCa to generate captions #797
Comments
Hi @ykj467422034, can you share a snippet of the code you are actually using?
This is the code I am actually using. I want to generate captions, but the problem is that they come out repeated.
So it generates the captions you are showing for the "cat.jpg" file?
No, I understand what you mean.
@ykj467422034 sorry, I didn't see your reply. So it repeats the same caption for different images, or is it generating several captions for one image? Also, did you try to generate a caption for a random tensor?
The former: it repeats the same caption across different images.
Mmmh, not sure. I asked about the random tensor to see if the model generates the same caption in that case as well; if so, maybe the fine-tuning didn't go well. Do you get similar behaviour with the pretrained model?
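For anyone following along, this is roughly the check being suggested: feed random noise to the model and see whether the caption collapses the same way. A minimal sketch, assuming coca_ViT-L-14's 224x224 input and the checkpoint path from the original post; adjust both to your setup:

```python
import torch
import open_clip

# Load the fine-tuned checkpoint (path is illustrative; use your own).
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="logs/check_point.pth",
)
model.eval()

# Random noise shaped like a batch of one 224x224 RGB image.
noise = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    generated = model.generate(noise)

# If this caption matches the one you get for real images,
# the decoder has likely collapsed during fine-tuning.
print(open_clip.decode(generated[0]))
```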
Hello @ykj467422034, I haven't checked whether the most recent update has fixed this issue, so this suggestion might not work, and in fact it might screw everything up, so consider this my warning. Assuming it hasn't been fixed, I will refer you to issue #751. The problem there was that after CoCa fine-tuning, all of the model's predictions were repetitions of the same word, for example "turnpike turnpike turnpike turnpike parkway parkway parkway parkway parkway ...". The solution I found to work for me, as described in issue #751, was to git pull the open_clip repository, edit my local open_clip/src/open_clip/coca_model.py exactly as specified line by line in Pull Request #710 by gpucce, and then rerun the fine-tuning.
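If it helps anyone locate the file in question: with a pip install, coca_model.py lives inside the installed package rather than under src/. A small sketch to print where your open_clip actually resolves from (if you work from a git clone, edit src/open_clip/coca_model.py in the clone instead):

```python
import os
import open_clip

# Print the directory of the open_clip package that Python imports;
# coca_model.py sits alongside this package's __init__.py.
print(os.path.dirname(open_clip.__file__))
```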
I edited it as you said, but the problem persists.
Are you using the newest branch? I was not, so perhaps that is impacting the edit's success.
Do you mean the open_clip repository or the modified src files?
Apologies for my lack of clarity; I mean the open_clip repository. I was using the most up-to-date version at the time, but it looks like multiple new commits have been made since then. I am currently unable to access my copy to check which commit I am using, though.
Fine, I will try the latest version once more. Thanks.
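A quick way to see which release of the library is installed (note this reports the pip release, not the git commit, so for a source checkout you still need git log):

```python
import open_clip

# Installed open_clip_torch release; a local clone may be
# ahead of or behind this tag.
print(open_clip.__version__)
```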
@ykj467422034 I think that with those changes you would still need to rerun the fine-tuning.
Sure, I will. Thank you!
Original issue
I'm fine-tuning OpenCLIP on my own CSV dataset, then saving the checkpoint file and using the official code to generate captions. However, the generated captions always come out repeated. Can anyone help me solve this problem?
Finetuning
```
python -m training.main \
    --dataset-type "csv" \
    --train-data "my-csv/coca_train.csv" \
    --warmup 1000 \
    --batch-size 32 \
    --lr 1e-5 \
    --wd 0.1 \
    --epochs 1 \
    --workers 3 \
    --model "coca_ViT-L-14" \
    --report-to "wandb" \
    --coca-contrastive-loss-weight 0 \
    --coca-caption-loss-weight 1 \
    --log-every-n-steps 100
```
Test
```python
import open_clip
import torch
from PIL import Image

# Load the fine-tuned checkpoint produced by the training run above.
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="logs/check_point.pth",
)

# Preprocess a single image into a batch of one.
im = Image.open("cat.jpg").convert("RGB")
im = transform(im).unsqueeze(0)

with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(im)

# Decode the token ids and strip the special tokens.
print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
```
Result
The captions generated for different pictures are all the same.
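One more diagnostic worth trying, since repetition can come from the decoding loop rather than the weights: model.generate also accepts sampling options in recent versions of coca_model.py. The parameter names below match the version I looked at and may differ across commits, so treat this as a sketch (it reuses model and im from the Test snippet):

```python
# Nucleus sampling instead of the default beam search; if the repetition
# disappears here, the problem is likely in the beam-search path rather
# than in the fine-tuned weights.
with torch.no_grad():
    generated = model.generate(
        im,
        generation_type="top_p",
        top_p=0.9,
        temperature=1.0,
        seq_len=30,
    )
print(open_clip.decode(generated[0]))
```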