Hey @anas-awadalla and co,

thank you all for your amazing work!
I have a question regarding the image encoder that was used. I initialize OpenFlamingo with the demo code provided in the `README.md`:
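(The actual snippet did not survive the copy; below is a minimal sketch of the README demo initialization. The language-model paths and flags are my assumption from the README at the time and may differ; the relevant part here is the vision encoder selection.)

```python
# Sketch of the README demo init (language-model paths are assumed;
# the part that matters for this issue is the vision encoder).
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",
    tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
    cross_attn_every_n_layers=1,
)
```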
The ViT seems to be downloaded from here (`.../open_clip/pretrained.py`):
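(The excerpt from `pretrained.py` was lost in the copy; as a stand-in, here is a small check that asks open_clip itself where the `openai` ViT-L-14 weights come from. `get_pretrained_cfg` lives in `open_clip.pretrained`; the exact config layout may vary between versions.)

```python
# Where does open_clip fetch the "openai" ViT-L-14 checkpoint from?
# (Field names in the returned config may differ per open_clip version.)
from open_clip.pretrained import get_pretrained_cfg

cfg = get_pretrained_cfg("ViT-L-14", "openai")
print(cfg)  # shows the download URL / HF Hub id that pretrained.py resolves to
```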
On the Hugging Face model card of OpenFlamingo, OpenAI's ViT model card is linked.
Loading both models and comparing them with the following function will return False:
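(The helper itself was lost when copying the issue; a minimal sketch of such a state-dict comparison, assuming `m1` and `m2` are plain `torch.nn.Module`s, looks like this:)

```python
import torch

def models_are_equal(m1: torch.nn.Module, m2: torch.nn.Module) -> bool:
    """Return True only if both models have identical state-dict keys
    and element-wise identical parameter/buffer tensors."""
    sd1, sd2 = m1.state_dict(), m2.state_dict()
    if sd1.keys() != sd2.keys():
        return False
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)
```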
Visual inspection of the models' state dicts also strengthens the impression that the two models are not the same. (Maybe the models are the same and I am just not knowledgeable enough to see it.)
Now, I would be interested to know which `ViT-L-14` was used during the training of OpenFlamingo.

### Expected Behavior

Both models `m1` and `m2` should be the same model.

### Current Behavior

`m1` and `m2` seem to be different.

### Steps to Reproduce
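(The reproduction snippet did not make it into this copy; roughly, the steps were along the following lines. Loading `m2` from `openai/clip-vit-large-patch14` via transformers is my assumption about which checkpoint the model card links to.)

```python
import open_clip
from transformers import CLIPModel

# m1: the ViT-L-14 that open_clip downloads for pretrained="openai"
m1, _, _ = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")

# m2: the OpenAI CLIP checkpoint from the Hugging Face Hub (assumed link target)
m2 = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")

# comparison helper as sketched above
print(models_are_equal(m1, m2))  # -> False
```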
### Environment

- OS: Ubuntu 22.04.3 LTS
- Python: 3.11.8
- open-clip-torch==2.24.0
- torch==2.2.0
- torchvision==0.17.0
**Edit:** Using both ViTs to encode the same image results in different embeddings as well.
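(A rough sketch of that check, with the same assumed checkpoints as above; note that each stack also uses its own preprocessing pipeline.)

```python
import torch
from PIL import Image
import open_clip
from transformers import CLIPModel, CLIPProcessor

# Same assumed checkpoints as above; each stack applies its own preprocessing.
m1, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
m2 = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("test.jpg")  # any test image

with torch.no_grad():
    emb1 = m1.encode_image(preprocess(image).unsqueeze(0))                       # open_clip features
    emb2 = m2.get_image_features(**processor(images=image, return_tensors="pt")) # transformers features

print(torch.allclose(emb1, emb2, atol=1e-4))  # comes out False, as noted above
```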