Major refactor to support new architectures #261

Draft · wants to merge 53 commits into main
Commits
e19133d
deepspeed running
anas-awadalla Aug 25, 2023
870f20c
more progress
anas-awadalla Aug 26, 2023
f9162a0
added ds checkpointing
anas-awadalla Aug 26, 2023
ded3485
more progress
anas-awadalla Aug 30, 2023
3672042
mllm
Aug 30, 2023
99c350f
merge deepspeed
anas-awadalla Aug 30, 2023
2f634f0
rewrite src: add VLM, Kosmos, Flamingo
i-gao Sep 7, 2023
7261639
fix kosmos models
i-gao Sep 11, 2023
09977ba
cosmetic: num_params helper fn
i-gao Sep 11, 2023
6bb9071
revert to deepspeed branch code for train/
i-gao Sep 11, 2023
7984adb
add BLIP
i-gao Sep 12, 2023
7eab26a
minor train script fixes
i-gao Sep 12, 2023
aed0f21
fix vocab len issues
i-gao Sep 13, 2023
47c8e19
fixes
i-gao Sep 13, 2023
11ab894
big refactor of training code
i-gao Sep 15, 2023
cd4f3aa
many fixes + rewrite FSDP for torch nightly
i-gao Sep 16, 2023
74686a7
fixes
i-gao Sep 16, 2023
61f5a3d
fixes
i-gao Sep 16, 2023
ccfcb0f
run linter & fix gradient ckpting
i-gao Sep 16, 2023
303e707
no need to untie embeddings for fsdp
i-gao Sep 16, 2023
fc660e7
add in missing kwarg
i-gao Sep 16, 2023
be9a4dd
Merge branch deepspeed: eval code only
i-gao Sep 16, 2023
b0ff9a4
update eval code to match new src args
i-gao Sep 16, 2023
92bc4b7
update documentation and example scripts
i-gao Sep 16, 2023
60a82d7
fix deepspeed train script
anas-awadalla Sep 17, 2023
82d1c69
removed non default loss scale window
anas-awadalla Sep 17, 2023
4875822
init flamingo embeds new weights
anas-awadalla Sep 17, 2023
8f2f040
init flamingo embeds new weights
anas-awadalla Sep 17, 2023
beba4d2
Merge branch 'main' into mllm
anas-awadalla Sep 17, 2023
b81379f
fix mmc4 sim threshold arg
anas-awadalla Sep 17, 2023
f91c14a
add z-loss
anas-awadalla Sep 17, 2023
df96979
Merge pull request #262 from mlfoundations/add-z-loss
anas-awadalla Sep 17, 2023
bcc5a8f
Update eval README.md
i-gao Sep 17, 2023
770e653
have a default stdev for init
Sep 17, 2023
ef268be
Update run_train_deepspeed.sh
anas-awadalla Sep 17, 2023
da07e35
fix loss impl and model vocab size
Sep 17, 2023
3fcda82
Merge branch 'mllm' of https://github.com/mlfoundations/open_flamingo…
Sep 17, 2023
bcd2cf5
remove ds act checkpointing exception
Sep 18, 2023
9b1a764
fixes from PR review
i-gao Sep 19, 2023
866a780
Merge branch 'mllm' of github.com:mlfoundations/open_flamingo into mllm
i-gao Sep 19, 2023
5ad05c4
add weight/bias init to decouple linear
anas-awadalla Sep 20, 2023
939d460
Language stream changes (#264)
anas-awadalla Sep 21, 2023
ae76178
grad checkpointing + ds saving patch (we should find a cleaner solution)
anas-awadalla Sep 21, 2023
d29c8b8
Update run_train_deepspeed.sh
anas-awadalla Oct 18, 2023
b7af1d6
clearer parameter count logging
anas-awadalla Oct 18, 2023
43ac961
Fix model vocab size (now it is len of tokenizer)
anas-awadalla Oct 18, 2023
e7684b5
Update code example
anas-awadalla Oct 18, 2023
735a880
fix LR schedule
anas-awadalla Oct 23, 2023
496e656
fix var naming in load_deepspeed_checkpoint
anas-awadalla Oct 24, 2023
c5feb97
Update losses.py
anas-awadalla Nov 30, 2023
dbb1ad8
train_utils media token fix
Dec 2, 2023
fa6af69
remove unnecessary model unwrap lines
Dec 2, 2023
eb6b8aa
Merge pull request #283 from mlfoundations/media_token_fix
anas-awadalla Dec 2, 2023
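Among the commits above, f91c14a ("add z-loss", merged via #262) adds a z-loss auxiliary term to the training loss, following OpenLM (also credited in the README change below). As a rough sketch only of what such a term typically looks like — the coefficient, masking, and function name here are illustrative assumptions, not the PR's actual losses.py implementation:

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_z_loss(
    logits: torch.Tensor,       # (batch, seq_len, vocab_size)
    labels: torch.Tensor,       # (batch, seq_len); -100 marks ignored positions
    z_loss_coef: float = 1e-4,  # hypothetical coefficient, not taken from this PR
) -> torch.Tensor:
    """Next-token cross-entropy plus a z-loss regularizer that penalizes large
    log-partition values log(sum(exp(logits))), keeping logits well-scaled."""
    vocab_size = logits.size(-1)
    ce = F.cross_entropy(
        logits.reshape(-1, vocab_size), labels.reshape(-1), ignore_index=-100
    )
    # z-loss: squared log-partition, averaged over non-ignored token positions
    log_z = torch.logsumexp(logits.float(), dim=-1)  # (batch, seq_len)
    mask = (labels != -100).float()
    z_loss = (log_z.pow(2) * mask).sum() / mask.sum().clamp(min=1)
    return ce + z_loss_coef * z_loss
```

The squared log-partition term discourages logits from drifting to large magnitudes, which tends to stabilize training, particularly in mixed precision.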
README.md: 2 changes (1 addition & 1 deletion)
@@ -226,7 +226,7 @@ OpenFlamingo is developed by:
The team is primarily from the University of Washington, Stanford, AI2, UCSB, and Google.

# Acknowledgments
- This code is based on Lucidrains' [flamingo implementation](https://github.com/lucidrains/flamingo-pytorch) and David Hansmair's [flamingo-mini repo](https://github.com/dhansmair/flamingo-mini). Thank you for making your code public! We also thank the [OpenCLIP](https://github.com/mlfoundations/open_clip) team as we use their data loading code and take inspiration from their library design.
+ This code is based on Lucidrains' [flamingo implementation](https://github.com/lucidrains/flamingo-pytorch) and David Hansmair's [flamingo-mini repo](https://github.com/dhansmair/flamingo-mini). Thank you for making your code public! We also thank the [OpenCLIP](https://github.com/mlfoundations/open_clip) and [OpenLM](https://github.com/mlfoundations/open_lm) teams, as we use their data loading/z-loss code and take inspiration from their library design.

We would also like to thank [Jean-Baptiste Alayrac](https://www.jbalayrac.com) and [Antoine Miech](https://antoine77340.github.io) for their advice, [Rohan Taori](https://www.rohantaori.com/), [Nicholas Schiefer](https://nicholasschiefer.com/), [Deep Ganguli](https://hai.stanford.edu/people/deep-ganguli), [Thomas Liao](https://thomasliao.com/), [Tatsunori Hashimoto](https://thashim.github.io/), and [Nicholas Carlini](https://nicholas.carlini.com/) for their help with assessing the safety risks of our release, and to [Stability AI](https://stability.ai) for providing us with compute resources to train these models.
