Major refactor to support new architectures #261
base: main
Conversation
Some other todos I want to add to this:
* fix padding side when generating
* clean up language stream forward pass (less for-looping)
* expose BLIP model
* fixes for forward pass without images
* restore for-looping
I have a keen interest in exploring the latest features. To that end, I've integrated the deepspeed-related code into the current main branch of Openflamingo, including functions like get_deepspeed_config(). During my testing, the code runs smoothly with deepspeed_stage = 2 and shows a significant efficiency improvement compared to FSDP. However, when I configured deepspeed_stage = 3, an error was encountered during loss backward propagation:

Do you have any idea about this? Or have you encountered this problem while developing the new version?
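For reference, this is a minimal sketch of the kind of ZeRO config involved. The field names follow DeepSpeed's config schema, but the values are illustrative and the helper name below is a placeholder, not necessarily what get_deepspeed_config() returns in this branch:

```python
# Illustrative ZeRO config only; the keys are standard DeepSpeed config fields,
# but the values are assumptions, not what this branch necessarily uses.
def get_zero_config(deepspeed_stage: int = 2) -> dict:
    return {
        "train_micro_batch_size_per_gpu": 8,
        "gradient_accumulation_steps": 1,
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": deepspeed_stage,  # stage 2 trains fine; stage 3 errors in loss.backward()
            "overlap_comm": True,
            "contiguous_gradients": True,
            # stage-3-only knobs (ignored for stage 2):
            "stage3_prefetch_bucket_size": 5e8,
            "stage3_param_persistence_threshold": 1e6,
        },
    }
```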
You said you integrated “deepspeed-related code into the current main branch of Openflamingo”. Have you tried using this branch as is? The integration is basically complete, but we are doing more testing to be certain. I will also test out stage 3 again to make sure we haven't missed anything.
I did not directly run this branch, as I have developed my project based on the main branch, so I just copied the deepspeed-related code from this branch into my code. The error is very strange: 1) stage 2 works, but stage 3 reports the error; 2) the error occurred while executing the loss backward pass, but the backward pass rarely reports errors; 3) it is unclear which tensor has size 0, as reported. If you have no idea about this, I will have to run my code with deepspeed stage 2. Thanks!
I tried this branch, and it works well for the training part. I also tested the evaluation part of the "Merge wilds mllm" branch. Unfortunately, there are some bugs; I reported two of them in #266.
train_utils media token fix
New models

* `VLM` class. See documentation in `src/vlm.py`
* `VLMWithCrossAttention` (dense xattn to fuse vision + language, Flamingo-style) vs. `VLMWithLanguageStream` (insert vision tokens into the language stream, Kosmos-style)
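A rough sketch of how the hierarchy fits together; everything here other than the class names `VLM`, `VLMWithCrossAttention`, and `VLMWithLanguageStream` (method names, signatures, attributes) is a placeholder rather than the real interface in `src/vlm.py`:

```python
# Rough sketch of the class hierarchy only; methods and attributes are placeholders.
import torch
import torch.nn as nn


class VLM(nn.Module):
    """Shared wrapper around a vision encoder and a language model."""

    def __init__(self, vision_encoder: nn.Module, lang_model: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.lang_model = lang_model

    def forward(self, vision_x: torch.Tensor, lang_x: torch.Tensor, **kwargs):
        raise NotImplementedError  # subclasses decide how vision is fused


class VLMWithCrossAttention(VLM):
    """Flamingo-style: dense cross-attention layers attend to vision features."""

    def forward(self, vision_x, lang_x, **kwargs):
        vision_features = self.vision_encoder(vision_x)
        # condition interleaved cross-attention blocks on vision_features,
        # then run the language model over lang_x as usual
        ...


class VLMWithLanguageStream(VLM):
    """Kosmos-style: project vision features to token embeddings and insert
    them into the language stream at the media token positions."""

    def forward(self, vision_x, lang_x, **kwargs):
        vision_tokens = self.vision_encoder(vision_x)
        # splice vision_tokens into the language model's input embeddings
        ...
```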
FSDP Updates
Training code refactor

* `train_one_epoch` now accepts a list of datasets and executes the same loss function on all of them. This lets us decide which datasets to train on at runtime (e.g. just LAION) and makes adding datasets more flexible. To train on a dataset, set the `--{dataset_name}_shards` arg (e.g. `--laion_shards`). If this is None, then we will not train on that dataset (i.e., skip LAION).
* `train_one_epoch` also now accepts a loss function decided at runtime. Losses are found in `train/losses.py`. Currently, only next-token prediction is implemented, but this allows us to work on adding contrastive-generative losses. (A rough sketch of this interface follows the list.)
* New `train/distributed.py` in an attempt to streamline `train/train.py`.
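Below is a condensed, hypothetical sketch of how these pieces fit together; the function signatures, the batch layout, and the loss helper are illustrative, not the exact code in `train/train.py` or `train/losses.py`:

```python
# Hypothetical sketch only: names, signatures, and the batch layout are illustrative.
from typing import Callable, Sequence


def next_token_prediction_loss(model, batch):
    """Language-modeling loss; assumes (another assumption) that the model
    returns an object with a .loss field when given labels."""
    images, input_ids, attention_mask, labels = batch
    outputs = model(
        vision_x=images,
        lang_x=input_ids,
        attention_mask=attention_mask,
        labels=labels,
    )
    return outputs.loss


def train_one_epoch(
    model,
    datasets: Sequence,          # only datasets whose --{name}_shards arg was provided
    compute_loss: Callable,      # chosen at runtime, e.g. next_token_prediction_loss
    optimizer,
):
    for dataset in datasets:     # a dataset left as None at launch simply isn't in this list
        for batch in dataset.dataloader:
            loss = compute_loss(model, batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```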
Steps before merging
* `lang_model` instead of `lang_encoder`
(this will not play well with the released weights; we need to decide what to do about the embeddings).

Steps after merging