
Stage 2 training #100

Open
1 of 5 tasks
isamu-isozaki opened this issue Jul 23, 2023 · 0 comments

isamu-isozaki (Collaborator) commented Jul 23, 2023

Once the PR for MaxViT is done, I'm thinking of adding the super-resolution conditioning part below.
Some steps I think are needed:

  • In train_muse.py's prepare_inputs_and_labels function, interpolate pixel values to 256x256 and get tokens using the f16 VQGAN for low resolution, and to 512x512 with the f8 VQGAN for high resolution. We can use precomputed embeddings here.
  • Then, we might want a SuperResTransformer class which takes as attributes:
      • the TransformerLayers for low resolution,
      • the MaxVitTransformerLayers for high resolution,
      • and a projection layer plus a concatenation layer joining the low-res and text embeddings.
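As a rough sketch of the steps above (all names here are hypothetical: `resize_nearest` stands in for the pixel interpolation done before VQGAN tokenization, and the NumPy matrices stand in for the real TransformerLayers / MaxVitTransformerLayers; the actual implementation would use PyTorch):

```python
import numpy as np

def resize_nearest(pixels, size):
    """Nearest-neighbor resize of an (H, W, C) pixel array to (size, size, C).
    Stand-in for the interpolation step before f16/f8 VQGAN tokenization."""
    h, w, _ = pixels.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return pixels[rows][:, cols]

class SuperResTransformer:
    """Sketch of the proposed class: it would hold the low-res layers, the
    high-res MaxViT layers, and a projection + concatenation step that joins
    the low-res and text embeddings into the high-res stream."""

    def __init__(self, dim_low, dim_text, dim_high, seed=0):
        rng = np.random.default_rng(seed)
        # projections into the high-res model's hidden size (assumed shapes)
        self.low_proj = rng.standard_normal((dim_low, dim_high)) * 0.02
        self.text_proj = rng.standard_normal((dim_text, dim_high)) * 0.02

    def condition(self, low_res_emb, text_emb):
        # project both streams to dim_high, then concatenate along the
        # sequence axis so the high-res layers can attend over both
        low = low_res_emb @ self.low_proj
        txt = text_emb @ self.text_proj
        return np.concatenate([txt, low], axis=0)
```

For example, conditioning a high-res stream on 10 low-res tokens and 5 text tokens would yield a combined sequence of 15 tokens in the high-res hidden size.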