Once the PR for MaxViT is done, I'm thinking of adding the super-resolution conditioning part described below.
I think the steps needed are:
In train_muse.py's prepare_inputs_and_labels function, interpolate the pixel values to 256x256 and tokenize them with the f16 VQGAN to get the low-resolution tokens, then interpolate to 512x512 and tokenize with the f8 VQGAN to get the high-resolution tokens. We can use precomputed embeddings here.
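To make the sequence lengths concrete, here is a small sketch of the token counts each tokenizer would produce. It assumes "f16" and "f8" mean spatial downsampling factors of 16 and 8 (the usual VQGAN naming); `token_grid` is a hypothetical helper, not something in the repo.

```python
def token_grid(image_size, downsample_factor):
    """Return (side, num_tokens) for a square image tokenized by a VQGAN.

    Assumes the VQGAN reduces each spatial dimension by `downsample_factor`
    and emits one token per latent position.
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")
    side = image_size // downsample_factor
    return side, side * side


# Low resolution: 256x256 pixels through the f16 VQGAN.
low_side, low_tokens = token_grid(256, 16)
# High resolution: 512x512 pixels through the f8 VQGAN.
high_side, high_tokens = token_grid(512, 8)

print(low_side, low_tokens)    # 16 256
print(high_side, high_tokens)  # 64 4096
```

Under that assumption the high-resolution sequence is 16x longer than the low-resolution one, which is probably why the cheaper MaxViT-style layers are wanted on the high-resolution side.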
Then we might want a SuperResTransformer class which takes as attributes:
the TransformerLayers for low resolution
the MaxVitTransformerLayers for high resolution
a projection layer and a concatenation layer that combine the low-res embeddings with the text embeddings
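A rough sketch of how such a class could be wired up, just to make the data flow concrete. Everything here is an assumption: generic PyTorch encoder/decoder layers stand in for the repo's TransformerLayers and MaxVitTransformerLayers, the class name, dimensions and argument names are made up, and positional embeddings and masking are left out.

```python
import torch
from torch import nn


class SuperResTransformerSketch(nn.Module):
    """Hypothetical sketch of the proposed class, not the repo's implementation."""

    def __init__(self, vocab_size=1024, low_dim=64, high_dim=32, text_dim=48, heads=4):
        super().__init__()
        self.low_embed = nn.Embedding(vocab_size, low_dim)
        self.high_embed = nn.Embedding(vocab_size, high_dim)
        # Stand-in for the low-resolution TransformerLayers.
        self.low_layers = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(low_dim, heads, dim_feedforward=4 * low_dim, batch_first=True),
            num_layers=2,
        )
        # Projections that bring low-res and text embeddings to the high-res width.
        self.low_proj = nn.Linear(low_dim, high_dim)
        self.text_proj = nn.Linear(text_dim, high_dim)
        # Stand-in for MaxVitTransformerLayers: plain decoder layers that
        # cross-attend to the concatenated conditioning sequence.
        self.high_layers = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(high_dim, heads, dim_feedforward=4 * high_dim, batch_first=True),
            num_layers=2,
        )
        self.to_logits = nn.Linear(high_dim, vocab_size)

    def forward(self, low_res_ids, high_res_ids, text_embeds):
        # Encode the low-resolution tokens.
        low = self.low_layers(self.low_embed(low_res_ids))
        # Concatenate projected low-res and text embeddings along the sequence axis.
        cond = torch.cat([self.low_proj(low), self.text_proj(text_embeds)], dim=1)
        # High-resolution tokens attend to the conditioning sequence.
        high = self.high_layers(self.high_embed(high_res_ids), cond)
        return self.to_logits(high)


model = SuperResTransformerSketch()
low_res_ids = torch.randint(0, 1024, (2, 16))   # small sequences to keep the demo cheap
high_res_ids = torch.randint(0, 1024, (2, 64))
text_embeds = torch.randn(2, 5, 48)
logits = model(low_res_ids, high_res_ids, text_embeds)
print(logits.shape)  # torch.Size([2, 64, 1024])
```

Cross-attention is only one way to use the concatenated conditioning; the real design might feed it in differently once the MaxViT layers land.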