Weights for linear replacement #1

Open
Jayce0625 opened this issue Nov 11, 2024 · 0 comments

Jayce0625 commented Nov 11, 2024

Thanks for your great work, @MatthewMih!
I'm not a native English speaker, so please forgive any mistakes.
My first question is how to obtain the weights of the linear layer that replaces a highly linear Transformer layer. I used the approximation matrix A obtained from the SVD matrix decomposition, but the results were very poor. How should the weights of the replacement linear layer be initialized?
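To make the question concrete, here is a minimal sketch of one way such a replacement could be fitted: ordinary least squares on hidden states collected before and after the layer. This is an assumption for illustration, not the authors' confirmed procedure. The function name `fit_linear_replacement`, the bias term, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_linear_replacement(X, Y):
    """Fit W, b minimising ||X @ W + b - Y||_F^2 (plain least squares).

    X: (n_tokens, d) hidden states entering the layer (assumed to be collected by the caller)
    Y: (n_tokens, d) hidden states leaving the layer, residual connection included
    """
    # Append a column of ones so the bias is fitted jointly with the weights.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    W, b = solution[:-1], solution[-1]
    return W, b

# Synthetic check: an exactly affine "layer" should be recovered.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 16))
W_true = rng.normal(size=(16, 16))
b_true = rng.normal(size=16)
Y = X @ W_true + b_true

W, b = fit_linear_replacement(X, Y)
rel_err = np.linalg.norm(X @ W + b - Y) / np.linalg.norm(Y)
```

With real activations, `X` and `Y` would be stacked over many tokens from a calibration set, and `rel_err` gives a rough idea of how much the replacement loses before any fine-tuning.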
My second question: your papers report Procrustes similarity both with and without the residual connection, so which version should be used when discussing linear replacement? During layer replacement, the entire Transformer layer is removed together with its residual connection. Could it therefore be argued that the “w/ residual” version is more appropriate, since it measures the linearity of the whole layer, residual included?
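The two variants can be compared with a small sketch. It assumes the score is 1 minus the least-squares residual between centered, Frobenius-normalized inputs and outputs, which is my reading and may differ from the paper's exact normalization. The names `normalize` and `linearity_score` and the toy nonlinear block are hypothetical.

```python
import numpy as np

def normalize(H):
    # Center over tokens, then scale to unit Frobenius norm.
    H = H - H.mean(axis=0, keepdims=True)
    return H / np.linalg.norm(H)

def linearity_score(X, Y):
    """1 - min_A ||Xn @ A - Yn||_F^2 on normalized activations (assumed definition)."""
    Xn, Yn = normalize(X), normalize(Y)
    A, *_ = np.linalg.lstsq(Xn, Yn, rcond=None)
    return 1.0 - np.linalg.norm(Xn @ A - Yn) ** 2

# Toy layer: output = input + small nonlinear update (stand-in for a Transformer block).
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 16))
M = rng.normal(size=(16, 16))
block_out = 0.1 * np.tanh(X @ M)
Y = X + block_out

score_with_residual = linearity_score(X, Y)         # "w/ residual": input vs. full layer output
score_without_residual = linearity_score(X, Y - X)  # "w/o residual": input vs. block update only
score_exactly_linear = linearity_score(X, X @ M)    # sanity check on a purely linear map
```

In this toy setup the "w/ residual" score is the higher of the two because the identity path dominates the output. A fitted replacement that maps layer input directly to layer output corresponds to the "w/ residual" setting.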
Thanks again for your work. I look forward to your reply!
