Weights for linear replacement #1

Jayce0625 · 2024-11-11T07:28:37Z

Thanks for your great work, @MatthewMih!
I'm not a native English speaker, so please forgive any mistakes.
How do you obtain the weights of the linear layer that replaces a highly linear Transformer layer? I used the approximation A obtained from the SVD matrix decomposition, but the results were very poor. How should the weights of the replacement linear layer be initialized?
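To make the question concrete, here is a minimal sketch of one generic way to fit such a replacement: ordinary least squares on paired hidden states. This is an illustration only and not necessarily the procedure used in the paper. The function name `fit_linear_replacement`, the bias term, and the synthetic `X`/`Y` arrays are assumptions; in practice `X` and `Y` would be the layer's inputs and outputs collected over many tokens.

```python
import numpy as np

def fit_linear_replacement(X, Y):
    """Fit Y ~= X @ A + b by least squares (hypothetical helper).

    X: (n_tokens, d) hidden states entering the layer.
    Y: (n_tokens, d) hidden states leaving the layer.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    # Centering first lets the bias absorb the mean shift.
    A, *_ = np.linalg.lstsq(X - x_mean, Y - y_mean, rcond=None)
    b = y_mean - x_mean @ A
    return A, b

# Synthetic stand-in for real activations (assumption: an exactly linear layer).
rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(256, d))
A_true = np.eye(d) + 0.1 * rng.normal(size=(d, d))
b_true = rng.normal(size=d)
Y = X @ A_true + b_true

A, b = fit_linear_replacement(X, Y)
rel_err = np.linalg.norm(X @ A + b - Y) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.2e}")
```

With real activations the fit will not be exact, and the relative error on held-out tokens gives a rough indication of how much the replacement distorts the hidden states.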
My second question: your papers compute Procrustes similarity both w/ and w/o the residual connection. Which one is relevant when discussing linear replacement? During layer replacement, the entire Transformer layer is removed together with its residual connection, so could it be argued that the “w/ residual” version is more appropriate, since it measures the linearity of the entire layer, residual included?
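The distinction can be illustrated with a small sketch. It assumes the score is one minus the squared Frobenius residual of the best linear map between centered, norm-scaled inputs and outputs, that "w/ residual" compares the layer input with the full layer output, and that "w/o residual" subtracts the input from the output first. These definitions, the helper `linearity_score`, and the toy `tanh` block are all assumptions made for illustration, not code taken from the paper.

```python
import numpy as np

def linearity_score(X, Y):
    """1 - min_A ||Xn @ A - Yn||_F^2 on centered, norm-scaled data (assumed definition)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xn = Xc / np.linalg.norm(Xc)  # Frobenius norm for 2-D arrays
    Yn = Yc / np.linalg.norm(Yc)
    A, *_ = np.linalg.lstsq(Xn, Yn, rcond=None)
    return 1.0 - np.linalg.norm(Xn @ A - Yn) ** 2

# Toy stand-in for a Transformer layer: Y = X + F(X) with a small nonlinear F.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 16))
F = 0.1 * np.tanh(X @ rng.normal(size=(16, 16)))
Y = X + F

score_with = linearity_score(X, Y)         # "w/ residual": input vs. full output
score_without = linearity_score(X, Y - X)  # "w/o residual": input vs. block output only
print(score_with, score_without)
```

In this toy setup the identity path dominates the output, so the "w/ residual" score sits much closer to 1 than the "w/o residual" score, which is why the choice between the two matters for the question above.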
Thanks again for your work. I look forward to your reply!
