Hi. Thanks for sharing your work!
I have a question about the precision used in the multi-head attention layer.
In the current code (Flux), it seems that the activations are not cast to FP16 before the attention layer (mha_forward). Do other models, like PixArt-Sigma, use cast_fp16 in their MHA layers?
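For concreteness, here is a minimal PyTorch sketch of what an explicit pre-attention FP16 cast would look like. The names `cast_fp16` and `mha_forward` are only stand-ins borrowed from the question, and `nn.MultiheadAttention` stands in for the model's own attention block; this is not the project's actual implementation.

```python
import torch
import torch.nn as nn


def cast_fp16(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper mirroring the cast_fp16 name from the question:
    # downcast activations to half precision.
    return x.to(torch.float16)


class TinyMHA(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def mha_forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stand-in for the model's attention forward.
        out, _ = self.attn(x, x, x)
        return out


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyMHA().to(device)
x = torch.randn(2, 16, 64, device=device)  # activations arriving in FP32

if device == "cuda":
    # Explicit cast of weights and activations to FP16 before attention,
    # i.e. the behaviour the question asks about.
    model = model.half()
    x = cast_fp16(x)

out = model.mha_forward(x)
print(out.dtype)  # torch.float16 on GPU, torch.float32 on CPU (no cast applied)
```

Whether Flux skips this cast intentionally (e.g. to keep attention in higher precision for numerical stability) is exactly what the question is asking.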