Precision used in multi head attention layer #37

Closed
kimdaeun00 opened this issue Nov 26, 2024 · 1 comment

Labels: question (Further information is requested)

Comments

@kimdaeun00

Hi. Thanks for sharing your work!

I have a question about the precision used in the multi-head attention layer.
In the current code (FLUX), it seems that the activations are not cast to FP16 before the attention layer (mha_forward). Do other models, like PixArt-Sigma, use cast_fp16 in their mha layers?

lmxyy added the question label on Jan 16, 2025

@lmxyy (Collaborator) commented Jan 16, 2025

The activations in FLUX are in BF16 precision. We do not use cast_fp16 in other models.
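
For context, a minimal PyTorch sketch of the pattern being discussed. The `attention_forward` helper and the `cast_to_fp16` flag here are hypothetical stand-ins, not the repo's actual `mha_forward` or `cast_fp16`; the sketch only illustrates the difference between keeping activations in BF16 (the FLUX behavior described above) and casting them to FP16 before the attention call.

```python
import torch
import torch.nn.functional as F

def attention_forward(q, k, v, cast_to_fp16=False):
    # Hypothetical helper: activations arrive in the model's compute dtype
    # (BF16 for FLUX) and are optionally cast to FP16 before attention.
    # FLUX keeps them in BF16, so no cast is applied there.
    if cast_to_fp16:
        q, k, v = q.half(), k.half(), v.half()
    return F.scaled_dot_product_attention(q, k, v)

# BF16 activations, no cast: the behavior described in the reply above.
q = k = v = torch.randn(1, 8, 64, 64, dtype=torch.bfloat16)
out = attention_forward(q, k, v, cast_to_fp16=False)
print(out.dtype)  # torch.bfloat16
```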

lmxyy closed this as completed on Jan 16, 2025