ZeRO-Offload: datatype of gradient sent from GPU to CPU #2983
Unanswered
taehyunzzz
asked this question in
Q&A
Replies: 0 comments
I have a question about ZeRO-Offload. The paper suggests that ZeRO-Offload is communication-efficient because the data exchanged between CPU and GPU is FP16. Looking at the code, however, the gradients appear to be converted from FP16 to FP32 on the GPU and then moved into an FP32 buffer on the CPU. Does that mean the datatype actually transferred between GPU and CPU is FP32 instead of FP16? Also, did the paper suggest that the master weights are FP16, as the code seems to suggest?
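To make the concern concrete, here is a rough sketch of the transfer volume under the two orderings (copy the FP16 gradient and upcast on the CPU, versus upcast on the GPU and copy the FP32 result). It is only back-of-the-envelope arithmetic, not DeepSpeed code: the `transfer_bytes` helper and the 1B parameter count are hypothetical, and it assumes one gradient element per parameter with no compression or overlap.

```python
# Bytes per element for the two dtypes discussed above.
BYTES_PER_ELEMENT = {"fp16": 2, "fp32": 4}


def transfer_bytes(num_params: int, wire_dtype: str) -> int:
    """Bytes moved GPU -> CPU if every gradient element is sent as wire_dtype.

    Hypothetical helper for illustration; not part of DeepSpeed.
    """
    return num_params * BYTES_PER_ELEMENT[wire_dtype]


num_params = 1_000_000_000  # assumed model size, chosen only for illustration

# Ordering A: copy the FP16 gradient to the CPU, then upcast to FP32 there.
copy_then_cast = transfer_bytes(num_params, "fp16")

# Ordering B: upcast to FP32 on the GPU, then copy the FP32 tensor to the CPU.
cast_then_copy = transfer_bytes(num_params, "fp32")

ratio = cast_then_copy / copy_then_cast
print(copy_then_cast, cast_then_copy, ratio)
```

If the second ordering is what the code does, the gradient transfer would be twice the volume the paper's FP16 description implies, which is the discrepancy I am asking about.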