[QUESTION] how to control GPU memory layout for 70B LLM model? #1074
Unanswered · wangdaw2023 asked this question in Q&A
I am training a 70B Megatron LLM on a 32-node A800 cluster; each node has 8 × A800 GPUs and 4 × 200 Gb/s RoCE NICs. The 70B run reaches only 20% MFU, much lower than the 47% MFU I get with a 32B model. I also see that GPU memory usage is about 70 GB on some nodes but only about 50 GB on others. I would like to bring memory usage to the same level on every rank so that I can use a bigger micro batch size and improve MFU. That means controlling which LLM layers are placed on which rank. Is there any documentation on this topic?

32B LLM, TP=8, PP=1, MFU=47%
70B LLM, TP=8, PP=2, MFU=20%
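For a rough picture of where the imbalance can come from: with PP=2, the ranks on the first pipeline stage hold the input embedding and, under the 1F1B schedule, keep more microbatches of activations in flight than the last stage, so the nodes hosting stage 0 naturally sit at higher memory. The sketch below is a back-of-envelope estimate only; the model shape (80 layers, hidden 8192, FFN 28672, 64 heads / 8 KV heads, 32k vocab), sequence length 4096, micro batch 1, and the activation constant are all assumptions, not values taken from this thread.

```python
"""Back-of-envelope per-pipeline-stage memory for the 70B run above.

Everything here is an assumption for illustration: a Llama-style shape,
bf16 weights, sequence length 4096, micro batch 1, and a 1F1B pipeline
schedule. Plug in the real training config to get meaningful numbers.
"""

GB = 1024 ** 3

layers, hidden, ffn, vocab = 80, 8192, 28672, 32_000
heads, kv_heads = 64, 8
seq_len, micro_batch = 4096, 1
bytes_per_elem = 2            # bf16

tp, pp = 8, 2                 # the layout reported in the question


def layer_params() -> int:
    """Parameters of one transformer layer (GQA attention + SwiGLU MLP)."""
    head_dim = hidden // heads
    attn = 2 * hidden * hidden                    # Q and O projections
    attn += 2 * hidden * kv_heads * head_dim      # K and V projections
    mlp = 3 * hidden * ffn                        # gate, up, down projections
    return attn + mlp


for stage in range(pp):
    n_layers = layers // pp
    params = n_layers * layer_params()
    if stage == 0:
        params += vocab * hidden                  # input embedding
    if stage == pp - 1:
        params += vocab * hidden                  # untied output projection
    weight_gb = params * bytes_per_elem / tp / GB

    # Under 1F1B, stage i keeps (pp - i) microbatches of activations alive,
    # so earlier stages need more activation memory than later ones.
    in_flight = pp - stage
    # Very rough per-layer activation footprint (~34 * s * b * h bytes with
    # selective recomputation); treat the constant as tunable, not exact.
    act_per_microbatch = 34 * seq_len * micro_batch * hidden * n_layers / tp
    act_gb = in_flight * act_per_microbatch / GB

    print(f"stage {stage}: ~{weight_gb:.1f} GB weights/rank, "
          f"~{act_gb:.1f} GB activations/rank "
          f"({in_flight} microbatches in flight)")
```

Gradient and optimizer-state memory come on top of this and scale with the per-rank parameter count, so the stages holding the embedding and output layer stay the heavy ones; evening out the layer split, or lowering TP and raising PP as suggested in the reply below, is what lets every rank run the same, larger micro batch size.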
Replies: 1 comment
-
Maybe try setting a smaller TP and a larger PP (e.g. TP=4, PP=4 or TP=4, PP=8) for the 70B case.
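As a quick sanity check of what those settings imply on the cluster described above (32 nodes × 8 GPUs = 256 GPUs), the snippet below prints the resulting data-parallel size and layers per pipeline stage. The 80-layer count is an assumption for a Llama-style 70B model, and the printed flag names reflect recent Megatron-LM; verify them against the version in use.

```python
"""What the suggested layouts imply on a 32-node x 8-GPU (256 GPU) cluster.

Assumptions: 80 transformer layers for a Llama-style 70B model and evenly
divided pipeline stages. Flag names follow recent Megatron-LM; double-check
them against the version you are running.
"""

world_size = 32 * 8   # 256 GPUs total
num_layers = 80       # assumed layer count for a 70B model

for tp, pp in [(8, 2), (4, 4), (4, 8)]:
    model_parallel_size = tp * pp
    dp = world_size // model_parallel_size      # data-parallel replicas
    layers_per_stage = num_layers // pp         # layers on each pipeline stage
    print(f"TP={tp} PP={pp}: DP={dp}, {layers_per_stage} layers/stage")
    print(f"  --tensor-model-parallel-size {tp}"
          f" --pipeline-model-parallel-size {pp}"
          f" --sequence-parallel --use-distributed-optimizer")
```

The trade-off: a smaller TP reduces per-layer tensor-parallel communication and gives larger per-GPU GEMMs, while a larger PP spreads layers and activation memory over more stages but adds a pipeline bubble that shrinks only with enough microbatches per step (or an interleaved virtual-pipeline schedule).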