Replies: 1 comment
-
micro-batch-size is the number of samples processed per forward/backward pass on each GPU; the global batch is split into micro-batches.
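For concreteness, here is a minimal sketch of how these quantities relate in Megatron-style data-parallel training; the specific values (micro_batch_size = 4, etc.) are illustrative, not taken from the thread:

```python
# Illustrative sketch: how micro-batch-size relates to global-batch-size
# in data-parallel training with gradient accumulation.
micro_batch_size = 4        # samples per GPU per forward/backward pass (assumed value)
data_parallel_size = 2      # e.g. 2 GPUs, pure data parallelism
global_batch_size = 24      # samples consumed per optimizer step

# Each iteration (optimizer step) accumulates gradients over several micro-batches:
grad_accum_steps = global_batch_size // (micro_batch_size * data_parallel_size)
assert global_batch_size == micro_batch_size * data_parallel_size * grad_accum_steps

print(f"{grad_accum_steps} micro-batches accumulated per GPU per iteration")
```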
-
Hi,
I am testing how the micro-batch-size influences throughput per GPU while keeping the global-batch-size constant.
The results show that as the micro-batch-size increases, the throughput per GPU (TFLOP/s/GPU) also increases.
I ran some tests with a 400M-parameter transformer-based model on 2 A40 GPUs, using only data parallelism. Here are the training arguments:
Between tests I only change the micro-batch-size, training for 100 iterations with seq_len = 1024 and global-batch-size = 24. Here are the results for different micro-batch-size values:
I log every 5 iterations and compute the average throughput per GPU.
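For reference, the TFLOP/s/GPU number is roughly what you would get from a sketch like the one below; the 6 * params * tokens estimate and the iteration time are illustrative assumptions, not the exact logging code:

```python
# Rough sketch (assumption): estimating TFLOP/s/GPU from iteration time,
# using the common ~6 * params * tokens estimate for forward + backward FLOPs.
def tflops_per_gpu(num_params, global_batch_size, seq_len, iter_time_s, num_gpus):
    tokens_per_iter = global_batch_size * seq_len
    flops_per_iter = 6 * num_params * tokens_per_iter   # fwd + bwd, no recomputation
    return flops_per_iter / (iter_time_s * num_gpus) / 1e12

# Example with the setup above (the 0.5 s iteration time is hypothetical):
print(tflops_per_gpu(num_params=400e6, global_batch_size=24,
                     seq_len=1024, iter_time_s=0.5, num_gpus=2))
```

Note that with a fixed global-batch-size the FLOPs per iteration do not depend on the micro-batch-size, so only the iteration time changes between runs.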
For each iteration the total computational work is the same, but the throughput per GPU increases as the micro-batch-size increases. I suspect this is related to GPU cache behavior or arithmetic intensity, but I'm not quite clear on the details. Can anyone provide an in-depth explanation?
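To clarify what I mean by arithmetic intensity, here is a toy calculation for a single linear-layer GEMM; the hidden size of 1024 and fp16 storage are illustrative values, not the actual model config:

```python
# Illustrative sketch: arithmetic intensity (FLOPs per byte moved) of a
# (b*s, h) x (h, h) GEMM in fp16. Larger micro-batches raise FLOPs-per-byte,
# pushing each kernel further toward the compute-bound regime.
def gemm_arithmetic_intensity(micro_batch, seq_len, hidden, bytes_per_elem=2):
    m, k, n = micro_batch * seq_len, hidden, hidden
    flops = 2 * m * k * n                                   # multiply-adds
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C
    return flops / bytes_moved

for mb in (1, 2, 4, 8, 12):
    print(mb, round(gemm_arithmetic_intensity(mb, seq_len=1024, hidden=1024), 1))
```

This only illustrates one factor; fewer, larger kernel launches and less per-micro-batch overhead per iteration may also matter, which is what I would like to understand better.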