-
Hey @mikesol! Using the Sharding APIs + …
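(For concreteness, a minimal sketch of what the jax.sharding approach might look like on a single host; the axis name "data" and the toy shapes are illustrative, not from the original reply.)

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D mesh over every visible device (GPUs or local TPU cores).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard a batch along its leading dimension across the "data" axis.
batch = jnp.ones((32, 128))
sharded_batch = jax.device_put(batch, NamedSharding(mesh, P("data")))

# A jit-compiled computation keeps the output sharded the same way.
@jax.jit
def forward(x):
    return x * 2.0

print(forward(sharded_batch).sharding)
```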
-
Thanks! Would it make sense to add a brief note to that effect on the Parallel training page? Something that briefly describes the difference and shows a grid of which hardware supports which convention? For example, earlier today I deployed a model using a mesh on 8 GPUs and it worked just fine, but when I tried the same on a v2-32 TPU it didn't work. So I'm guessing that the …
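(If it helps, the usual explanation of that difference: an 8-GPU box is a single host, so one process sees all eight devices, while a v2-32 slice spans four hosts with eight cores each, so the same script has to run on every host and each process only sees its local cores. A rough sketch of how that shows up, assuming a Cloud TPU pod-slice environment:)

```python
import jax

# On a multi-host slice (e.g. v2-32) this script must be launched on
# every host; on a single 8-GPU machine it runs as one process.
jax.distributed.initialize()  # coordinates the processes; largely automatic on Cloud TPU

print("process", jax.process_index(), "of", jax.process_count())
print("local devices:", jax.local_device_count())   # 8 per host
print("global devices:", jax.device_count())        # 32 on a v2-32

# A mesh built from jax.devices() then spans all hosts, even though
# each process only physically holds its own 8 cores.
```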
-
Hi all!
I've been using TPUs for Flax training and it's been working quite nicely with pmap. I'm now switching to a multi-GPU rig and I'm wondering if the setup will work the same. I've had to manually specify: … which is fine, but I'm guessing that will only work on one core and then I'd need to launch more processes to get more parallelism?
Also, I've read about sharding and I'm not quite sure what the current recommendation is for using ensembling vs sharding. It seems like there's some conceptual overlap there as one can shard over the batch dimension?
Thanks in advance for any tips on how folks usually set these things up in a multi-GPU environment!
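(For the single-host multi-GPU case specifically, a minimal sketch of the usual pattern; the step function and shapes below are made up for illustration. One process sees all local GPUs, so pmap alone gives device-level data parallelism without launching extra processes; multiple processes only become necessary once you go multi-host.)

```python
import jax
import jax.numpy as jnp

n = jax.local_device_count()  # number of GPUs visible to this one process
print("local devices:", n)

# pmap replicates the function across all local devices; the leading
# axis of the input must equal the local device count.
@jax.pmap
def step(x):
    return jnp.mean(x ** 2)

batch = jnp.ones((n, 1024))
print(step(batch))  # one result per device
```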