build: UCC integration #575
Comments
@nirandaperera I did the first step too. Building along with MPI didn't work. |
They've now added a comprehensive example, which we can directly use. |
fyi, torch_ucc was moved to another repo https://github.com/facebookresearch/torch_ucc |
Thanks a lot for the pointer @Sergei-Lebedev |
@Sergei-Lebedev, on a side note, dist.new_group() with UCC might also benefit from this PR I've opened: Pass group ranks and options to third party distributed backends by esaliya · Pull Request #73164 · pytorch/pytorch (pytorch/pytorch#73164).
It fixes the missing subranks info that distributed_c10d.py should pass to PyTorch 3rd party distributed backends. UCC is the only other 3rd party distributed backend I've seen so far, so if you can give some feedback, that'll be great.
Saliya
Saliya Ekanayake, Ph.D
Cloud Accelerated Systems & Technologies (CAST)
Microsoft
|
Hi @esaliya, subranks info might be useful, but I think it can be reconstructed using the prefix store without adding any additional options to the PG constructor. In UCC we don't need this info because the UCC team allgather is used instead.
import os
import torch
import torch.distributed as dist
import torch_ucc
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '12345'
os.environ['RANK'] = os.environ['OMPI_COMM_WORLD_RANK']
os.environ['WORLD_SIZE'] = os.environ['OMPI_COMM_WORLD_SIZE']
dist.init_process_group('gloo')
sg = dist.new_group(ranks=[0, 1], backend='ucc')
if dist.get_rank() in [0, 1]:
    sg.barrier()
dist.barrier()
|
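The idea of reconstructing subranks through a prefix store can be illustrated with a minimal sketch. It does not use PyTorch: `InMemoryStore` and `PrefixedStore` are hypothetical stand-ins for a c10d key-value store and a prefix wrapper around it, and the key names (`rank_<i>`) and the group prefix (`group_1`) are assumptions made for illustration only. In a real distributed store, `get` would typically block until another process has set the key; here everything runs in one process.

```python
class InMemoryStore:
    """Hypothetical single-process stand-in for a shared key-value store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class PrefixedStore:
    """Hypothetical wrapper that namespaces every key under a group prefix."""

    def __init__(self, prefix, store):
        self._prefix = prefix
        self._store = store

    def set(self, key, value):
        self._store.set(f"{self._prefix}/{key}", value)

    def get(self, key):
        return self._store.get(f"{self._prefix}/{key}")


def publish_rank(store, group_rank, global_rank):
    # Each member of the subgroup records its global rank under its group rank.
    store.set(f"rank_{group_rank}", str(global_rank))


def reconstruct_subranks(store, group_size):
    # Any member can rebuild the list of global ranks, ordered by group rank.
    return [int(store.get(f"rank_{i}")) for i in range(group_size)]


base = InMemoryStore()
group_store = PrefixedStore("group_1", base)  # assumed prefix for this subgroup

members = [2, 5, 7]  # example global ranks forming the subgroup
for group_rank, global_rank in enumerate(members):
    publish_rank(group_store, group_rank, global_rank)

subranks = reconstruct_subranks(group_store, len(members))
print(subranks)
```

Because each subgroup writes under its own prefix, several groups can share one underlying store without key collisions, which is why no extra constructor argument is needed in this scheme.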
@kaiyingshan the following are the steps that need to be done to build UCC.
If you are running ucc_example.cpp locally, make sure to add conda libs and UCC libs to the |
The build seems to fail nondeterministically on my computer, maybe because I'm using WSL. I'll try to figure out the cause. |
References:
Note: UCX version 1.11 or later is required (the current conda package is 1.12, which works!)
Roadmap:
Build UCC and UCX locally - Tested with conda ucx installation
Incorporate UCC to Cylon as a part of UCX build. Allow UCC libs and headers to be provided externally (-DCYLON_UCX=ON -DUCC_PREFIX=<ucc install path>)
Add UCC to current UCX context (currently MPI is used to spawn processes. This would be an easy entry point) - Use torch-ucc as a reference impl
(these were resolved by Ucc integration #591)
Use a single UCP context for UCX and UCC #595
Bootstrap UCX processes outside of MPI #594