Summary:
This diff adds two benchmarks that perform an allreduce operation using, internally, a technique inspired by Amazon's Herring paper. The two versions, one for TCP and one for InfiniBand interconnects, operate similarly: they spawn a set of clients (one per device, with multiple devices per machine and multiple machines). Each allreduce call then proceeds in four steps, sketched below: first a NCCL reduce or reduce_scatter step within each machine; then a sharded send to a set of servers (one per machine, or one per device); then aggregation on those servers, which send the result back to the clients; and finally a NCCL broadcast or all_gather step. Multiple allreduce calls, one per "bucket", are launched in parallel during each "epoch". There's also a baseline script that performs these allreduces using plain NCCL.
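Here is a minimal Python sketch of one bucket's allreduce following those steps. It assumes `torch.distributed` has been initialized with an intra-machine NCCL group, that the bucket splits evenly across the machine's devices, and that `send_to_server`/`recv_from_server` stand in for the TensorPipe client/server exchange; all names are illustrative, not the benchmark's actual API.

```python
import torch
import torch.distributed as dist

def herring_style_allreduce(bucket, intra_group, local_rank, local_world,
                            send_to_server, recv_from_server):
    # 1) Intra-machine reduce_scatter over NCCL: each device ends up with
    #    the machine-wide sum of its own shard of the bucket.
    shards = list(bucket.chunk(local_world))
    my_shard = torch.empty_like(shards[local_rank])
    dist.reduce_scatter(my_shard, shards, group=intra_group)

    # 2) Sharded send: ship this device's shard to the server responsible
    #    for it (over TCP or InfiniBand in the two benchmark variants).
    send_to_server(my_shard)

    # 3) The servers aggregate the shards from all machines and send the
    #    result back; here we just wait for our aggregated shard.
    aggregated = recv_from_server()

    # 4) Intra-machine all_gather over NCCL to reassemble the full,
    #    globally reduced bucket on every device.
    gathered = [torch.empty_like(aggregated) for _ in range(local_world)]
    dist.all_gather(gathered, aggregated, group=intra_group)
    return torch.cat(gathered)
```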
Since the benchmarks depend on PyTorch, they are a bit "awkward" to build and install from within the TensorPipe repo. So, instead of integrating them into our CMake system, I opted to build them as a separate PyTorch extension, hence the setup.py file. The installation steps are thus: first install TensorPipe on its own, then NCCL, then PyTorch, and finally the benchmarks.
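For reference, a setup.py for such a PyTorch C++ extension might look like the sketch below; the package name, source files, and library names are hypothetical placeholders, not the ones in this diff.

```python
# Hypothetical setup.py sketch; names and paths are illustrative only.
from setuptools import setup
from torch.utils.cpp_extension import CppExtension, BuildExtension

setup(
    name="herring_benchmarks",
    ext_modules=[
        CppExtension(
            name="herring_benchmarks._C",
            sources=["benchmarks/herring_tcp.cc", "benchmarks/herring_ibv.cc"],
            # TensorPipe and NCCL must already be installed (see above).
            libraries=["tensorpipe", "nccl"],
        ),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```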
Differential Revision: D30220559