This repo contains all code related to my Master's thesis on distributed training of neural networks.
Link to Thesis on arXiv: https://arxiv.org/abs/1812.02407
Using virtualenv is highly recommended.
Developed using Python 3.6.5 and PyTorch 0.3.1.
Assuming you're using Python 3.6.5 inside a virtual environment with pip available, you will first need to install PyTorch.
On a mac (no CUDA), use:
$ pip install http://download.pytorch.org/whl/torch-0.3.1-cp36-cp36m-macosx_10_7_x86_64.whl
On Linux (with CUDA 8), use:
$ pip install http://download.pytorch.org/whl/cu80/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
(or pick a different binary with pytorch==0.3.1: https://pytorch.org/previous-versions/)
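As a quick sanity check, the following should print a version string starting with 0.3.1:
$ python -c "import torch; print(torch.__version__)"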
Then install the remaining dependencies:
$ pip install -r requirements.txt
Running
$ python main.py
will launch the default "experiment": the Iris classification task with the default configuration - 4 workers, data split evenly, a 3-layer neural network with a 3-way softmax classifier, trained for 3 epochs using all-reduce.
This is tiny enough that it should run on any modern computer in seconds, and serves well as a Hello World.
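For reference, spelling the defaults out explicitly (assuming iris and gradAllReduce are indeed the default values of the --experiment and --agg-method flags described below) would look something like:
$ python main.py --experiment iris --agg-method gradAllReduce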
Run the following for more options:
$ python main.py --help
The most useful arguments would be:
--experiment {iris,mnist,cifar10}
--agg-method {local,noComm,gradAllReduce,elasticGossip,gossipingSgd}
    aggregation method used to aggregate gradients or params across all workers during training
--agg-period AGG_PERIOD
    if applicable, the period at which an aggregation occurs
--agg-prob AGG_PROB
    if applicable, the probability with which agg occurs
--elastic-alpha ELASTIC_ALPHA
    "moving rate" for elastic gossip
Logs are Bunyan-formatted, so you will need the Bunyan CLI tool to view them:
$ npm install -g bunyan
Logs are stored at ./logs/<exp-id>/, where <exp-id> can be specified using the --exp-id argument; this defaults to ./logs/unspecified/.
Logs are Bunyan-formatted, which means they're also JSON-formatted. If you'd simply like to read them:
$ cat <logs> | bunyan -o short -l INFO
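If an experiment is still running, you can also follow a log live by piping tail through bunyan (where <log-file> is a placeholder for one of the per-worker log files described just below):
$ tail -f <log-file> | bunyan -o short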
The logs folder has one log file for each worker, identified by rank, and a metadata.json, which is a dump of the command-line arguments, including the defaults.
To pretty-print it:
$ cat ./logs/unspecified/metadata.json | jq .
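Assuming metadata.json is a flat JSON object (one key per command-line argument), you can also list just the recorded argument names:
$ jq 'keys' ./logs/unspecified/metadata.json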
To run the tests:
$ python -m pytest
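pytest's usual flags apply here as well, e.g. -v for verbose output or -k to select tests by keyword:
$ python -m pytest -v
$ python -m pytest -k <keyword>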