EXPLORING LEARNING STRATEGIES FOR TRAINING DEEP NEURAL NETWORKS USING MULTIPLE GPUS
Please compile Torch7 using the following steps:
- git clone https://github.com/torch/distro.git /home/Tools/torch_cuda-7.5 --recursive
- Modify path_to_nvcc=/usr/local/cuda-7.5/bin/nvcc in the file /home/Tools/torch_cuda-7.5/install.sh
- Make sure the CUDA path /usr/local/cuda-7.5 is set in the file ~/.bashrc (see the sketch after this list), then run:
- cd /home/Tools/torch_cuda-7.5 ; ./clean.sh
- rm -rf ./install
- remove the torch-activate entry from your shell start-up script (~/.bashrc or ~/.profile)
- bash install-deps
- ./install.sh
- ./test.sh
- set LD_LIBRARY_PATH & PATH:
export PATH="/home/Tools/torch_cuda-7.5:/home/Tools/torch_cuda-7.5/bin:/home/Tools/torch_cuda-7.5/install:/home/Tools/torch_cuda-7.5/install/bin:/home/Tools/torch_cuda-7.5/install/share/lua/5.1:$PATH"
export LD_LIBRARY_PATH="/home/Tools/torch_cuda-7.5/lib:/home/Tools/torch_cuda-7.5/install/lib/lua/5.1:/home/Tools/torch_cuda-7.5/install/lib:$LD_LIBRARY_PATH"
. /home/Tools/torch_cuda-7.5/install/bin/torch-activate
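For the CUDA path in ~/.bashrc mentioned above, a minimal sketch (assuming a default CUDA 7.5 install location):
export PATH="/usr/local/cuda-7.5/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH"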
Installation Reference: http://torch.ch/docs/getting-started.html and https://github.com/torch/distro
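To sanity-check the build, a minimal check that Torch sees the GPUs (assuming cutorch was built by install.sh):
th -e "require 'cutorch'; print(cutorch.getDeviceCount())"   # should print the number of visible GPUs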
Please install the related Twitter packages described in Distributed learning in Torch (https://blog.twitter.com/2016/distributed-learning-in-torch) before running. First, git clone the packages:
- git clone https://github.com/twitter/torch-distlearn
- git clone https://github.com/twitter/torch-dataset
- git clone https://github.com/twitter/torch-thrift
- git clone https://github.com/twitter/torch-autograd
- git clone https://github.com/twitter/torch-ipc
Then, go into each folder and run the corresponding command (a sketch for building directly from the cloned source trees follows this list):
- luarocks install autograd
- luarocks install thrift
- luarocks install dataset
- luarocks install ipc
- luarocks install distlearn
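Alternatively, a minimal sketch for building each rock from the cloned source trees instead of the public rocks servers (it assumes each repository ships a single rockspec; otherwise pass the rockspec file name to luarocks make):
for pkg in torch-autograd torch-thrift torch-dataset torch-ipc torch-distlearn; do
    (cd "$pkg" && luarocks make)   # build and install the rock from the local checkout
done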
We can test run.sh and speech.lua by modifying their input and output paths.
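A hypothetical launch sketch (it assumes speech.lua follows the torch-distlearn examples and accepts --numNodes/--nodeIndex options; adjust to the actual arguments used in run.sh):
CUDA_VISIBLE_DEVICES=0 th speech.lua --numNodes 2 --nodeIndex 1 &   # node 1 on GPU 0
CUDA_VISIBLE_DEVICES=1 th speech.lua --numNodes 2 --nodeIndex 2     # node 2 on GPU 1
wait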
- Before running distributed learning, please make sure ACS is turned off. Run lspci -vvv and make sure you get "ACSCtl: SrcValid-" instead of "ACSCtl: SrcValid+" for the PLX PCI-e switch. Some notes on GPU communication:
- If you get "ACSCtl: SrcValid+" for the PCI bridge (PLX Technology), run "setpci -s bus#:slot#.func# f2a.w=0000" to disable ACSCtl on the PLX switch. Run these 3 steps:
lspci | grep -i plx                           # find the bus#:slot#.func# of the PLX switch
sudo lspci -s 03:08.0 -vvvv | grep -i acs     # check for ACSCtl: SrcValid+
sudo setpci -s 03:08.0 f2a.w=0000             # change it to ACSCtl: SrcValid-
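The same three steps can be scripted for every PLX device that lspci reports; a minimal sketch (it assumes every listed PLX device is a switch port whose ACS should be disabled, and reuses the f2a.w register offset from the steps above):
for dev in $(lspci | grep -i plx | awk '{print $1}'); do
    sudo setpci -s "$dev" f2a.w=0000           # clear the ACS control bits on this PLX port
    sudo lspci -s "$dev" -vvvv | grep -i acs   # confirm it now reports ACSCtl: SrcValid-
done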
- We can check the GPU cards and their topology matrix with the command "nvidia-smi topo --matrix".
- The ReLU activation function works better than Tanh. However, ReLU may fall to 0% accuracy when the learning rate is unsuitable; there is no such problem when using Tanh.
- We may use the command "nvidia-smi --loop=10 > nvidia.log" to reduce the occurrence of "Segmentation fault" in torch-distlearn.
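For example, a sketch that keeps the monitor running for the duration of a training run (assuming run.sh is the training launcher):
nvidia-smi --loop=10 > nvidia.log 2>&1 &   # poll GPU state every 10 seconds in the background
NVSMI_PID=$!
./run.sh                                   # the distributed training job
kill "$NVSMI_PID"                          # stop the monitor when training finishes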
- Microsoft: F. Seide et al., "1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs," Interspeech 2014.
- Amazon: N. Strom, "Scalable Distributed DNN Training Using Commodity GPU Cloud Computing," Interspeech 2015. http://www.nikkostrom.com/publications/interspeech2015/strom_interspeech2015.pdf
- Dougal Maclaurin, David Duvenaud, Matt Johnson, "Autograd: Reverse-mode differentiation of native Python"
- Twitter: https://blog.twitter.com/2016/distributed-learning-in-torch
- D. Yu and L. Deng, "Automatic Speech Recognition: A Deep Learning Approach," Springer, 2015.
UPDATE 16th March 2017, by Chien-Lin Huang https://sites.google.com/site/chiccoclhuang/