Many thanks to Robert and Charles for their instructions, found here.
For the most up-to-date setup instructions, see the Medium blog post.
ray-project here
- Follow the instructions from Google here to install the Cloud SDK for Ubuntu.
- Create a new project on the Google Cloud Console and start a Kubernetes cluster in that project.
- Run `gcloud init` and follow the prompted instructions.
- Install `kubectl` on your local machine, using either `gcloud components install kubectl` or (Instructions).
- On your GCloud Console, go to Project -> Kubernetes Engine -> Clusters and click Connect. Paste the command into your command line to give `kubectl` access to your cluster.
- Clone the repository: `git clone https://github.com/jhpenger/ray-kubernetes.git`
- Edit `build.sh`, `head.yml`, and `worker.yml`: replace `tutorial-218804` with your project ID.
- Run:

```shell
bash build.sh
docker push <image-tag>
```
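The project-ID replacement can also be scripted with `sed`. A minimal sketch, using a stand-in file and a hypothetical `PROJECT_ID` value (neither is part of the repo); in practice you would run the same substitution over `build.sh`, `head.yml`, and `worker.yml` in your clone:

```shell
# Illustrative only: PROJECT_ID and /tmp/head_demo.yml are stand-ins.
PROJECT_ID="my-gcp-project"
# Stand-in for one of the repo files that references the tutorial project:
printf 'image: gcr.io/tutorial-218804/ray:latest\n' > /tmp/head_demo.yml
# The same substitution you would run on build.sh, head.yml, and worker.yml:
sed -i "s/tutorial-218804/${PROJECT_ID}/g" /tmp/head_demo.yml
cat /tmp/head_demo.yml   # image: gcr.io/my-gcp-project/ray:latest
```

The `-i` flag edits the file in place (GNU sed syntax).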
In the cloned repository, run:

```shell
kubectl create -f head.yml
```

Wait for the `ray-head` pod to be fully running. You can check the pods' status with `kubectl get pods`. If your head pod crashes, run `kubectl logs ray-head` to debug.
Obtain `ray-head`'s public key by either:
- `kubectl logs ray-head` (the key will be near the bottom of the output), or
- `kubectl exec -it ray-head bash`; then `more ~/.ssh/id_rsa.pub`
Edit `worker.yml`: replace `<PASTE-PUBKEY-HERE-ONELINE>` with `ray-head`'s public key.
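If you saved the head pod's key to a file, this placeholder substitution can be scripted too. A hedged sketch with stand-in filenames and a dummy key (the real key comes from `kubectl logs ray-head`, and the real target is `worker.yml` in your clone):

```shell
# Illustrative only: both files below are stand-ins for demonstration.
printf 'ssh-rsa AAAAB3NzaDUMMYKEY root@ray-head\n' > /tmp/head_key.pub
printf 'authorized_keys: "<PASTE-PUBKEY-HERE-ONELINE>"\n' > /tmp/worker_demo.yml
# Flatten the key to a single line, then splice it into the placeholder.
# '#' is used as the sed delimiter because the key may contain '/'.
KEY="$(tr -d '\n' < /tmp/head_key.pub)"
sed -i "s#<PASTE-PUBKEY-HERE-ONELINE>#${KEY}#" /tmp/worker_demo.yml
cat /tmp/worker_demo.yml   # authorized_keys: "ssh-rsa AAAAB3NzaDUMMYKEY root@ray-head"
```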
If you are inside `ray-head`, exit back to your local machine and run:

```shell
kubectl create -f worker.yml
```
I mounted a simple Python script (modified from `exercise04.ipynb`, found here) to test the cluster.
First, get into `ray-head` with `kubectl exec -it ray-head bash`. Then run:

```shell
python /ray-kubernetes/test_cluster.py $MY_POD_IP:6379
```
You can choose the number of actors by passing an additional parameter (e.g. `python /ray-kubernetes/test_cluster.py $MY_POD_IP:6379 8888`). The default is 136. The number of actors should be no more than the number of CPU cores in your cluster (not the number of CPUs).
The expected run time is ~2.5 seconds, but it may be slower if the cluster hits its maximum CPU capacity.
These instructions were set up for the Koç Lab @ University of California, Santa Barbara.
We are trying to use a Ray cluster to do reinforcement learning in the Gibson Environment.
We are currently in the very early stages of exploring `ray` and `Gibson`, and would greatly appreciate guidance from anyone with:
- experience running reinforcement learning simulations on large clusters using `GCloud` or `AWS`
- experience using Google's Preemptible VM Instances with `ray` to cut down costs, specifically in dealing with what to do when a preemptible instance restarts
Contact me @ [email protected]