This project helps you set up a UIMA DUCC cluster locally with Docker.
Check points
- Container for DUCC (follow best practices for Docker build file)
- Data persistence for experiment results and logs (after DUCC agent removal)
- Support multi-user mode with duccling
- Script to spin up cluster with arbitrary configuration (docker-compose preferably or bash)
- Run simple UIMA job (from samples) which will be distributed across several agent nodes
- Configure 3 agent node pools (compute-optimized, memory-optimized, general-use).
- Cluster should support automatic detection of agent addition/removal
- Implement autoscaling based on currently running jobs (when there is starvation of memory shares - add more nodes; remove nodes when there are no jobs running)
- Propose testing solution for containers, write some tests for this infrastructure
Documentation
- Architecture high-level design
- User guides (how to bootstrap cluster, how to modify configuration, how to run jobs, how autoscaling works)
Attention! This document does not describe the DUCC architecture itself, only the clustering solution. For more information about the UIMA DUCC architecture, see the "UIMA DUCC book".
Diagram of solution architecture:
- head - master node in DUCC UIMA cluster
- agentN - agent node in DUCC UIMA cluster
- ./jobs/ - shared folder between head node and host node for jobs
- ./results/ - shared folder between head node and host node for job results
- :2222 - exposed SSH port to get access to head via SSH
- :42133, :42155 - exposed ports to get access to the DUCC cluster web interface
- run.sh - script that prepares the head and agent nodes. On an agent node the script also automatically determines the head node (based on its hosts file, which is changed by the Docker --link option), adds itself to the head's hosts file and ducc.nodes file, and starts the agent from the head node.
- cluster_scale.sh - script for cluster autoscaling; for more information, see the Cluster autoscaling section of this guide
File list
# head node files
./ducc-head
# run script (CMD) for head
./ducc-head/run.sh
# Docker does not support symlinks for COPY/ADD, so the archives are placed here. Initially 'wget' was used, but it is too slow for testing purposes
./ducc-head/uima-ducc-1.1.0-bin.tar.gz
# Public and private keys (should be the same for head and agent)
./ducc-head/id_rsa
./ducc-head/id_rsa.pub
# Dockerfile
./ducc-head/Dockerfile
# agent node files
./ducc-agent
# run script (CMD) for agent
./ducc-agent/run.sh
# see above
./ducc-agent/uima-ducc-1.1.0-bin.tar.gz
# Public and private keys (should be the same for head and agent)
./ducc-agent/id_rsa
./ducc-agent/id_rsa.pub
# Dockerfile
./ducc-agent/Dockerfile
# Resources for documentation
./res
./res/arch.png
./res/arch.graphml
# jobs and results folders (mounted to head container)
./jobs
./results
# Sample config for docker-compose
./docker-compose.yml
# Cluster auto-scale script
./cluster_scale.sh
- Install Docker - https://docs.docker.com/installation/#installation
- Install Docker Compose (not mandatory) - https://docs.docker.com/compose/install/#install-compose
Clone this repository to your PC:
git clone https://github.com/aleksey-hariton/uima-ducc-docker.git
Then follow either the plain Docker or the Docker Compose instructions.
To bootstrap new DUCC cluster with pure Docker:
- Build ducc-head and ducc-agent images:
docker build -t ducc-head ducc-head/
docker build -t ducc-agent ducc-agent/
- Run one head server:
docker run -t -i -p 42133:42133 -p 42155:42155 -p 2222:22 -v $(pwd)/results/:/tmp/res/ -v $(pwd)/jobs/:/tmp/jobs/ -d --name head ducc-head
- Then run as many agent nodes as you wish; all new agent nodes (agent1, agent2, etc.) will be added to the cluster automatically:
docker run -t -i -d --name agent1 ducc-agent
- Go to http://localhost:42133/ to open web interface of your new UIMA DUCC cluster
If you want to add new agent nodes, just repeat step 3.
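Starting several agents by hand gets repetitive. Below is a minimal sketch of a helper, assuming the ducc-agent image name and the agentN naming used in the steps above. The function name print_agent_commands is hypothetical and is not part of this repository. It only prints the docker run commands, so you can review them first.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the docker run
# command for agents FIRST..LAST, using the image and naming from the steps above.
print_agent_commands() {
    agent_index="$1"
    agent_last="$2"
    while [ "$agent_index" -le "$agent_last" ]; do
        echo "docker run -t -i -d --name agent${agent_index} ducc-agent"
        agent_index=$((agent_index + 1))
    done
}

# Print the commands for agent2 and agent3 (assuming agent1 is already running).
print_agent_commands 2 3
```

To actually start the containers, pipe the output to a shell, for example: print_agent_commands 2 3 | sh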
First, make sure that docker-compose is installed (see the Prerequisites section of this document).
For cluster spin-up just run two commands:
docker-compose build
docker-compose up -d
These commands set up one DUCC head node and one agent node (you will have 2 agent nodes, because one is already installed on the head server).
To increase the number of agent nodes to 3, run the following command:
docker-compose scale agent=3
OR
you can just change docker-compose.yml file to have pre-configured cluster setup:
...
agent:
  links:
    - head
  build: ./ducc-agent/
agent1:
  links:
    - head
  build: ./ducc-agent/
agent2:
  links:
    - head
  build: ./ducc-agent/
and run docker-compose up -d.
You can place your jobs in the ./jobs/ folder (/tmp/jobs/ in the head container) or use an example provided with the DUCC distribution. Log in to the head container and submit a new job:
ssh -p 2222 -i res/id_rsa root@localhost
cd /home/ducc/apache-uima-ducc/bin/
./ducc_submit -f ../examples/simple/1.job --log_directory /tmp/res/ --working_directory /tmp/res/
You can check the Jobs section of the DUCC web interface, or the Viz section for a visualization of where your job was placed.
Please note that results are written to the /tmp/res/ folder of the head container, which maps to the local ./results/ folder of this repository.
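To submit one of your own jobs from the ./jobs/ folder without an interactive login, something like the following sketch could be used. It reuses the paths and ducc_submit options from the steps above. The function name remote_submit_command is hypothetical, and running ducc_submit through a non-interactive SSH session is an assumption that has not been verified for this setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): build the command line
# that submits /tmp/jobs/JOB_FILE on the head node, with logs and working
# directory in /tmp/res/ as in the manual steps above.
remote_submit_command() {
    printf 'cd /home/ducc/apache-uima-ducc/bin/ && ./ducc_submit -f /tmp/jobs/%s --log_directory /tmp/res/ --working_directory /tmp/res/\n' "$1"
}

# Show the command for a job file named my.job (example name).
remote_submit_command my.job
```

The printed command can then be passed to SSH, for example: ssh -p 2222 -i res/id_rsa root@localhost "$(remote_submit_command my.job)"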
Cluster autoscaling is implemented with Docker Compose and the cluster_scale.sh script. Before you run it, set the following values in the script:
- THRESHOLD - cluster utilization in %, above which the script starts a new agent node
- SLEEP - time in seconds between status updates
- DOCKER_COMPOSE - path to the docker-compose binary, or just docker-compose if it is in $PATH
Then run the script:
./cluster_scale.sh
Every $SLEEP seconds (10 by default) the script prints a status line:
11:51:58 Cluster utilization: 3.3%; Active agent nodes: 1; Active jobs: 0
If an action is needed, the script prints a message like this when more nodes are required:
ACTION: Increasing count of agents to 1 + 1 and sleep for a minute
or like this when no active jobs are left and the script shuts down all agent nodes:
ACTION: No active jobs for the moment, so shutdowning all agents
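The scaling rule described above can be sketched as a small decision function. This is an illustration only, not the actual code of cluster_scale.sh: the function name decide_action, the returned keywords, and the default THRESHOLD of 80 are assumptions.

```shell
#!/bin/sh
# Sketch of the scaling decision described above; NOT the actual
# cluster_scale.sh code. THRESHOLD is a percentage; 80 is an assumed default.
THRESHOLD="${THRESHOLD:-80}"

# decide_action UTILIZATION ACTIVE_JOBS ACTIVE_AGENTS
# Prints one of: scale_up, shutdown_agents, none
decide_action() {
    utilization="$1"
    active_jobs="$2"
    active_agents="$3"
    if [ "$active_jobs" -eq 0 ]; then
        # No jobs running: stop the agents, if there are any.
        if [ "$active_agents" -gt 0 ]; then
            echo "shutdown_agents"
        else
            echo "none"
        fi
    elif awk -v u="$utilization" -v t="$THRESHOLD" 'BEGIN { exit !(u > t) }'; then
        # Utilization may be fractional (e.g. 3.3), so compare it with awk.
        echo "scale_up"
    else
        echo "none"
    fi
}

decide_action 3.3 0 1    # no jobs, one agent running
decide_action 91.5 2 1   # jobs running, utilization above THRESHOLD
```

With Docker Compose, a scale_up result would correspond to running $DOCKER_COMPOSE scale agent=N with N set to the current agent count plus one, which matches the "1 + 1" in the message above.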
If you want a more secure DUCC service, you can enable authorization for the DUCC web interface (based on the container's system users):
- Log in to the head node, open the ducc.properties file, and enable the ducc.ws.login.enabled option:
ssh -p 2222 -i res/id_rsa root@localhost
cd /home/ducc/apache-uima-ducc/resources/
vi ducc.properties
# ducc.ws.login.enabled = false
ducc.ws.login.enabled = true
- Now create a new user on the head node and on all agent nodes:
useradd -m newuser -s /bin/bash
passwd newuser
Now you can use the Login link in the web interface, and all your jobs will be submitted by the newly created user.
For more information, see the "DUCC book".
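Because the user has to exist on the head node and on every agent node, the user-creation step can be repeated with docker exec instead of logging in to each container. The sketch below only prints the commands. The function name print_user_commands is hypothetical, and the container names head and agent1 are assumptions taken from the plain Docker instructions (Docker Compose names its containers differently).

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the commands that
# create USER in each named container. Container names are assumptions.
print_user_commands() {
    new_user="$1"
    shift
    for container in "$@"; do
        echo "docker exec ${container} useradd -m ${new_user} -s /bin/bash"
        # passwd prompts for input, so it needs an interactive terminal (-it).
        echo "docker exec -it ${container} passwd ${new_user}"
    done
}

print_user_commands newuser head agent1
```

Run the printed commands one by one, since passwd asks for the new password interactively.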
To specify a new nodepool, edit the resources/ducc.classes file on the head node. Insert your changes after the default nodepool definition.
Nodepool --default-- { }
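As a starting point for the three pools named in the check points (compute-optimized, memory-optimized, general-use), the sketch below generates one nodepool entry per pool. It assumes the "Nodepool NAME { nodefile FILE }" form shown in the DUCC Book; please verify the syntax against the DUCC version you use. The function name print_nodepools and the NAME.nodes file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print ducc.classes
# nodepool entries for the given pool names.
# Assumption: the "Nodepool NAME { nodefile FILE }" form from the DUCC Book;
# the NAME.nodes file names are made up for illustration.
print_nodepools() {
    for pool in "$@"; do
        printf 'Nodepool %s { nodefile %s.nodes }\n' "$pool" "$pool"
    done
}

print_nodepools compute-optimized memory-optimized general-use
```

The printed lines would go after the default nodepool definition in resources/ducc.classes, with each nodes file listing the agent hosts that belong to that pool.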