Skip to content

Latest commit

 

History

History
78 lines (62 loc) · 4.4 KB

README.md

File metadata and controls

78 lines (62 loc) · 4.4 KB

Distributed Query Engine

This project is comprised of two part. The first part is the project accoding to the Query System Layout with Zookeeper and the other part is Mesos Framework Part. The full explanation could be found in CERN Student record

This project is part of Toward real-time data query systems in HEP Paper

Query System Layout

query system layout

Folder Structure

├── client.py               # Zookeeper Client 
├── compute                 # Zookeeper Compute Node
│   ├── executor.py         # Executor Class of Compute Node
│   ├── slave.py            #
│   ├── zk_compute.py       # System Compute Node
├── dispatcher              #
│   ├── dispatcher_exec.py  # Executor of Dispatcher
│   ├── dispathcer.py       # System Dispatcher
├── docs                    # document folder
├── executor                
│   └── executor.py         # Mesos Executor
├── framework               
│   ├── framework.py        # Mesos Framework
│   └── job.py              # Mesos Job State Class
├── master.py               # Mesos Master 
├── storage                 
│   ├── db.py               # MongoDB Connector Class
├── test_results.py         # Testing Result for distributed file
├── utilities               
└── zookeeper               #
    ├── job.py              # Job Status Class for ZK 
    └── zk_queue.py         # Zookeeper Queue Class

Prerequisite

  • Python 2.7+
  • Zookeeper 3.4.10
  • MongoDB 3.4
  • Python Package
    • Pymongo – For connecting to MongoDB
    • Kazoo – For connecting to Zookeeper

Run the project

Run Query System

  • Inside the root directory of the project there is a client.py file which is the file that test the system by producing job and put inside the queue of Dispatcher and then start the rest of the system

  • The command here is python client.py [enable_create] [delete_all_collection] [enable_dispatcher] [enable_slave] [enable_watcher] The flag can be represented by 0(disable) and 1(enable). For example python client.py 1 1 1 1 1

    • [Enable_create] means to create path in zookeeper /owned, /unowned, /done.
    • The second flag is delete_all_collection flag. It is used to flush mongodb database to test the system from nothing.
    • enable_dispatcher is to enable dispatcher which is the class responsible for assigning subtasks.
    • enable_slave will let the compute node start running and calculating the job.
    • And watcher which is the one returning done job enabled by specifying enable_watcher flag.
  • The running log is saved in the running.log file. It can be used to debug behavior of the system and to check the result of the data.

  • To create /owned, /unowned, /done path, the exist path needed to be removed first. To remove the node, it is necessary to remove all children nodes. It can be done in the /zookeeper/bin by typing ./zkCli.sh rmr /done && ./zkCli.sh rmr /owned && ./zkCli.sh rmr /unowned

  • It is possible to connect to zookeeper while running the system. To connect to the zookeeper, run ./zkCli.sh -server 127.0.0.1:2181 inside zookeeper/bin

  • The number of subtasks can be changed by changing number_of_quries in client.py

Run Test

With the result from running.log, it is possible to check the result by checking from the log file. To check the result, run python test_results.py command and see the result. The sum of all data return should be equal to the amount of incoming request which we specified in the range.

System Documents

Kazoo-Zookeeper Basic Usage

Kazoo API Usage in the Project

Zookeeper and Kazoo References