Twister2 Release 0.2.1
Twister2 0.2.1 is a patch release of Twister2 that improves performance and fixes bugs.
We have added streaming windowing support as a new beta feature in this release.
You can download the source code from GitHub.
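As a rough illustration of what streaming windowing means in general, the sketch below groups a stream of elements into fixed-size, non-overlapping (tumbling) count windows. It is plain Java and does not use the Twister2 windowing API; the class name `TumblingWindowSketch` and the method `tumbling` are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: this class is NOT part of the Twister2 API.
public class TumblingWindowSketch {

  /**
   * Groups the elements of a stream into consecutive, non-overlapping
   * windows of windowSize elements. A trailing partial window is
   * emitted as-is.
   */
  public static <T> List<List<T>> tumbling(Iterable<T> stream, int windowSize) {
    List<List<T>> windows = new ArrayList<>();
    List<T> current = new ArrayList<>();
    for (T item : stream) {
      current.add(item);
      if (current.size() == windowSize) {
        // The window is full: emit it and start a new one.
        windows.add(current);
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      windows.add(current);
    }
    return windows;
  }

  public static void main(String[] args) {
    // prints [[1, 2], [3, 4], [5]]
    System.out.println(tumbling(Arrays.asList(1, 2, 3, 4, 5), 2));
  }
}
```

A real streaming engine would process each window as soon as it closes rather than collecting all windows in a list; the list is used here only to keep the example short.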
Major Features
This release includes the following core components.
- Resource provisioning component to bring up and manage parallel workers in cluster environments
  - Standalone
  - Kubernetes
  - Mesos
  - Slurm
  - Nomad
- Parallel and distributed operators in HPC and cloud environments
  - Twister2:Net - a data-level dataflow operator library for streaming and large-scale batch analysis
  - Harp - an innovative BSP (Bulk Synchronous Parallel) collective framework for parallel applications and machine learning at message level
  - OpenMPI (HPC environments only) at message level
- Task System
  - Task Graph
    - Create dataflow graphs for streaming and batch analysis, including iterative computations
  - Task Scheduler - schedules the task graph onto cluster resources, supporting different scheduling algorithms
    - Data locality scheduling
    - Round-robin scheduling
    - First-fit scheduling
  - Executor - executes the task graph
    - Batch executor
    - Streaming executor
- TSet for distributed data representation (similar to Spark RDD, Flink DataSet and Heron Streamlet)
  - Iterative computations
  - Data caching
- APIs for streaming and batch applications
  - Operator API
  - Task Graph based API
  - TSet API
- Support for storage systems
  - HDFS
  - Local file systems
  - NFS for persistent storage
- Web UI for monitoring Twister2 jobs
- Apache Storm compatibility API
- Connected DataFlow (experimental)
  - Supports creation of multiple dataflow graphs executing in a single job
These features translate to running the following types of applications natively with high performance.
- Streaming computations
- Data operations in batch mode
- Iterative computations
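To make two of the scheduling algorithms named under the Task Scheduler more concrete, here is a minimal sketch of the general round-robin and first-fit techniques. It is not the Twister2 scheduler implementation and does not use its API. The class `SchedulingSketch`, its methods, and the single integer "demand" and "capacity" per task and worker are assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative only: this class is NOT part of the Twister2 API.
public class SchedulingSketch {

  /** Round-robin: task i is placed on worker (i mod workerCount). */
  public static int[] roundRobin(int taskCount, int workerCount) {
    int[] assignment = new int[taskCount];
    for (int task = 0; task < taskCount; task++) {
      assignment[task] = task % workerCount;
    }
    return assignment;
  }

  /**
   * First fit: each task is placed on the first worker that still has
   * enough remaining capacity. A task that fits nowhere is marked -1.
   */
  public static int[] firstFit(int[] taskDemands, int[] workerCapacities) {
    int[] remaining = workerCapacities.clone();
    int[] assignment = new int[taskDemands.length];
    for (int task = 0; task < taskDemands.length; task++) {
      assignment[task] = -1;
      for (int worker = 0; worker < remaining.length; worker++) {
        if (remaining[worker] >= taskDemands[task]) {
          remaining[worker] -= taskDemands[task];
          assignment[task] = worker;
          break;
        }
      }
    }
    return assignment;
  }

  public static void main(String[] args) {
    // prints [0, 1, 0, 1, 0]
    System.out.println(Arrays.toString(roundRobin(5, 2)));
    // prints [0, 1, 0, -1]
    System.out.println(Arrays.toString(
        firstFit(new int[] {4, 3, 2, 5}, new int[] {6, 6})));
  }
}
```

Round-robin spreads tasks evenly without looking at resource needs, while first fit packs tasks onto the earliest worker that can hold them. A data locality scheduler would additionally take into account where the input data resides, which this sketch does not model.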
Examples
With this release we include several examples to demonstrate various features of Twister2.
- A Hello World example
- Communication examples - how to use communications for streaming and batch
- Task examples - how to create task graphs with different operators for streaming and batch
- K-Means
- Sorting of records
- Word count
- Iterative examples
- Harp example
- SVM
Road map
We have started working on our next major release that will connect the core components we have developed
into a full data analytics environment. In particular it will focus on providing APIs around the core
capabilities of Twister2 and integration of applications in a single dataflow.
Next Major Release (End of June 2019)
- Connected DataFlow
- Fault tolerance
- Supporting more APIs, including Beam
- More example applications
Beyond next release
- Python API
- Implementing core parts of Twister2 with C/C++ for high performance
- Direct use of RDMA
- SQL interface
- Native MPI support for cloud deployments
- More resource managers - Pilot Jobs, YARN
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0