This repository will host all source code and scripts for the Data Algorithms book. The book provides a set of distributed MapReduce algorithms, which are implemented using:
- Java/MapReduce Hadoop 2.5.0
- Java/Spark 1.1.0
Please note that this is a work in progress...
- Title: Data Algorithms
- Author: Mahmoud Parsian
- Publisher: O'Reilly Media
- All source code, libraries, and build scripts are posted here
- Shell scripts will be posted for running Spark and MapReduce/Hadoop programs (soon!)
Software | Version |
---|---|
Java | JDK7 |
Hadoop | 2.5.0 |
Spark | 1.1.0 |
Ant | 1.9.4 |
Name | Description |
---|---|
README.md | The file you are reading now |
README_lib.md | Must read before you build with Ant |
src | Source files for MapReduce/Hadoop/Spark |
lib | Required jar files |
build.xml | The Ant build script |
dist | The Ant build's output directory |
LICENSE | License for using this repository |
misc | Miscellaneous files for this repository |
setenv | Example of how to set your environment variables before building |
Also, each chapter has two subfolders (Java packages), where NN is the chapter number:
- org.dataalgorithms.chapNN.spark (for Spark programs)
- org.dataalgorithms.chapNN.mapreduce (for MapReduce/Hadoop programs)
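
The `setenv` entry listed above is described as an example of setting environment variables before building. As a rough illustration only, such a file might look like the sketch below. Every install path here is an assumption chosen to match the software versions in the table, not the actual content of the repository's `setenv` file.

```shell
# Hypothetical install locations; adjust each path to your own machine.
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk"
export HADOOP_HOME="/usr/local/hadoop-2.5.0"
export SPARK_HOME="/usr/local/spark-1.1.0"
export ANT_HOME="/usr/local/apache-ant-1.9.4"

# Put each tool's bin directory ahead of the existing PATH.
export PATH="$JAVA_HOME/bin:$ANT_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH"
```

After sourcing a file like this in your shell, running `ant` from the directory that contains build.xml invokes the build script's default target.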
- How To Run MapReduce/Hadoop Programs
- How To Run Java/Spark Programs in YARN
- How To Run Java/Spark Programs in Spark Cluster
To run Python programs, pass the script to spark-submit,
followed by the arguments to the program.
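
A minimal sketch of such an invocation follows. The helper name `run_pyspark`, the `SPARK_SUBMIT` override variable, the choice of `local[2]` as the master, and the file names in the example are all assumptions for illustration; substitute whatever matches your cluster and program.

```shell
# Sketch only: wraps spark-submit for a PySpark script.
# SPARK_SUBMIT may point at a specific binary; it defaults to the one on PATH.
run_pyspark() {
  script="$1"   # path to the Python program
  shift         # remaining arguments are passed through to the program
  "${SPARK_SUBMIT:-spark-submit}" --master "local[2]" "$script" "$@"
}

# Example (hypothetical file names):
# run_pyspark wordcount.py /tmp/input.txt
```

Everything after the script path is handed to the Python program unchanged, so its arguments appear on the command line in the order the program expects them.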
Please send me an email: [email protected]
Thank you!