mailmahee/data-algorithms-book

 
 

Data Algorithms Book

This repository hosts all source code and scripts for the Data Algorithms Book. The book provides a set of distributed MapReduce algorithms, implemented using

  • Java/MapReduce Hadoop 2.5.0
  • Java/Spark 1.1.0

Work in Progress...

Please note that this is a work in progress...

URL To Data Algorithms Book

Source Code

  • All source code, libraries, and build scripts are posted here
  • Shell scripts for running Spark and MapReduce/Hadoop programs will be posted soon

Software Used

Software   Version
Java       JDK7
Hadoop     2.5.0
Spark      1.1.0
Ant        1.9.4

Structure of Repository

Name            Description
README.md       The file you are reading now
README_lib.md   Must read before you build with Ant
src             Source files for MapReduce/Hadoop/Spark
lib             Required jar files
build.xml       The Ant build script
dist            The Ant build's output directory
LICENSE         License for using this repository
misc            Miscellaneous files for this repository
setenv          Example of how to set your environment variables before building

Structure of src Directory

Each chapter has two subfolders (Java packages), where NN is the chapter number:

org.dataalgorithms.chapNN.spark      (for Spark programs)
org.dataalgorithms.chapNN.mapreduce  (for MapReduce/Hadoop programs)
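
As a small illustration of the naming convention above, the following sketch derives the two package names for a given chapter. It assumes that NN is a zero-padded, two-digit chapter number. The class ChapterPackages and its methods are hypothetical helpers written for this example and are not part of the repository.

```java
import java.util.Locale;

// Hypothetical helper: not part of the repository, only illustrates the naming scheme.
public final class ChapterPackages {
    private static final String ROOT = "org.dataalgorithms";

    private ChapterPackages() {
        // utility class, no instances
    }

    // Assumption: NN is the chapter number, zero-padded to two digits.
    public static String packageFor(int chapter, String framework) {
        return String.format(Locale.ROOT, "%s.chap%02d.%s", ROOT, chapter, framework);
    }

    // Package holding the Spark programs of a chapter.
    public static String sparkPackage(int chapter) {
        return packageFor(chapter, "spark");
    }

    // Package holding the MapReduce/Hadoop programs of a chapter.
    public static String mapreducePackage(int chapter) {
        return packageFor(chapter, "mapreduce");
    }

    public static void main(String[] args) {
        System.out.println(sparkPackage(3));
        System.out.println(mapreducePackage(3));
    }
}
```

Under the zero-padding assumption, chapter 3 would map to org.dataalgorithms.chap03.spark and org.dataalgorithms.chap03.mapreduce.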

How To Build using Apache's Ant

How To Build by Ant

Sample Builds by Ant

How To Run Java/Spark/Hadoop Programs

How To Run Python Programs

To run Python programs, pass the script to spark-submit, followed by the program's arguments.

Questions/Comments

Please send me an email: [email protected]

Thank you!
