rmr2

This package enables writing R programs for the Hadoop MapReduce system, that is, parallel distributed programs for the most popular big data platform.
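As a rough illustration of what such a program looks like, here is a minimal sketch that squares the integers 1 to 10. It assumes rmr2 is already installed and uses the package's local backend, so no Hadoop cluster is needed to try it. The variable names `small.ints` and `squares` are illustrative only.

```r
library(rmr2)

# Assumption: run on the local backend so the example works without a cluster.
rmr.options(backend = "local")

# Write a small vector to the (local) file system used by the backend.
small.ints <- to.dfs(1:10)

# A map-only job: emit each value as the key and its square as the value.
squares <- mapreduce(
  input = small.ints,
  map = function(k, v) keyval(v, v^2)
)

# Read the key-value pairs back into the R session.
from.dfs(squares)
```

Switching to `rmr.options(backend = "hadoop")` would run the same job on a configured cluster.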

Requirements

A Hadoop cluster running any recent Hadoop distribution (CDH3 and higher, Apache Hadoop 2.2.0 and higher, HDP2 and higher)
R 3.0 or higher installed on each node of the cluster

Installation

rmr2 itself needs to be installed on each node with the following expression:

install.packages("rmr2", repos = c("http://archive.rzilla.org", unlist(options("repos"))))

Configuration

rmr2 needs two environment variables to be set (only on the master node): HADOOP_CMD and HADOOP_STREAMING. HADOOP_CMD should point to the main hadoop command, whereas HADOOP_STREAMING should be set to the streaming jar, named something like hadoop-streaming*.jar; the exact name depends on version and distribution. Optionally, HDFS_CMD can be set to help rmr2 locate the hdfs command executable. At this time that only spares a few warnings, but it may become mandatory in the future.
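One way to set these variables is from within the R session, before loading the package. The sketch below assumes hypothetical paths; the actual location of the hadoop command and the streaming jar depends on your distribution and version, so substitute your own.

```r
# Hypothetical paths: replace with the locations on your master node.
Sys.setenv(
  HADOOP_CMD = "/usr/bin/hadoop",
  HADOOP_STREAMING = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"
)

# Optional: helps rmr2 locate the hdfs executable (hypothetical path).
Sys.setenv(HDFS_CMD = "/usr/bin/hdfs")

library(rmr2)
```

The variables can equally be exported in the shell environment that starts R; setting them with `Sys.setenv` only affects the current session.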

Version

The current version is 3.3.1.
