fplyr

This package combines the power of the data.table and iotools packages to read a large file efficiently, block by block, applying a user-specified function to each block as it is read. The outputs can be collected into a list or written to an output file.

A 'block' is defined as a set of contiguous lines that have the same value in the first field. The package is therefore not intended for every large file; it is useful only for files organized in this way.
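
To make the definition concrete, the sketch below uses base R's rle() to find runs of identical consecutive values in a vector of first-field keys. The vector `keys` and the resulting `blocks` table are hypothetical names used only for illustration; this is not how fplyr detects blocks internally, just a small demonstration of what "contiguous" implies.

```r
# Hypothetical first-field values, one per line of a file.
# Note that "a" appears again at the end, after the "b" lines.
keys <- c("a", "a", "b", "b", "a")

# rle() collapses runs of identical consecutive values.
runs <- rle(keys)

# One row per block: the key and the number of contiguous lines it spans.
blocks <- data.frame(
  key = runs$values,
  n_lines = runs$lengths,
  stringsAsFactors = FALSE
)

print(blocks)
```

Under the definition above, the trailing "a" line forms a third block of its own rather than joining the first one, so a file should be sorted (or at least grouped) by its first field if each key is meant to be processed exactly once.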

Examples

A typical file that can be processed with fplyr is as follows.

V1   V2  V3 V4
ID01 ABC Berlin 0.1
ID01 DEF London 0.5
ID01 GHI Rome   0.3
ID02 ABC Lisbon 0.2
ID02 DEF Berlin 0.6
ID02 LMN Prague 0.8
ID02 OPQ Dublin 0.7
ID03 DEF Lisbon -0.1
ID03 LMN Berlin 0.01
ID03 XYZ Prague 0.2

The first block consists of the first three lines, the second block of the next four lines, and the third and last block of the final three lines. Suppose you want to compute the mean of the fourth column for each ID in the first field. If the file is small, you can use by(). But if the file is too big to fit into the available memory, you can use one of the functions of this package. If the path to the above file is stored in the variable f, the following command returns a list where each element is the mean of the fourth column for a single block.

l <- flply(f, function(d) mean(d$V4))
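
For comparison, here is a minimal sketch of the small-file case using only base R, assuming the example file is whitespace-separated with a header line as shown above. The names `small_f`, `d` and `l_small` are introduced here for illustration. The whole file is read into memory and then split by the first field, which is exactly the step that becomes impractical for very large files.

```r
# Recreate the example file in a temporary location.
small_f <- tempfile(fileext = ".txt")
writeLines(c(
  "V1 V2 V3 V4",
  "ID01 ABC Berlin 0.1",
  "ID01 DEF London 0.5",
  "ID01 GHI Rome 0.3",
  "ID02 ABC Lisbon 0.2",
  "ID02 DEF Berlin 0.6",
  "ID02 LMN Prague 0.8",
  "ID02 OPQ Dublin 0.7",
  "ID03 DEF Lisbon -0.1",
  "ID03 LMN Berlin 0.01",
  "ID03 XYZ Prague 0.2"
), small_f)

# Read the entire file into memory at once.
d <- read.table(small_f, header = TRUE, stringsAsFactors = FALSE)

# Split the rows by the first field and apply the same function to each piece.
l_small <- lapply(split(d, d$V1), function(b) mean(b$V4))

print(l_small)
```

The function passed to lapply() has the same shape as the one passed to flply() above: it receives the rows of one block and returns a single value.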

Installation

  1. The package is on CRAN and can be installed from there. Start R and enter:
install.packages("fplyr")
  2. The development version can be installed from GitHub with devtools (but do it at your own risk). Start R and enter:
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("fmarotta/fplyr", ref = "devel")
  3. To install an old release, browse to , and download the "tar.gz" of the required version; then from the command line enter:
R CMD INSTALL fplyr-x.y.z.tar.gz