Kmerind
Kmerind is a library in the Parallel Bioinformatics Library for Short Sequences (ParBLiSS) project. It provides k-mer indexing capability for biological sequence data.
Please take a look at our Wiki.
ParBLiSS is a C++ library for distributed and multi-core bioinformatics algorithms. It requires C++11 features and MPI (OpenMP is not required). The library is implemented as a set of templated classes; as such, most of the code is in header files, which are incorporated into user code via #include.
Kmerind provides basic parallel sequence file access and k-mer index construction and query. It currently supports indices for the frequency, position, and quality of k-mers from short reads and whole genomes, using the FASTQ and FASTA formats.
Required:
- A C++11-capable compiler, one of the following:
-- g++ (version 4.8.1+, due to "decltype" and other C++11 features)
-- icpc (version 16+, due to constexpr functions and initializers)
-- clang (version 3.5+, since the CMake-generated makefile has problems with earlier versions; 3.7+ if OpenMP is used)
- cmake (version 2.8+)
- An MPI implementation, one of the following:
-- openmpi (version 1.7+, due to the use of MPI_IN_PLACE)
-- mpich2 (version 1.5+)
-- mvapich (tested with version 2.1.7)
-- Intel MPI Library (poorly tested)
See http://en.cppreference.com/w/cpp/compiler_support for C++11 feature support by compiler.
Optional libraries are: boost_log, boost_system, boost_thread, and boost_program_options. These are only needed if you intend to turn on the Boost log engine.
Optional tools include:
- ccmake (for interactive, curses-based cmake configuration)
- perl and the perl packages Term::ANSIColor, Getopt::ArgvFile, Getopt::Long, and Regexp::Common (for g++ error message formatting)
git clone https://github.com/ParBLiSS/kmerind.git
cd kmerind
git submodule init
git submodule update
cd ..
mkdir kmerind-build
cd kmerind-build
cmake ../kmerind
Alternatively, instead of cmake ../kmerind, you can use:
ccmake ../kmerind
The following are important parameters:
- CMAKE_BUILD_TYPE: defaults to Release.
- ENABLE_TESTING: setting this On exposes BUILD_TEST_APPLICATIONS, which enables building the test applications.
- BUILD_EXAMPLE_APPLICATION: setting this On allows the applications in the examples directory to be built.
- LOG_ENGINE: chooses which log engine to use.
- LOGGER_VERBOSITY: chooses the type of messages to print.
- ENABLE_SANITIZER: turns on g++'s address or thread sanitizer, for debugging. Use SANITIZER_STYLE to configure it.
- ENABLE_STLFILT: turns on post-processing of g++ error messages to make them human-readable. Control verbosity via STLFIL_VERBOSITY.
It is highly recommended that ccmake be used until you have become familiar with the available CMake options and how to specify them on the command line (as -D<option>=<value> arguments, e.g. cmake -DCMAKE_BUILD_TYPE=Debug ../kmerind).
make
Important: developers using Intel compilers should see the "Intel Compiler Specific Issues" section at the end of this document.
ctest -T Test
or
make test
make doc
Please see the Wiki.
CMake typically uses an out-of-source build. To generate Eclipse-compatible .project and .cproject files, supply -G"Eclipse CDT4 - Unix Makefiles" to cmake.
We recommend that the Eclipse plugins PTP, EGit, and CMakeEd also be installed.
Intel Compiler Specific Issues
With the Intel C Compiler (icc) version 15, the following compilation error is observed:
Internal error: assertion failed at: "shared/cfe/edgcpfe/il.c", line 18295
Very little information about this error is available online. We suspect it is a compiler bug related to automatic template argument deduction during function template instantiation. ICC appears unable to deduce the element type and size of a statically sized array of the form
datatype x[len]
when the corresponding function parameter is declared as
datatype (&x)[len]
where datatype and len are template parameters of the function.
This error appears only in bitgroup_ops.hpp. Attempts to reproduce it in a separate test program were unsuccessful. The workaround is to specify the template arguments explicitly, which avoids automatic type deduction in this case.
It is not clear whether other function parameter forms also cause this error.