dev guide

tcpan edited this page Feb 16, 2015 · 1 revision

BLISS Developer's Guide

The BLISS library is implemented in C++ with MPI and OpenMP. It uses a number of C++11 features and makes heavy use of STL classes such as containers and iterators.

This guide provides a high-level overview of the design features of the BLISS library, along with some illustrative examples.

Principles

At BLISS' core are bioinformatics algorithms that are designed to be efficient and scalable (see Algorithms). Implementing these algorithms, however, also requires careful thought and appropriate design.

The BLISS library aims to use C++ language features that provide high performance, a low memory footprint, extensibility, and flexibility. This leads to a few principles in the design and implementation of the library:

  1. Template classes and functions as appropriate.
  2. Minimize memory utilization as appropriate.
  3. Maximize usability as appropriate.
  4. Leverage the compiler and C++11 to produce efficient binaries.

Templated Classes and Functions

BLISS follows the fairly common practice of using templates to support different data types.

bliss::Kmer<21, DNA> kmer21;
bliss::Kmer<43, DNA16>  kmer43;
...

BLISS also makes heavy use of template parameters to support interchangeable functionality.

using DNAIterator = bliss::iterator::transform_iterator<char*, bliss::ASCII2<DNA> >;

using ComplementIterator = bliss::iterator::transform_iterator<DNAIterator, bliss::ToComplement<DNA> >;

This mechanism gives the implementation its flexibility and extensibility.
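To make the functor-as-template-parameter idea concrete, here is a minimal, standalone sketch of a transform iterator. It does not use the BLISS headers: TransformIterator, AsciiToCode, ComplementCode, and complement are hypothetical names chosen for illustration, and the 2-bit encoding (A=0, C=1, G=2, T=3) is an assumption, not necessarily the encoding BLISS uses.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical functor: ASCII nucleotide -> 2-bit code (A=0, C=1, G=2, T=3).
struct AsciiToCode {
  unsigned char operator()(char c) const {
    switch (c) {
      case 'A': case 'a': return 0;
      case 'C': case 'c': return 1;
      case 'G': case 'g': return 2;
      default:            return 3;  // 'T'; this sketch also maps unknown characters to 3
    }
  }
};

// Hypothetical functor: complement of a 2-bit code (A<->T, C<->G).
struct ComplementCode {
  unsigned char operator()(unsigned char code) const {
    assert(code < 4);
    return static_cast<unsigned char>(3 - code);
  }
};

// Minimal input iterator that applies Functor each time it is dereferenced.
template <typename BaseIterator, typename Functor>
class TransformIterator {
 public:
  using iterator_category = std::input_iterator_tag;
  using value_type = decltype(std::declval<Functor>()(*std::declval<BaseIterator>()));
  using difference_type = std::ptrdiff_t;
  using pointer = const value_type*;
  using reference = value_type;

  explicit TransformIterator(BaseIterator it, Functor f = Functor()) : it_(it), f_(f) {}

  value_type operator*() const { return f_(*it_); }
  TransformIterator& operator++() { ++it_; return *this; }
  bool operator==(const TransformIterator& other) const { return it_ == other.it_; }
  bool operator!=(const TransformIterator& other) const { return it_ != other.it_; }

 private:
  BaseIterator it_;
  Functor f_;
};

// Swapping the functor changes the behavior; the iterator code stays the same.
using DnaIterator = TransformIterator<std::string::const_iterator, AsciiToCode>;
using ComplementIterator = TransformIterator<DnaIterator, ComplementCode>;

// Complements a sequence by chaining the two iterators; no intermediate
// buffer of 2-bit codes is ever materialized.
std::string complement(const std::string& seq) {
  static const char kLetters[] = {'A', 'C', 'G', 'T'};
  DnaIterator first(seq.begin()), last(seq.end());
  ComplementIterator it(first), end(last);
  std::string out;
  for (; it != end; ++it) out.push_back(kLetters[*it]);
  return out;
}
```

Calling complement("GATTACA") walks the string once; each dereference decodes a character and complements it on demand.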

Minimize Memory Footprint

BLISS minimizes memory utilization in two ways:

  1. Use iterators rather than containers where possible.
  2. Use C++11 move semantics where possible.

BLISS minimizes intermediate memory usage through iterators that compute on the fly. The transform_iterator shown earlier lets one chain together simple computations that do not require global information. The transform iterators take functors as template arguments to impart different functionality. Below is an example of k-mer generation from characters:

std::string input = "GATTTGGGGTTCAAAGCAGT"; 

using KmerType = bliss::Kmer<9, DNA>;

using BaseIterator = std::string::const_iterator;

using Decoder = bliss::ASCII2<DNA, typename BaseIterator::value_type>;
using BaseCharIterator = bliss::iterator::transform_iterator<BaseIterator, Decoder>;

BaseCharIterator charStart(input.cbegin(), Decoder());
BaseCharIterator charEnd  (input.cend(),   Decoder());

using KmerIterator = bliss::KmerGenerationIterator<BaseCharIterator, KmerType>;

KmerIterator start(charStart, true);
KmerIterator end(charEnd, false);

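int i = 0;  // index of the current k-mer
for (; start != end; ++start, ++i) {
  // assumes toASCIIString returns a std::string, hence c_str() for printf's %s
  printf("Kmer %d is %s\n", i, bliss::utils::KmerUtils::toASCIIString(*start).c_str());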
}
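For readers without the BLISS headers at hand, the following standalone sketch shows one way on-the-fly k-mer generation can work: a sliding window keeps the last K bases packed into a single 64-bit word and is updated one character at a time. KmerWindow, encode_base, and for_each_kmer are hypothetical names, and the 2-bit packing is an assumption made for illustration; this is not the KmerGenerationIterator implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical 2-bit encoding (A=0, C=1, G=2, everything else treated as T=3).
inline std::uint64_t encode_base(char c) {
  switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    default:  return 3;
  }
}

// Sliding window holding the most recent K bases in the low 2*K bits.
template <unsigned K>
class KmerWindow {
  static_assert(K >= 1 && K <= 31, "this sketch packs a k-mer into one 64-bit word");

 public:
  // Shift in one base and drop the oldest one.
  void push(char c) {
    bits_ = ((bits_ << 2) | encode_base(c)) & kMask;
    if (filled_ < K) ++filled_;
  }
  bool full() const { return filled_ == K; }
  std::uint64_t value() const {
    assert(full());  // a k-mer exists only after K bases have been seen
    return bits_;
  }

 private:
  static constexpr std::uint64_t kMask = (std::uint64_t(1) << (2 * K)) - 1;
  std::uint64_t bits_ = 0;
  unsigned filled_ = 0;
};

// Visits every k-mer of seq in order without storing them; returns the count.
template <unsigned K, typename Visitor>
std::size_t for_each_kmer(const std::string& seq, Visitor visit) {
  KmerWindow<K> window;
  std::size_t count = 0;
  for (char c : seq) {
    window.push(c);
    if (window.full()) {
      visit(window.value());
      ++count;
    }
  }
  return count;
}
```

For the 20-character input above and K = 9, the visitor is invoked 20 - 9 + 1 = 12 times, and the only state held between calls is one 64-bit word and a counter.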

Use of move semantics in BLISS translates to significant (although not yet complete) support for move constructors and move assignment operators. This allows the memory allocated for an object to be moved rather than copied, for example when the object is stored in a vector. This mechanism lowers memory utilization and improves computational efficiency, as some memory allocations can be avoided.
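The effect of moving into a vector can be observed directly. The sketch below uses a hypothetical Read type (not a BLISS class) whose move constructor is generated by the compiler; append_by_move is likewise a name invented for this example. It checks whether the heap buffer of the source object ends up owned by the element stored in the vector, which is what mainstream standard library implementations do for std::vector with the default allocator.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type. No copy operations or destructor are declared,
// so the compiler generates the move constructor and move assignment.
struct Read {
  std::string id;
  std::vector<char> bases;

  Read(std::string i, std::vector<char> b) : id(std::move(i)), bases(std::move(b)) {}
};

// Moves r into out and reports whether the stored element reuses r's
// heap buffer, i.e. no new allocation or copy of the bases was needed.
bool append_by_move(std::vector<Read>& out, Read&& r) {
  assert(!r.bases.empty());  // data() of an empty vector is not a meaningful address
  const char* buffer = r.bases.data();
  out.push_back(std::move(r));
  return out.back().bases.data() == buffer;
}
```

After the call, the source object is left in a valid but unspecified state and should not be read again without being reassigned.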

Maximize Usability

The extensive use of templates can make code hard to read and debug. Two mechanisms have been employed to counter this. The first is the use of C++11 alias templates (the templated using keyword) to perform type aliasing. For example,

/// normal KmerGenerationIterator for generating kmers from a sequence of alphabet characters
template <class BaseIterator, class Kmer>
using KmerGenerationIterator = KmerGenerationIteratorBase<KmerSlidingWindow<BaseIterator, Kmer > >;

/// reverse KmerGenerationIterator for generating kmers from a sequence of alphabet characters.  can be used for reverse complements.
template <class BaseIterator, class Kmer>
using ReverseKmerGenerationIterator = KmerGenerationIteratorBase<ReverseKmerSlidingWindow<BaseIterator, Kmer > >;

The second mechanism improves the debugging process during code development. BLISS uses a modified version of gccfilter (http://www.mixtion.org/gccfilter/) to post-process g++ error messages at compile time, which drastically improves the readability of the compiler output.

Binary Efficiency

Aside from compiler optimizations, BLISS makes heavy use of constexpr values, functions, and expressions. A constexpr expression can be evaluated at compile time, producing values that require no run-time evaluation.
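As an illustration of compile-time evaluation, the sketch below computes how many 64-bit words a packed k-mer needs, and uses the result as an array bound. The names bits_per_char, words_needed, and PackedKmer are hypothetical, and the layout is an assumption made for the example rather than the actual bliss::Kmer storage scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bits needed to encode one character of an alphabet: ceil(log2(alphabet_size)).
// A single return statement keeps this valid as a C++11 constexpr function.
constexpr unsigned bits_per_char(unsigned alphabet_size) {
  return alphabet_size <= 1 ? 0 : 1 + bits_per_char((alphabet_size + 1) / 2);
}

// Number of words required to hold k characters, rounded up.
constexpr std::size_t words_needed(unsigned k, unsigned alphabet_size,
                                   unsigned word_bits = 64) {
  return (static_cast<std::size_t>(k) * bits_per_char(alphabet_size) + word_bits - 1) /
         word_bits;
}

// Hypothetical packed k-mer whose storage size is fixed at compile time.
template <unsigned K, unsigned AlphabetSize>
struct PackedKmer {
  static_assert(K > 0 && AlphabetSize > 1, "need at least one character and two symbols");
  static constexpr std::size_t kWords = words_needed(K, AlphabetSize);

  std::uint64_t data[kWords] = {};

  std::uint64_t word(std::size_t i) const {
    assert(i < kWords);
    return data[i];
  }
};

// Both checks are resolved by the compiler; nothing is computed at run time.
static_assert(words_needed(21, 4) == 1, "21 bases at 2 bits each fit in one word");
static_assert(words_needed(43, 16) == 3, "43 * 4 = 172 bits need three words");
```

Because the word count is a constant expression, the array can live inside the object with no heap allocation, and a mismatch in the arithmetic is reported as a compile error.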

In addition, a significant amount of templated code is specialized based on template parameters. The specializations are often done via metaprogramming, where evaluating type traits allows functions and classes to be declared conditionally.
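A small sketch of this style of metaprogramming, using only the standard <type_traits> header: std::conditional selects a storage type from a bit count, and std::enable_if makes one of two overloads visible depending on a template parameter. SmallestWord and kmer_equal are hypothetical names for this example and are not taken from BLISS.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Selects the narrowest unsigned integer type that holds Bits bits.
template <unsigned Bits>
struct SmallestWord {
  static_assert(Bits >= 1 && Bits <= 64, "this sketch supports 1 to 64 bits");
  using type = typename std::conditional<(Bits <= 8), std::uint8_t,
      typename std::conditional<(Bits <= 16), std::uint16_t,
      typename std::conditional<(Bits <= 32), std::uint32_t,
                                std::uint64_t>::type>::type>::type;
};

static_assert(std::is_same<SmallestWord<12>::type, std::uint16_t>::value,
              "12 bits fit in 16");
static_assert(std::is_same<SmallestWord<42>::type, std::uint64_t>::value,
              "42 bits need 64");

// Declared only when the k-mer fits in one word: a single comparison.
template <unsigned Bits>
typename std::enable_if<(Bits <= 64), bool>::type
kmer_equal(const std::uint64_t* a, const std::uint64_t* b) {
  assert(a != nullptr && b != nullptr);
  return a[0] == b[0];
}

// Declared only for multi-word k-mers: compare word by word.
template <unsigned Bits>
typename std::enable_if<(Bits > 64), bool>::type
kmer_equal(const std::uint64_t* a, const std::uint64_t* b) {
  assert(a != nullptr && b != nullptr);
  for (unsigned i = 0; i < (Bits + 63) / 64; ++i) {
    if (a[i] != b[i]) return false;
  }
  return true;
}
```

The choice between the two kmer_equal overloads is made during overload resolution, so the single-word case carries no loop and no run-time branch on the size.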