dev guide
The BLISS library is implemented in C++ with MPI and OpenMP. It uses a number of C++11 features and makes heavy use of STL facilities such as containers and iterators.
This guide is meant to provide a high-level overview of the design features of the BLISS library, along with some illustrative examples.
At BLISS' core are bioinformatics algorithms that are designed to be efficient and scalable (see Algorithms). Implementing these algorithms, however, also requires careful thought and appropriate design.
The BLISS library aims to use C++ language features that provide high performance, a low memory footprint, extensibility, and flexibility. This leads to a few principles in the design and implementation of the library:
- Template classes and functions as appropriate.
- Minimize memory utilization as appropriate.
- Maximize usability as appropriate.
- Leverage the compiler and C++11 to produce efficient binaries.
BLISS follows the fairly common practice of using templates to support different data types.
bliss::Kmer<21, DNA> kmer21;
bliss::Kmer<43, DNA16> kmer43;
...
In addition, BLISS makes heavy use of template parameters to support interchangeable functionality.
using DNAIterator = bliss::iterator::transform_iterator<char*, bliss::ASCII2<DNA> >;
using ComplementIterator = bliss::iterator::transform_iterator<DNAIterator, bliss::ToComplement<DNA> >;
This mechanism provides the flexibility and extensibility of the implementation.
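To make the idea of functors as template parameters concrete, here is a minimal, self-contained sketch. It is not the BLISS implementation: `TransformIterator`, `ToUpper`, `Complement`, and `complement_of` are hypothetical names invented for this illustration, and the iterator supports only the few operations the loop needs.

```cpp
#include <cassert>
#include <string>

// Hypothetical functors (not the BLISS ones): each maps one character to another.
struct ToUpper {
  char operator()(char c) const {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
  }
};

struct Complement {
  char operator()(char c) const {
    switch (c) {
      case 'A': return 'T';
      case 'T': return 'A';
      case 'C': return 'G';
      case 'G': return 'C';
      default:  return c;  // leave anything else unchanged
    }
  }
};

// Minimal forward-only iterator that applies Functor on every dereference.
// Simplification: the result type is fixed to char.
template <class BaseIterator, class Functor>
class TransformIterator {
 public:
  explicit TransformIterator(BaseIterator it, Functor f = Functor())
      : it_(it), f_(f) {}
  char operator*() const { return f_(*it_); }
  TransformIterator& operator++() { ++it_; return *this; }
  bool operator==(const TransformIterator& o) const { return it_ == o.it_; }
  bool operator!=(const TransformIterator& o) const { return it_ != o.it_; }

 private:
  BaseIterator it_;
  Functor f_;
};

// Swapping a functor changes the behavior; nesting the types chains the steps.
using UpperIter = TransformIterator<std::string::const_iterator, ToUpper>;
using ComplementIter = TransformIterator<UpperIter, Complement>;

// Upper-cases and complements `s` one character at a time, with no
// intermediate upper-cased copy of the input.
std::string complement_of(const std::string& s) {
  ComplementIter first{UpperIter{s.cbegin()}};
  ComplementIter last{UpperIter{s.cend()}};
  std::string out;
  for (; first != last; ++first) out.push_back(*first);
  assert(out.size() == s.size());
  return out;
}
```

Replacing `Complement` with another functor of the same shape would change what the chain computes without touching `TransformIterator` itself, which is the extensibility the template parameter is meant to provide.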
BLISS minimizes memory utilization in two ways:
- Use iterators rather than containers where possible.
- Use C++11 move semantics where possible.
BLISS minimizes intermediate memory usage through iterators that compute their values on the fly. The transform_iterator shown earlier lets one chain together simple computations that do not require global information. The transform iterators take functors as template arguments to impart different functionality. Below is an example of k-mer generation from characters:
std::string input = "GATTTGGGGTTCAAAGCAGT";
using KmerType = bliss::Kmer<9, DNA>;
using BaseIterator = std::string::const_iterator;
using Decoder = bliss::ASCII2<DNA, typename BaseIterator::value_type>;
using BaseCharIterator = bliss::iterator::transform_iterator<BaseIterator, Decoder>;
BaseCharIterator charStart(input.cbegin(), Decoder());
BaseCharIterator charEnd (input.cend(), Decoder());
using KmerIterator = bliss::KmerGenerationIterator<BaseCharIterator, KmerType>;
KmerIterator start(charStart, true);
KmerIterator end(charEnd, false);
for (int i = 0; start != end; ++start, ++i) {
printf("Kmer %d is %s\n", i, bliss::utils::KmerUtils::toASCIIString(*start));
}
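The on-the-fly computation can also be sketched with the standard library alone. The code below is an assumption-laden illustration rather than BLISS code: it uses a callback instead of an iterator for brevity, and the 2-bit encoding (A=0, C=1, G=2, T=3) and the names `encode_base`, `for_each_kmer`, and `count_kmers` are hypothetical. It shows the underlying sliding-window idea: each k-mer is derived from the previous one, so no container of k-mers is ever built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical 2-bit encoding; the actual BLISS DNA alphabet may differ.
inline std::uint64_t encode_base(char c) {
  assert(c == 'A' || c == 'C' || c == 'G' || c == 'T');
  switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    default:  return 3;  // 'T'
  }
}

// Calls visit(kmer) for every k-mer of seq, in order. Each k-mer is packed
// into the low 2*K bits of a 64-bit word, and only one exists at a time.
template <unsigned K, class Visitor>
void for_each_kmer(const std::string& seq, Visitor visit) {
  static_assert(K >= 1 && K <= 31, "this sketch packs a k-mer into one 64-bit word");
  const std::uint64_t mask = (std::uint64_t(1) << (2 * K)) - 1;
  std::uint64_t kmer = 0;
  std::size_t seen = 0;
  for (char c : seq) {
    // Slide the window: shift in the new base, drop the oldest one.
    kmer = ((kmer << 2) | encode_base(c)) & mask;
    if (++seen >= K) visit(kmer);
  }
}

// Counts k-mers without storing any of them.
template <unsigned K>
std::size_t count_kmers(const std::string& seq) {
  std::size_t n = 0;
  for_each_kmer<K>(seq, [&n](std::uint64_t) { ++n; });
  return n;
}
```

A sequence of length n yields n - K + 1 k-mers when n >= K and none otherwise, so the 20-character input used above produces 12 9-mers.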
The use of move semantics in BLISS translates to significant (although not yet complete) support for move constructors and move assignment operators. This allows the memory allocated for an object to be moved rather than copied, for example when the object is inserted into a vector. This mechanism lowers memory utilization and improves computational efficiency, as some memory allocations can be avoided.
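As a small illustration of what move support buys, consider the sketch below. `ReadBuffer` and `move_reuses_allocation` are hypothetical names, not BLISS classes; the example relies only on the standard guarantee that move-constructing a `std::vector` transfers ownership of its storage.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical buffer-owning type, used only to illustrate the idea.
class ReadBuffer {
 public:
  explicit ReadBuffer(std::size_t n) : data_(n, 'N') {}

  // Copying duplicates the storage.
  ReadBuffer(const ReadBuffer&) = default;
  ReadBuffer& operator=(const ReadBuffer&) = default;

  // Moving hands the storage over instead of allocating a new block.
  ReadBuffer(ReadBuffer&& other) noexcept : data_(std::move(other.data_)) {}
  ReadBuffer& operator=(ReadBuffer&& other) noexcept {
    data_ = std::move(other.data_);
    return *this;
  }

  const char* address() const { return data_.data(); }
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<char> data_;
};

// Returns true if the element stored in the vector uses the same heap block
// that the original object allocated, i.e. nothing was copied.
bool move_reuses_allocation() {
  ReadBuffer buf(1024);
  const char* before = buf.address();
  std::vector<ReadBuffer> store;
  store.push_back(std::move(buf));  // selects the move constructor
  assert(store.back().size() == 1024);
  return store.back().address() == before;
}
```

Marking the move operations `noexcept` matters in practice: `std::vector` only moves elements during reallocation when doing so cannot throw, and otherwise falls back to copying.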
The extensive use of templates can make code hard to read and debug. Two mechanisms have been employed to address this. The first is the use of C++11 alias templates (the templated using keyword) to perform type aliasing. For example,
/// normal KmerGenerationIterator for generating kmers from a sequence of alphabet characters
template <class BaseIterator, class Kmer>
using KmerGenerationIterator = KmerGenerationIteratorBase<KmerSlidingWindow<BaseIterator, Kmer > >;
/// reverse KmerGenerationIterator for generating kmers from a sequence of alphabet characters. can be used for reverse complements.
template <class BaseIterator, class Kmer>
using ReverseKmerGenerationIterator = KmerGenerationIteratorBase<ReverseKmerSlidingWindow<BaseIterator, Kmer > >;
The second mechanism improves the debugging process during code development. BLISS uses a modified version of gccfilter (http://www.mixtion.org/gccfilter/) to post-process g++ error messages at compile time, which drastically improves the readability of the compiler output.
Aside from relying on compiler optimization, BLISS makes heavy use of constexpr values, functions, and expressions. A constexpr expression can be evaluated at compile time, producing values that require no run-time evaluation.
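The following sketch shows one plausible use of constexpr in this setting: sizing a packed k-mer at compile time. The names `words_needed`, `PackedKmer`, and `words_for` are assumptions made for this example and are not taken from the BLISS sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of 64-bit words needed to store k symbols at
// `bits` bits each. A single return statement keeps it valid C++11 constexpr.
constexpr std::size_t words_needed(std::size_t k, std::size_t bits) {
  return (k * bits + 63) / 64;
}

// The array size is computed by the compiler, so no run-time sizing occurs.
template <std::size_t K, std::size_t BitsPerChar>
struct PackedKmer {
  static constexpr std::size_t nWords = words_needed(K, BitsPerChar);
  std::uint64_t data[nWords];
};

// Checked during compilation; a wrong value would fail the build.
static_assert(PackedKmer<21, 2>::nWords == 1, "21 symbols at 2 bits fit in one word");
static_assert(PackedKmer<43, 4>::nWords == 3, "43 symbols at 4 bits need three words");

// The same function remains callable with values known only at run time.
inline std::size_t words_for(std::size_t k) {
  assert(k > 0);
  return words_needed(k, 2);
}
```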
In addition, a significant amount of templated code is specialized based on template parameters. The specialization is often done via metaprogramming, where evaluating type traits allows for conditional declarations of functions and classes.
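A minimal sketch of this kind of trait-driven specialization, using only `<type_traits>` from the standard library, might look as follows. `WordFor` and `count_bits` are hypothetical names chosen for illustration; BLISS's own specializations are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Conditional type selection: choose a storage word based on a template value.
template <unsigned Bits>
struct WordFor {
  using type = typename std::conditional<
      (Bits <= 32), std::uint32_t, std::uint64_t>::type;
};
static_assert(std::is_same<WordFor<18>::type, std::uint32_t>::value, "18 bits fit in 32");
static_assert(std::is_same<WordFor<42>::type, std::uint64_t>::value, "42 bits need 64");

// Conditional function declaration: this overload exists only for unsigned types.
template <class T>
typename std::enable_if<std::is_unsigned<T>::value, unsigned>::type
count_bits(T v) {
  unsigned n = 0;
  // Clear the lowest set bit on each pass.
  for (; v != 0; v = static_cast<T>(v & (v - 1))) ++n;
  assert(n <= static_cast<unsigned>(std::numeric_limits<T>::digits));
  return n;
}

// This overload exists only for signed integers; it reinterprets the value as
// the matching unsigned type and forwards to the overload above.
template <class T>
typename std::enable_if<std::is_integral<T>::value && std::is_signed<T>::value,
                        unsigned>::type
count_bits(T v) {
  using U = typename std::make_unsigned<T>::type;
  return count_bits(static_cast<U>(v));
}
```

Because the unsuitable overload is removed during overload resolution rather than rejected at run time, calling `count_bits` with an unsupported type such as `double` fails to compile.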