Skip to content

0.5.0

Compare
Choose a tag to compare
@vkostyukov vkostyukov released this 05 Jan 06:44
· 74 commits to master since this release

It's been a while since the previous release. It took me almost an year to rethink and re-implement the significant part of la4j - operations on the sparse data. This is literally the most significant release of the library for a while. It finally solves the old performance/design issue: the la4j library is now taking the advantage of sparse data. Two major features allowed la4j to increase the performance up to 20x on sparse data:

  • Expendable operations allowed to select an algorithm implementation depending on arguments types: sparse vs. dense.
  • A lightweight and composable iterators allowed to develop easy to reason about algorithms on sparse data.

In order to perform an operation on a matrix or vector an apply method might be used. The following example calculates the inner product of the given vectors a and b.

Vector a = ???
Vector b = ???
double d = a.apply(LinearAlgebra.OO_PLACE_INNER_PRODUCT, b);

While the same result might be achieved by just:

Vector a = ???
Vector b = ???
double d = a.innerProduct(b);

Usually, you don't need to use operation explicitly unless you're extending the library with you own operation. The Matrix/Vector API provides a reasonable set of methods that forward to underlying operations.

Iterators provides a simple control abstraction for working with matrices and vectors. There are dramatically useful for sparse data. For example, while regular for-loop over the sparse vector's components takes O(n log n) time the following code requires O(n) time:

SparseVector a = ???
VectorIterator it = a.iterator();
while (it.hasNext()) {
  double x = it.next();
  int i = it.index();
  // do something with x and i
}

More importantly, there are bunch of sparse iterators in the Vector/Matrix API. For example, the following iterator iterates over non-zero elements of CRSMatrix:

CRSMatrix a = ???
MatrixIterator it = a.nonZeroRowMajorIterator();
while (it.hasNext()) {
  double x = it.next();
  int i = it.rowIndex();
  int j = it.columnIndex();
  // do something with x, i and j
}

The most cool thing about iterators is that they are composable! You can compose two iterators with ether orElse or andAlso combinators. The orElse combinators (i.e., orElseAdd, orElseSubtract) return an iterator that represents a union between two iterators, while the andAlso combinators (i.e., andAlsoMultiply, andAlsoDivide) return an iterator that represents an intersection between two iterators. Here is how we'd add two sparse vectors using non-zero iterators and their composition:

SparseVector a = ???
SparseVector b = ???
SparseVector c = ??? // result
VectorIterator these = a.nonZeroIterator();
VectorIterator those = b.nonZeroIterator();
VectorIterator both = these.orElseAdd(those);

while (both.hasNext()) {
  double x = both.next();
  int i = both.index();
  c.set(i, x);
}

Note that + (add) operation is used to resolve conflicts of union: if both vectors have non-zero components on the same index, the composed iterator will return their sum on both.next() or both.get().

There is another powerful feature of the iterators: there are writable. The set(double) method might be used to write the current iterator's element. The following example shows how to implement in-place scaling by 10 of the sparse matrix:

SparseMatrix a = ???
MatrixIterator it = a.nonZeroIterator();
while (it.hasNext()) {
  double x = it.next();
  it.set(x * 10);
}

The last thing you should know about iterators is that they are very efficient if used correctly. The rule is very simple: row-major sparse matrices (i.e., CRSMatrix) doesn't really like column-major iterators: columnMajorIterator(), nonZerorColumnMajorIterator(), iteratorOfColumn(int) and nonZeroIteratorOfColumn(int) and vise-versa. So, if you use column-major sparse matrix (i.e., CCSMatrix) do not read/write it in a row-major manner.

Iterators made the la4j library much more awesome. Within a super small overhead they provide a powerful and easy to reason about abstraction that forces the developers write efficient and clean sparse algorithms.

Finally, I'm trying to get rid of two boilerplate abstractions: org.la4j.factory.Factory and org.la4j.{matrix, vector}.source.Source. All of the them have been deprecated in the release: try to avoid the usage of deprecated API, more likely such methods will be removed in the next release.

Why get rid of factories? They are terrible. Operations allowed to chose the most efficient data representation for the out-of-place operations. The la4j library knows how to store your result in the most efficient way, so you should not be worried about it any more. For example, the multiplication of two sparse vectors is sparse vector while addition of sparse vector to dense vector will return dense vector and so on. Thus, instead of using factories for building new vectors/matrices you can use static factory methods like DenseMatrix.zero(int, int) or SparseMatrix.identity(int). I'm trying to get rid of other ways of doing things in la4j. There is should be the only way: the right way.

Now, you can convert matrices or vectors from sparse to dense representation using type-safe converter to. Note, that is the given factory (which is not org.la4j.factory.Factory but org.la4j.matrix.MatrixFactory or org.la4j.vector.VectorFactory) produces the same objects as an object being converted - no copying will be performed: just type casting:

Matrix a = CRSMatrix.identity(100);
RowMajorSparseMatrix b = a.toRowMajorSparseMatrix(); // no copying but type-safe casting

Matrix c = Basic2DMatrix.zero(100, 100);
DenseMatrix d = c.to(Matrices.BASIC_1D); // copying

This release is a huge step for the la4j project. I'm finally see the roadmap to the 1.0.0 release. The biggest feature in the first stable version should be in-pace operations, which are definitely might be done with iterators.