GSoC 2014
This page presents a list of ideas for the Google Summer of Code 2014 program. There are two major projects I would like to suggest to students: Vector/Matrix Engine and Microbenchmarking.
All the matrix/vector operations are currently implemented in the `AbstractMatrix` class on top of the primitive `get`/`set` interface operations. This gives reusable and extensible code but ruins the overall performance. The point is that matrix/vector operations should depend on the underlying storage types. For example, sparse matrix multiplication should use a completely different algorithm than dense matrix multiplication. This can't be done inside `AbstractMatrix` (or even inside a concrete implementation, e.g. `Basic2DMatrix`), since it breaks the ideas behind OOP and leaves us with a pile of unmaintainable if-then-else code.
The idea behind this project is that all computational logic should be placed into a special entity called an engine. An engine knows everything about the data storage format (CRS, CCS, 2D, 1D) and is able to perform an efficient computation on a concrete type. It should cover all the matrix/vector operations (those defined in the `AbstractMatrix` and `AbstractVector` classes). The abstract classes then simply route every operation request to the corresponding engine, which keeps the API unchanged.
The next step might be to extend an engine with type-based linear system solvers, matrix inverters, etc.
The engine abstraction will also make it easy to implement another important type of operation: in-place operations. This might involve a new implementation of the engine abstraction. Both engines should be available in la4j: an in-place engine and an out-of-place engine (the default). The API may look as follows.
```java
Matrix a = new CRSMatrix(...);
Matrix b = new Basic2DMatrix(...);
Matrix c = a.multiply(b); // uses a default out-of-place engine
Matrix d = a.multiply(b, LinearAlgebra.IN_PLACE); // uses an in-place engine
```
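To make the in-place versus out-of-place distinction concrete, here is a minimal sketch using element-wise vector addition on plain arrays. The `InPlaceSketch` class and its `Mode` enum are hypothetical and only mimic the role that `LinearAlgebra.IN_PLACE` plays in the proposed API. Addition is used instead of multiplication because it can trivially reuse the left operand's storage.

```java
final class InPlaceSketch {

    /** Hypothetical switch between the two engine flavours. */
    enum Mode { OUT_OF_PLACE, IN_PLACE }

    /**
     * Adds b to a element by element.
     * OUT_OF_PLACE allocates a fresh result; IN_PLACE overwrites and returns a.
     */
    static double[] add(double[] a, double[] b, Mode mode) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Length mismatch");
        }
        double[] target = (mode == Mode.IN_PLACE) ? a : new double[a.length];
        for (int i = 0; i < a.length; i++) {
            target[i] = a[i] + b[i];
        }
        return target;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {10.0, 20.0};
        double[] c = add(a, b, Mode.OUT_OF_PLACE); // a is left untouched
        double[] d = add(a, b, Mode.IN_PLACE);     // d is the same array as a
        System.out.println((c != a) + " " + (d == a));
    }
}
```

The in-place variant avoids an allocation per operation, at the cost of destroying the left operand, which is why the out-of-place engine is the safer default.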
This project involves a strong design trade-off: we expose the internal data structure format in order to gain performance. The aim here is to provide a robust and reusable design and implementation for this concept.
There is an awesome tool called JMH (Java Microbenchmark Harness) that makes it possible to maintain a robust set of microbenchmarks for the various components of a Java project. Microbenchmarking allows regular tracking of the project's performance, which is especially important for math libraries like la4j. Having a set of microbenchmarks will allow the project's maintainers both to prevent performance regressions and to ship a high-quality product to the users.
This project involves three stages:
- Define a representative set of the project's KPIs, which should cover typical user scenarios (e.g., 'matrix multiplication' or the more general 'solving a system of linear equations').
- Understand Java microbenchmarking techniques by reviewing JMH's use cases and best practices. A solid set of links to start with is collected in this SO question.
- Write JMH-powered microbenchmarks for la4j and put them into the `test` subdirectory.
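As a rough idea of what the last stage might produce, here is a minimal JMH benchmark for the 'matrix multiplication' scenario. It is a sketch under assumptions: the class name `MultiplyBenchmark`, the matrix sizes, and the iteration counts are illustrative, and a plain `double[][]` multiplication stands in for the la4j call that a real benchmark would measure. It requires the JMH library and its annotation processor on the classpath.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class MultiplyBenchmark {

    // Illustrative sizes; a real KPI set would pick them from user scenarios.
    @Param({"64", "256"})
    public int size;

    private double[][] a;
    private double[][] b;

    @Setup
    public void setUp() {
        Random random = new Random(42); // fixed seed for repeatable inputs
        a = randomMatrix(size, random);
        b = randomMatrix(size, random);
    }

    // Returning the result lets JMH consume it, so the JIT cannot
    // eliminate the computation as dead code.
    @Benchmark
    public double[][] multiply() {
        // Placeholder workload: a real benchmark would call la4j here.
        double[][] out = new double[size][size];
        for (int i = 0; i < size; i++)
            for (int k = 0; k < size; k++)
                for (int j = 0; j < size; j++)
                    out[i][j] += a[i][k] * b[k][j];
        return out;
    }

    private static double[][] randomMatrix(int n, Random random) {
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = random.nextDouble();
        return m;
    }
}
```

Keeping input generation in the `@Setup` method means only the multiplication itself is timed.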
Mentor: @vkostyukov