Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PairG header-only library #1

Open
wants to merge 26 commits into
base: master
Choose a base branch
from
Open

Conversation

cartoonist
Copy link

This PR mainly refactors the build script and a little bit of the library itself in order to make it usable as a header-only library.

cartoonist and others added 7 commits October 13, 2020 18:47
The goal is to minimize the number of dependencies of PairG header-only library. PairG
ends up to be only dependent on Kokkos and kokkos-kernels libraries. This commit
decouples argument parsing module from header files which makes PairG library
independent from `clipp` library. It also decouples `PaSGAL` from the library by using a
template type for directed char graph.
The vg file included in the `tests` module has been accordingly updated.
These libraries recently support standalone CMake build system.
The function `get_max_result_nnz` has been renamed to `get_c_nnz` in commit
kokkos-kernels@39e8503.
- employ modern CMake and raise minimum required version of CMake to 3.13
- use `find_package` to resolve dependencies
  - use installed dependencies if there are already present on the system
- use `add_subdirectory` call or CMake `ExternalProject` module for bundled dependencies
  - change handling of Protobuf, Kokkos, kokkos-kernels, Catch2, htslib, and libvgio
- define target `libpairg` for PairG library
- define target `pairg` for PairG command-line tool
- avoid modifying `CMAKE_CXX_FLAGS` since it is set by CMake for different build types
  - set compiler standard flag through `target_compile_feature` for `libpairg` target
- different build type for `pairg` and `tests` targets
- rename `BUILD_TESTS` to `BUILD_TESTING` which is more conventional and set it off by
  default

**Note**:
This commit makes `libvgio` a **required** dependency which was bundled before. This is
because libvgio does not handle its dependencies properly (e.g. libhandlegraph).
Therefore, it cannot be simply bundled by calling `add_subdirectory`. Considering that
handling libvgio's dependencies should be done in its own package, it seems reasonable
to ask users to have it installed as a requirement. This commit, however, does not
remove `libvgio` submodule in case this change needs to be reverted to make it bundled
again. The required procedure for bundling VGio is included but commented out because of
aforementioned issue in VGio build script.
@cartoonist
Copy link
Author

I briefly explained the reason behind each change in its corresponding commit message.

@cjain7
Copy link
Member

cjain7 commented Oct 14, 2020

Thanks! I'm curious to know if you are using PairG in your project(s)?

@cartoonist
Copy link
Author

Yes, I am trying out PairG for distance estimation between two alignments.

@cjain7
Copy link
Member

cjain7 commented Oct 19, 2020

You are welcome to contribute to this repo directly. Just sent an invite.

In the last iteration of the while loop, the unnecessary matrix multiplication `A_copy` x
`A_copy` will be avoided by this change.
@cartoonist
Copy link
Author

cartoonist commented Mar 27, 2021

Thanks for the invite. Just an update: I also extended the PairG a little bit to be able to index the whole human genome graph. For distance constraints of length 100bp, the original PairG index matrix would take around 1.7TB which is infeasible to construct based on my computational resources. While I managed to compress it to a 27GB-index with peak memory of 200GB during construction. I will share more details in the near future.

I would be happy to make a branch and add my implementation in another PR. However, since I am writing my thesis right now, I am not sure when I get the time to make it.

@cjain7
Copy link
Member

cjain7 commented Mar 28, 2021

compress it to a 27GB-index with peak memory of 200GB during construction.

Great!

cartoonist and others added 8 commits April 8, 2021 19:49
- Use `GNUInstallDirs` module for installation directories
- Define build and install interfaces for pairg target properties
The CMake build script no longer install bundled dependencies unless it is explicitly
requested by using `USE_BUNDLED_<LIBRARY>` variables.
The template parameters for `spadd` have changed in the new version and
unnecessary template parameters have been dropped. To adjust with it, we
just let the compiler deduce the parameters.
Since the function definitions might appear across multiple translation
units, they should be defined inline.
This commit also changes CMake minimum required version to 3.16.
All types; i.e. ordinal type (`lno_t`), device type, and scalar type can
now be provided as a template parameter. The class is defaulted to the
old hard-coded types.
All interface functions accept a template parameter specifying a
specialisation of `matrixOps` template class. The default type for this
parameter is the old `matrixOps` type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants