The Variant Store is a Java library that wraps the complexity of processing and handling large numbers of variants, exposing an interface for other applications to query the collected content of the VCF files.
This repo provides two projects:
- ./variant-store -- the actual Variant Store
- ./integration -- the PhenoTips - Variant Store integration
- Exomiser is used for variant harmfulness.
- ExAC allele frequencies are provided by Exomiser.
mvn install
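import java.nio.file.Paths;

// Configure the store with an input manager and a database controller.
VariantStore vs = new VariantStore(
    new Exomiser6TSVManager(),
    new SolrController()
);
// Initialize the store's data directory before adding or removing individuals.
vs.init(Paths.get("/where/to/store/data/"));
vs.addIndividual("IndividualId", true, Paths.get("/path/to/variant/file"));
vs.removeIndividual("IndividualId");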
The input-file handling and the variant storage are decoupled. This lets us support multiple file types and leaves open the possibility of switching out the underlying database.
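To illustrate the decoupling, here is a minimal sketch rather than the project's actual code. All type names below (InputHandlerSketch, StorageSketch, InMemoryStorageSketch, DecoupledStoreSketch) are hypothetical, and variants are reduced to plain strings. The point is that the facade depends only on two interfaces, so either side can be replaced without touching the other.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical facade that depends only on the two interfaces below. */
final class DecoupledStoreSketch {
    private final InputHandlerSketch input;
    private final StorageSketch storage;

    DecoupledStoreSketch(InputHandlerSketch input, StorageSketch storage) {
        this.input = input;
        this.storage = storage;
    }

    void addIndividual(String individualId, Path file) {
        // The input side parses the file; the storage side persists the result.
        storage.store(individualId, input.readVariants(file));
    }

    void removeIndividual(String individualId) {
        storage.remove(individualId);
    }

    public static void main(String[] args) {
        InMemoryStorageSketch storage = new InMemoryStorageSketch();
        // Fake input handler: ignores the file and returns fixed records.
        InputHandlerSketch input = file -> Arrays.asList("chr1:100 A>T", "chr2:200 G>C");
        DecoupledStoreSketch store = new DecoupledStoreSketch(input, storage);
        store.addIndividual("IndividualId", Paths.get("/path/to/variant/file"));
        System.out.println(storage.data.get("IndividualId").size()); // prints 2
    }
}

/** Hypothetical input side: turns a file into variant records. */
interface InputHandlerSketch {
    List<String> readVariants(Path file);
}

/** Hypothetical storage side: persists variant records per individual. */
interface StorageSketch {
    void store(String individualId, List<String> variants);
    void remove(String individualId);
}

/** In-memory backend, for illustration only. */
final class InMemoryStorageSketch implements StorageSketch {
    final Map<String, List<String>> data = new HashMap<>();

    @Override
    public void store(String individualId, List<String> variants) {
        data.put(individualId, variants);
    }

    @Override
    public void remove(String individualId) {
        data.remove(individualId);
    }
}
```

Swapping the in-memory backend for another StorageSketch implementation would not require any change to the input side, which is the property described above.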
Internally, we use a ga4gh-style representation of variants instead of VCF. Differences include a 0-based variant start instead of VCF's 1-based position. The input and query interfaces, however, are VCF-style, and the conversion is done before accessing the database.
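The coordinate difference amounts to a one-off shift. The sketch below is illustrative only: CoordinateSketch and its methods are hypothetical names, not part of the Variant Store API, and the exclusive end assumes the half-open [start, end) interval convention used by the ga4gh schemas.

```java
/** Hypothetical helper, not part of the Variant Store API. */
final class CoordinateSketch {

    /** Converts a 1-based VCF POS to a 0-based, ga4gh-style start. */
    static long toZeroBasedStart(long vcfPos) {
        return vcfPos - 1;
    }

    /** Exclusive end of the interval [start, end) covered by the reference allele (assumed convention). */
    static long toExclusiveEnd(long vcfPos, String ref) {
        return toZeroBasedStart(vcfPos) + ref.length();
    }

    /** Converts a 0-based start back to a 1-based VCF POS. */
    static long toVcfPos(long start) {
        return start + 1;
    }

    public static void main(String[] args) {
        long pos = 12345;
        String ref = "AT";
        System.out.println(toZeroBasedStart(pos));            // prints 12344
        System.out.println(toExclusiveEnd(pos, ref));         // prints 12346
        System.out.println(toVcfPos(toZeroBasedStart(pos)));  // prints 12345
    }
}
```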
Queries return GAVariants, which are objects generated from the ga4gh schemas. org.phenotips.variantstore.shared.VariantUtils provides utilities for working with the objects.
- Provide a self-contained abstraction for dealing with genomic variants.
- Deploy automatically from a single jar, with no manual action by the user.
- Provide fast inserts and queries on single-node and multi-node installations.
- Be flexible w.r.t. input file types and storage backends.
- Configure VariantStore with the desired input manager and DB.
- VariantStore.init(path)
  - The input manager creates its folder inside path
  - The DB unpacks the resources bundled in the jar into path
This is the primary use case for the Variant Store. This flow is used by PhenoTips' patient-network:
- VCFs are processed into TSVs by Exomiser externally (not handled by the Variant Store)
- Exomiser output TSV files are passed to the Variant Store
- The Variant Store
  - Passes the TSV to the TSVManager
  - The SolrController spins up a task to add the individual: the AddIndividualTask
  - Passes the TSV to the
Solr can be scaled to multiple nodes using SolrCloud. The setup and deployment of a SolrCloud cluster is outside the scope of this project.
Assuming you have a cluster up and running, you can add a new DatabaseController, using SolrController as a guide. The new controller would configure SolrJ to connect to the SolrCloud ZooKeeper instance. See the Solr docs for more info.
- Auto-detect the input manager to use based on the file's path.
- Integrate Jannovar or Exomiser as a pre-processing step for files.
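As a rough idea of what path-based detection could look like, here is a small sketch. It is not implemented in the project; InputDetectionSketch and InputKind are hypothetical names, and the mapping from extensions to input types (including ignoring a trailing .gz) is an assumption made for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical sketch of choosing an input type from a file path. */
final class InputDetectionSketch {

    /** Hypothetical labels; a real version would return an input manager instance. */
    enum InputKind { TSV, VCF, UNKNOWN }

    static InputKind detect(Path file) {
        Path fileName = file.getFileName();
        if (fileName == null) {
            return InputKind.UNKNOWN; // e.g. a root path with no file name
        }
        String name = fileName.toString().toLowerCase(Locale.ROOT);
        // Assumption: a trailing .gz only marks compression and can be ignored.
        if (name.endsWith(".gz")) {
            name = name.substring(0, name.length() - ".gz".length());
        }
        if (name.endsWith(".tsv")) {
            return InputKind.TSV;
        }
        if (name.endsWith(".vcf")) {
            return InputKind.VCF;
        }
        return InputKind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(detect(Paths.get("/data/P0001.variants.tsv"))); // prints TSV
        System.out.println(detect(Paths.get("sample.vcf.gz")));            // prints VCF
        System.out.println(detect(Paths.get("notes.txt")));                // prints UNKNOWN
    }
}
```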