Release v2.0.0 · opentargets/gentropy

What's Changed

💥 Breaking

[l2g!] implement new training strategy splitting between EFO/gene pairs and with cross validation (#938) @Irene López Santiago
[L2GFeatureMatrix!] streamline feature matrix management (#745) @Irene López Santiago
[Orchestration!] drop airflow orchestration layer from gentropy (#758) @Szymon Szyszkowski

✨ Feature

adding GERP conservation score to variant annotation (#933) @Daniel Suveges
allow building package from tag (#930) @Szymon Szyszkowski
coalesce l2g fm and predictions (#934) @Szymon Szyszkowski
coalescing the datasets (#932) @Szymon Szyszkowski
[gold_standard] add traitFromSourceMappedId to schema (#924) @Irene López Santiago
changing studylocus validation to 95 percent credible sets (#921) @Daniel-Considine
reverting to using finngen 95% credible sets (#922) @Daniel-Considine
[variant index] variant description to summarise variant consequences in transcripts (#914) @Daniel Suveges
redefine neighbourhood features to represent similarity with best metric + other fixes (#913) @Irene López Santiago
gzip evidence output to match existing format (#915) @Szymon Szyszkowski
[gold_standard] arbitrary gold standards (#912) @Szymon Szyszkowski
changes to PICS credible sets (OUT_OF_SAMPLE_LD QC flag and capital PICS) (#910) @Vivien Ho
extract pos and chromosome from variantid (#909) @Szymon Szyszkowski
improve partitioning of credible sets (#900) @David Ochoa
[feature_matrix] extract features for gwas associations only (#901) @Irene López Santiago
adding l2g features to prediction table (#899) @Daniel Suveges
[feature_matrix] impute values for gene attribute cols (#895) @Irene López Santiago
deconvolute studies upon ingestion of GWAS Catalog datasets (#887) @Daniel Suveges
add effect size direction to coloc output (#854) @Tobi Alegbe
add step to generate association data (#888) @Vivien Ho
[trainer] log model explanation with shap (#886) @Irene López Santiago
flag and filter credible sets (#879) @Tobi Alegbe
flagging duplicated entries while keeping one of the duplicates (#876) @Daniel Suveges
making credset qc have an option to coalesce and deduplicate credible sets without ld pruning (#877) @Daniel-Considine
l2g feature to indicate if gene is protein-coding or not (#873) @xyg123
[l2g] normalise distance features (#878) @Irene López Santiago
[l2g_feature_matrix] add credibleSetConfidence to L2G (#875) @Irene López Santiago
[variant_index] hash variants at the time of instance creation (#874) @Irene López Santiago
step to export disease/target evidence (#867) @Daniel Suveges
change betas to posterior mean from susie for Finngen credible sets (#872) @Daniel-Considine
add gene count features to l2g (#852) @xyg123
adding decision tree to fine-mapper (#860) @Yakov
gwas catalog top-hit + study step (#808) @David Ochoa
[l2g] extend colocalisation neighbourhood metrics to missing genes in the vicinity (#851) @Irene López Santiago
[susie_finemapper] allow for extraction of the log file from manifest (#859) @Szymon Szyszkowski
[l2g] limit colocalisation neighbourhood to protein coding genes (#847) @Irene López Santiago
[coloc] step refactoring (#845) @Szymon Szyszkowski
adding new LD interface (#759) @Yakov
enhance variant index partitioning (#834) @David Ochoa
[l2g] merge sQTL and tuQTL colocalisation features (#824) @Irene López Santiago
decouple feature generation from L2G training step (#823) @Irene López Santiago
change LD annotation for PICS fine-mapping to use major ancestry (#821) @Vivien Ho
optimisation of qc step (#813) @Yakov
[l2g] implement variant consequence features from VEP (#805) @Irene López Santiago
fix biosample study validation (#810) @Tobi Alegbe
add sumstat QC fields to schema (#809) @Yakov
adding filtering to susie finemapper (#796) @Yakov
[validation] adding credible set confidence annotation at validation time (#801) @Daniel Suveges
force reinstallation of the gentropy on the cluster (#804) @Szymon Szyszkowski
out sample LD qc reason (#798) @David Ochoa
drop v2g and reimplement distance features (#771) @Irene López Santiago
change StudyLocusId hashing method to md5 (and change StudyLocusId to string type) (#783) @Vivien Ho
flag credible sets explained by SuSiE regions (#780) @David Ochoa
99% credible set validation during study_locus_validation (#765) @David Ochoa
add biosample index (#769) @Tobi Alegbe
adding window based clumping to StudyLocus (#779) @Daniel Suveges
add studyType to StudyLocus and Colocalisation (and StudyLocusOverlap) (#782) @Vivien Ho
[dataproc] ability to version gentropy for dataproc cluster (#774) @Szymon Szyszkowski
flag PICS top hits in studies with credset sumstats (#777) @David Ochoa
flag all top-hits from GWAS catalog curation (#775) @David Ochoa
flag MHC credible sets based on lead (#767) @David Ochoa
[validation] adding credible set variant validation (#757) @Daniel Suveges
ingest FinnGen UKB meta-analysis data (#756) @Kirill Tsukanov
adding finemapping method to studylocusid hash (#744) @Daniel-Considine
[variant index] improved data structure (#710) @Daniel Suveges
logic and airflow pipeline for validation (#730) @Daniel Suveges
Finngen r11 ingestion (#733) @Szymon Szyszkowski
[variant_index] changes for a successful run (#735) @Irene López Santiago
notebook for locus breaker and susie finemapping benchmark (#717) @Daniel-Considine
expose summary statistics qc and locus breaker steps to hydra cli (#716) @Szymon Szyszkowski

🐛 Fix

[l2g_predictions] annotate based on list of features + filter out missing annotation (#925) @Irene López Santiago
swap the ref parse (#935) @Szymon Szyszkowski
r2 for lead variant is always 1 (#919) @Yakov
using the 99% PIP cs column (#904) @Daniel-Considine
reclassify eqtl catalogue sc datasets (#894) @Tobi Alegbe
do not impute isProteinCoding (#902) @Yakov
ensure the #CHROM is not quoted (#896) @Szymon Szyszkowski
[credibleSetConfidence] inner join between study locus and variant index to avoid null genes (#890) @Irene López Santiago
revert distinct for associations input file (#871) @Vivien Ho
[distance_features] correct mean distance equation and correct rows with negative values (#889) @Irene López Santiago
fix in calculate_credible_set_log10bf (#868) @Yakov
logging of finemapper (#870) @Yakov
add scQTLs into coloc features (#833) @Yakov
biosample index add efo cell types (#853) @Tobi Alegbe
adding beta for lead variant (#863) @Yakov
susie credible sets with unknown confidence (#862) @David Ochoa
filter nan in CSs (#855) @Yakov
[eqtl] deduplicating credible set loci (#849) @Daniel Suveges
updating the susie_finemapper init (#846) @Daniel-Considine
l2g fixes (#844) @David Ochoa
fix ukbppp study index (#839) @Yakov
[l2g] remove custom session params + other fixes (#841) @Irene López Santiago
[trainer] drop studyLocusId from training sets (#837) @Irene López Santiago
[find_overlap] missing right study type in output (#828) @Daniel Suveges
adding single point statistics to pics loci (#832) @Daniel Suveges
write mode added to validation steps (#826) @David Ochoa
empty inSilicoPredictors object in GnomAD variant index (#807) @Daniel Suveges
mhc flag incorrect (#825) @David Ochoa
biosample id duplication (#822) @Tobi Alegbe
adding studyId to FM log (#816) @Yakov
fix of type error in schema checking (#817) @Yakov
[validation] add qualityControls column if missing in StudyLocus dataset when performing validation (#814) @Szymon Szyszkowski
align the schema of study_index for ukb ppp eur (#803) @Szymon Szyszkowski
adding data specific p-value filters (#788) @Yakov
[schema] recursive validation of arbitrarily deep nested structure (#790) @Daniel Suveges
fix bug in neglog_pvalue_to_mantissa_and_exponent (#795) @Yakov
[safe_array_union] allow for sorting nested structs (#793) @Szymon Szyszkowski
remove study_index_path from coloc step (#791) @Szymon Szyszkowski
[vep_parser] use nested schema for insilico predictors (#789) @Szymon Szyszkowski
clean unused study_locus step parameter (#786) @David Ochoa
[finngen_study_index] improved tests for finngen study index (#776) @Szymon Szyszkowski
remove n_eff check from qc_step (#785) @Yakov
small qc flag fixes (#784) @Yakov
[ld clumping] a revised logic allows a more accurate clumping (#772) @Daniel Suveges
[effect harmonisation] addressing beta harmonisation bug (#762) @Daniel Suveges
add condition to eQTL study index and schema (#770) @Vivien Ho
prevent multiple credible filters to override spark plan (#766) @David Ochoa
multiple fixes after debugging and test runs (#760) @Kirill Tsukanov
removing old functions (#752) @Yakov
validation name mapping (#753) @Szymon Szyszkowski
[finngen_r11] preserve all studyIds (#747) @Szymon Szyszkowski
remove finngen prefix from credible set (#746) @Szymon Szyszkowski
adding carma_tau parameter to susie_finemapper (#743) @Yakov
using h4 instead of log2(h4/h3) (#740) @Daniel-Considine
revert recursiveFileLookup to False (#738) @Szymon Szyszkowski
update cluster creation command (#739) @Szymon Szyszkowski
updating config paths and fine-mapping methods (#725) @Daniel-Considine
change config params to match new name (#721) @Szymon Szyszkowski

📖 Documentation

fix broken refs (#768) @David Ochoa
macos fix for some functions (#729) @Daniel-Considine

♻️ Refactor

finemapping method enum (#897) @David Ochoa
[convert to vcf] allow multiple input sources (#891) @Szymon Szyszkowski
[vep_parser] store consequence to impact score as a project config (#811) @Irene López Santiago
generalise the harmonisation pipeline (#755) @Kirill Tsukanov
generalise per-chromosome processing (#754) @Kirill Tsukanov

⚡️ Performance

quickly build a Docker image for every branch (#773) @Kirill Tsukanov

✅ Test

skip fetch_coordinates_from_rsids (#850) @Irene López Santiago

🏗 Build

[deps-dev] bump ruff from 0.7.1 to 0.8.1 (#936) @dependabot[bot]
[deps-dev] bump ipython from 8.29.0 to 8.30.0 (#937) @dependabot[bot]
[deps-dev] bump pytest-cov from 5.0.0 to 6.0.0 (#893) @dependabot[bot]
[deps-dev] bump mypy from 1.12.1 to 1.13.0 (#884) @dependabot[bot]
[deps-dev] bump ipython from 8.28.0 to 8.29.0 (#883) @dependabot[bot]
[deps-dev] bump ruff from 0.6.1 to 0.7.0 (#864) @dependabot[bot]
[deps-dev] bump mypy from 1.11.0 to 1.12.1 (#865) @dependabot[bot]
[deps-dev] bump mkdocstrings-python from 1.11.1 to 1.12.1 (#842) @dependabot[bot]
[deps-dev] bump pyparsing from 3.1.2 to 3.2.0 (#836) @dependabot[bot]
[deps-dev] bump mkdocs-git-committers-plugin-2 from 2.3.0 to 2.4.1 (#818) @dependabot[bot]
[deps-dev] bump pymdown-extensions from 10.10.1 to 10.11.2 (#815) @dependabot[bot]
[deps-dev] bump pre-commit from 3.8.0 to 4.0.0 (#820) @dependabot[bot]
[deps-dev] bump ipython from 8.27.0 to 8.28.0 (#819) @dependabot[bot]
updated precommits including adjustments to docstrings (#787) @David Ochoa
[deps-dev] bump pymdown-extensions from 10.9 to 10.10.1 (#781) @dependabot[bot]
[deps] bump wandb from 0.17.2 to 0.18.0 (#763) @dependabot[bot]
[deps-dev] bump mkdocstrings-python from 1.10.5 to 1.11.1 (#749) @dependabot[bot]
[deps-dev] bump deptry from 0.19.1 to 0.20.0 (#742) @dependabot[bot]
[deps-dev] bump ipython from 8.26.0 to 8.27.0 (#741) @dependabot[bot]
[deps-dev] bump pre-commit from 3.7.1 to 3.8.0 (#719) @dependabot[bot]
[deps-dev] bump lxml from 5.2.2 to 5.3.0 (#727) @dependabot[bot]
[deps-dev] bump deptry from 0.18.0 to 0.19.1 (#728) @dependabot[bot]
[deps-dev] bump ruff from 0.5.1 to 0.6.1 (#732) @dependabot[bot]
[deps-dev] bump deptry from 0.17.0 to 0.18.0 (#723) @dependabot[bot]
[deps-dev] bump pymdown-extensions from 10.8.1 to 10.9 (#720) @dependabot[bot]

👷‍♂️ Ci

configure java v8 (#840) @Irene López Santiago

🚀 Chore

[vep] Ensembl version update (#931) @Daniel Suveges
[gnomad] updating GnomAD version to 4.1 from 4.0 + using joint frequencies (#929) @Daniel Suveges
pre-commit autoupdate (#918) @pre-commit-ci[bot]
pre-commit autoupdate (#898) @pre-commit-ci[bot]
[deps] bump codecov/codecov-action from 4 to 5 (#916) @dependabot[bot]
validate chromosome (#906) @Daniel Suveges
[l2g] parametrise score threshold when writing predictions (#907) @Irene López Santiago
add hf_model_commit_message to LocusToGeneStep (#905) @Irene López Santiago
pre-commit autoupdate (#885) @pre-commit-ci[bot]
add chromosome validation (#869) @Yakov
pre-commit autoupdate (#866) @pre-commit-ci[bot]
[coloc] changing the content of numberColocalisingVariants field (#857) @Daniel Suveges
adding logging even when no CS in locus (#848) @Yakov
remove h4/h3 ratio (#829) @Yakov
adding priors to coloc step (#830) @Yakov
make the lb clumping ingest the partitioned data (#806) @Szymon Szyszkowski
drop redundant parameter (#802) @Szymon Szyszkowski
pre-commit autoupdate (#724) @pre-commit-ci[bot]
pre-commit autoupdate (#715) @pre-commit-ci[bot]
