semvar-release-triggerer released this 10 Dec 12:38
What's Changed
💥 Breaking
- [l2g!] implement new training strategy splitting between EFO/gene pairs and with cross validation (#938) @Irene López Santiago
- [L2GFeatureMatrix!] streamline feature matrix management (#745) @Irene López Santiago
- [Orchestration!] drop airflow orchestration layer from gentropy (#758) @Szymon Szyszkowski
✨ Feature
- adding GERP conservation score to variant annotation (#933) @Daniel Suveges
- allow building package from tag (#930) @Szymon Szyszkowski
- coalesce l2g fm and predictions (#934) @Szymon Szyszkowski
- coalescing the datasets (#932) @Szymon Szyszkowski
- [gold_standard] add traitFromSourceMappedId to schema (#924) @Irene López Santiago
- changing studylocus validation to 95 percent credible sets (#921) @Daniel-Considine
- reverting to using finngen 95% credible sets (#922) @Daniel-Considine
- [variant index] variant description to summarise variant consequences in transcripts (#914) @Daniel Suveges
- redefine neighbourhood features to represent similarity with best metric + other fixes (#913) @Irene López Santiago
- gzip evidence output to match existing format (#915) @Szymon Szyszkowski
- [gold_standard] arbitrary gold standards (#912) @Szymon Szyszkowski
- changes to PICS credible sets (OUT_OF_SAMPLE_LD QC flag and capital PICS) (#910) @Vivien Ho
- extract pos and chromosome from variantid (#909) @Szymon Szyszkowski
- improve partitioning of credible sets (#900) @David Ochoa
- [feature_matrix] extract features for gwas associations only (#901) @Irene López Santiago
- adding l2g features to prediction table (#899) @Daniel Suveges
- [feature_matrix] impute values for gene attribute cols (#895) @Irene López Santiago
- deconvolute studies upon ingestion of GWAS Catalog datasets (#887) @Daniel Suveges
- add effect size direction to coloc output (#854) @Tobi Alegbe
- add step to generate association data (#888) @Vivien Ho
- [trainer] log model explanation with shap (#886) @Irene López Santiago
- flag and filter credible sets (#879) @Tobi Alegbe
- flagging duplicated entries while keeping one of the duplicates (#876) @Daniel Suveges
- give credset qc an option to coalesce and deduplicate credible sets without ld pruning (#877) @Daniel-Considine
- l2g feature to indicate if gene is protein-coding or not (#873) @xyg123
- [l2g] normalise distance features (#878) @Irene López Santiago
- [l2g_feature_matrix] add `credibleSetConfidence` to L2G (#875) @Irene López Santiago
- [variant_index] hash variants at the time of instance creation (#874) @Irene López Santiago
- step to export disease/target evidence (#867) @Daniel Suveges
- change betas to posterior mean from susie for Finngen credible sets (#872) @Daniel-Considine
- add gene count features to l2g (#852) @xyg123
- adding decision tree to fine-mapper (#860) @Yakov
- gwas catalog top-hit + study step (#808) @David Ochoa
- [l2g] extend colocalisation neighbourhood metrics to missing genes in the vicinity (#851) @Irene López Santiago
- [susie_finemapper] allow for extraction of the log file from manifest (#859) @Szymon Szyszkowski
- [l2g] limit colocalisation neighbourhood to protein coding genes (#847) @Irene López Santiago
- [coloc] step refactoring (#845) @Szymon Szyszkowski
- adding new LD interface (#759) @Yakov
- enhance variant index partitioning (#834) @David Ochoa
- [l2g] merge sQTL and tuQTL colocalisation features (#824) @Irene López Santiago
- decouple feature generation from L2G training step (#823) @Irene López Santiago
- change LD annotation for PICS fine-mapping to use major ancestry (#821) @Vivien Ho
- optimisation of qc step (#813) @Yakov
- [l2g] implement variant consequence features from VEP (#805) @Irene López Santiago
- fix biosample study validation (#810) @Tobi Alegbe
- add sumstat QC fields to schema (#809) @Yakov
- adding filtering to susie finemapper (#796) @Yakov
- [validation] adding credible set confidence annotation at validation time (#801) @Daniel Suveges
- force reinstallation of the gentropy on the cluster (#804) @Szymon Szyszkowski
- out sample LD qc reason (#798) @David Ochoa
- drop `v2g` and reimplement distance features (#771) @Irene López Santiago
- change `StudyLocusId` hashing method to md5 (and change `StudyLocusId` to string type) (#783) @Vivien Ho
- flag credible sets explained by SuSiE regions (#780) @David Ochoa
- 99% credible set validation during `study_locus_validation` (#765) @David Ochoa
- add biosample index (#769) @Tobi Alegbe
- adding window based clumping to StudyLocus (#779) @Daniel Suveges
- add `studyType` to `StudyLocus` and `Colocalisation` (and `StudyLocusOverlap`) (#782) @Vivien Ho
- [dataproc] ability to version gentropy for dataproc cluster (#774) @Szymon Szyszkowski
- flag PICS top hits in studies with credset sumstats (#777) @David Ochoa
- flag all top-hits from GWAS catalog curation (#775) @David Ochoa
- flag MHC credible sets based on lead (#767) @David Ochoa
- [validation] adding credible set variant validation (#757) @Daniel Suveges
- ingest FinnGen UKB meta-analysis data (#756) @Kirill Tsukanov
- adding finemapping method to studylocusid hash (#744) @Daniel-Considine
- [variant index] improved data structure (#710) @Daniel Suveges
- logic and airflow pipeline for validation (#730) @Daniel Suveges
- Finngen r11 ingestion (#733) @Szymon Szyszkowski
- [variant_index] changes for a successful run (#735) @Irene López Santiago
- notebook for locus breaker and susie finemapping benchmark (#717) @Daniel-Considine
- expose summary statistics qc and locus breaker steps to hydra cli (#716) @Szymon Szyszkowski
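Two of the entries above change how `StudyLocusId` is derived: it becomes an md5-based string (#783), and the fine-mapping method is included in the hash (#744). A minimal sketch of that idea in plain Python follows. The helper name `study_locus_id`, the order of the fields, and the `_` separator are assumptions made for illustration; this is not gentropy's actual implementation.

```python
import hashlib


def study_locus_id(study_id: str, variant_id: str, finemapping_method: str) -> str:
    """Return an md5 hex digest built from three identifying fields.

    Hypothetical helper: field order and the "_" separator are assumptions.
    """
    key = "_".join([study_id, variant_id, finemapping_method])
    # md5 yields a 32-character hexadecimal string, hence the string-typed id.
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Illustrative values only; they do not refer to real studies or variants.
example_id = study_locus_id("STUDY_1", "1_12345_A_G", "pics")
```

Under these assumptions the same three inputs always map to the same identifier, while changing only the fine-mapping method produces a different one.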
🐛 Fix
- [l2g_predictions] annotate based on list of features + filter out missing annotation (#925) @Irene López Santiago
- swap the ref parse (#935) @Szymon Szyszkowski
- r2 for lead variant is always 1 (#919) @Yakov
- using the 99% PIP cs column (#904) @Daniel-Considine
- reclassify eqtl catalogue sc datasets (#894) @Tobi Alegbe
- do not impute `isProteinCoding` (#902) @Yakov
- ensure the #CHROM is not quoted (#896) @Szymon Szyszkowski
- [`credibleSetConfidence`] inner join between study locus and variant index to avoid null genes (#890) @Irene López Santiago
- revert distinct for associations input file (#871) @Vivien Ho
- [distance_features] correct mean distance equation and correct rows with negative values (#889) @Irene López Santiago
- fix in calculate_credible_set_log10bf (#868) @Yakov
- logging of finemapper (#870) @Yakov
- add scQTLs into coloc features (#833) @Yakov
- biosample index add efo cell types (#853) @Tobi Alegbe
- adding beta for lead variant (#863) @Yakov
- susie credible sets with unknown confidence (#862) @David Ochoa
- filter nan in CSs (#855) @Yakov
- [eqtl] deduplicating credible set loci (#849) @Daniel Suveges
- updating the susie_finemapper init (#846) @Daniel-Considine
- l2g fixes (#844) @David Ochoa
- fix ukbppp study index (#839) @Yakov
- [l2g] remove custom session params + other fixes (#841) @Irene López Santiago
- [trainer] drop `studyLocusId` from training sets (#837) @Irene López Santiago
- [find_overlap] missing right study type in output (#828) @Daniel Suveges
- adding single point statistics to pics loci (#832) @Daniel Suveges
- write mode added to validation steps (#826) @David Ochoa
- empty inSilicoPredictors object in GnomAD variant index (#807) @Daniel Suveges
- mhc flag incorrect (#825) @David Ochoa
- biosample id duplication (#822) @Tobi Alegbe
- adding studyId to FM log (#816) @Yakov
- fix of type error in schema checking (#817) @Yakov
- [validation] add `qualityControls` column if missing in StudyLocus dataset when performing validation (#814) @Szymon Szyszkowski
- align the schema of study_index for ukb ppp eur (#803) @Szymon Szyszkowski
- adding data specific p-value filters (#788) @Yakov
- [schema] recursive validation of arbitrarily deep nested structure (#790) @Daniel Suveges
- fix bug in neglog_pvalue_to_mantissa_and_exponent (#795) @Yakov
- [safe_array_union] allow for sorting nested structs (#793) @Szymon Szyszkowski
- remove study_index_path from coloc step (#791) @Szymon Szyszkowski
- [vep_parser] use nested schema for insilico predictors (#789) @Szymon Szyszkowski
- clean unused study_locus step parameter (#786) @David Ochoa
- [finngen_study_index] improved tests for finngen study index (#776) @Szymon Szyszkowski
- remove n_eff check from qc_step (#785) @Yakov
- small qc flag fixes (#784) @Yakov
- [ld clumping] a revised logic allows a more accurate clumping (#772) @Daniel Suveges
- [effect harmonisation] addressing beta harmonisation bug (#762) @Daniel Suveges
- add condition to eQTL study index and schema (#770) @Vivien Ho
- prevent multiple credible filters to override spark plan (#766) @David Ochoa
- multiple fixes after debugging and test runs (#760) @Kirill Tsukanov
- removing old functions (#752) @Yakov
- validation name mapping (#753) @Szymon Szyszkowski
- [finngen_r11] preserve all studyIds (#747) @Szymon Szyszkowski
- remove finngen prefix from credible set (#746) @Szymon Szyszkowski
- adding carma_tau parameter to susie_finemapper (#743) @Yakov
- using h4 instead of log2(h4/h3) (#740) @Daniel-Considine
- revert recursiveFileLookup to False (#738) @Szymon Szyszkowski
- update cluster creation command (#739) @Szymon Szyszkowski
- updating config paths and fine-mapping methods (#725) @Daniel-Considine
- change config params to match new name (#721) @Szymon Szyszkowski
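One of the fixes above (#795) touches `neglog_pvalue_to_mantissa_and_exponent`. As a rough illustration of the conversion that name describes, the plain-Python sketch below splits a -log10 p-value into a mantissa and a base-10 exponent. It is an illustration only and does not reproduce the gentropy function; the name `split_neglog_pvalue` is hypothetical.

```python
import math


def split_neglog_pvalue(neglog_p: float) -> tuple[float, int]:
    """Split -log10(p) into (mantissa, exponent) with p = mantissa * 10**exponent."""
    # The exponent is the largest integer not exceeding log10(p) = -neglog_p.
    exponent = math.floor(-neglog_p)
    # The remaining fractional power of ten gives a mantissa in [1, 10).
    mantissa = 10 ** (-neglog_p - exponent)
    return mantissa, exponent


# -log10(p) = 7.3 corresponds to p of roughly 5.01e-8.
mantissa, exponent = split_neglog_pvalue(7.3)
```

Storing the two parts separately keeps very small p-values representable even when the p-value itself would underflow a floating-point number.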
📖 Documentation
- fix broken refs (#768) @David Ochoa
- macos fix for some functions (#729) @Daniel-Considine
♻️ Refactor
- finemapping method enum (#897) @David Ochoa
- [convert to vcf] allow multiple input sources (#891) @Szymon Szyszkowski
- [vep_parser] store consequence to impact score as a project config (#811) @Irene López Santiago
- generalise the harmonisation pipeline (#755) @Kirill Tsukanov
- generalise per-chromosome processing (#754) @Kirill Tsukanov
⚡️ Performance
- quickly build a Docker image for every branch (#773) @Kirill Tsukanov
✅ Test
- skip `fetch_coordinates_from_rsids` (#850) @Irene López Santiago
🏗 Build
- [deps-dev] bump ruff from 0.7.1 to 0.8.1 (#936) @dependabot[bot]
- [deps-dev] bump ipython from 8.29.0 to 8.30.0 (#937) @dependabot[bot]
- [deps-dev] bump pytest-cov from 5.0.0 to 6.0.0 (#893) @dependabot[bot]
- [deps-dev] bump mypy from 1.12.1 to 1.13.0 (#884) @dependabot[bot]
- [deps-dev] bump ipython from 8.28.0 to 8.29.0 (#883) @dependabot[bot]
- [deps-dev] bump ruff from 0.6.1 to 0.7.0 (#864) @dependabot[bot]
- [deps-dev] bump mypy from 1.11.0 to 1.12.1 (#865) @dependabot[bot]
- [deps-dev] bump mkdocstrings-python from 1.11.1 to 1.12.1 (#842) @dependabot[bot]
- [deps-dev] bump pyparsing from 3.1.2 to 3.2.0 (#836) @dependabot[bot]
- [deps-dev] bump mkdocs-git-committers-plugin-2 from 2.3.0 to 2.4.1 (#818) @dependabot[bot]
- [deps-dev] bump pymdown-extensions from 10.10.1 to 10.11.2 (#815) @dependabot[bot]
- [deps-dev] bump pre-commit from 3.8.0 to 4.0.0 (#820) @dependabot[bot]
- [deps-dev] bump ipython from 8.27.0 to 8.28.0 (#819) @dependabot[bot]
- updated precommits including adjustments to docstrings (#787) @David Ochoa
- [deps-dev] bump pymdown-extensions from 10.9 to 10.10.1 (#781) @dependabot[bot]
- [deps] bump wandb from 0.17.2 to 0.18.0 (#763) @dependabot[bot]
- [deps-dev] bump mkdocstrings-python from 1.10.5 to 1.11.1 (#749) @dependabot[bot]
- [deps-dev] bump deptry from 0.19.1 to 0.20.0 (#742) @dependabot[bot]
- [deps-dev] bump ipython from 8.26.0 to 8.27.0 (#741) @dependabot[bot]
- [deps-dev] bump pre-commit from 3.7.1 to 3.8.0 (#719) @dependabot[bot]
- [deps-dev] bump lxml from 5.2.2 to 5.3.0 (#727) @dependabot[bot]
- [deps-dev] bump deptry from 0.18.0 to 0.19.1 (#728) @dependabot[bot]
- [deps-dev] bump ruff from 0.5.1 to 0.6.1 (#732) @dependabot[bot]
- [deps-dev] bump deptry from 0.17.0 to 0.18.0 (#723) @dependabot[bot]
- [deps-dev] bump pymdown-extensions from 10.8.1 to 10.9 (#720) @dependabot[bot]
👷♂️ Ci
- configure java v8 (#840) @Irene López Santiago
🚀 Chore
- [vep] Ensembl version update (#931) @Daniel Suveges
- [gnomad] updating GnomAD version to 4.1 from 4.0 + using joint frequencies (#929) @Daniel Suveges
- pre-commit autoupdate (#918) @pre-commit-ci[bot]
- pre-commit autoupdate (#898) @pre-commit-ci[bot]
- [deps] bump codecov/codecov-action from 4 to 5 (#916) @dependabot[bot]
- validate chromosome (#906) @Daniel Suveges
- [l2g] parametrise score threshold when writing predictions (#907) @Irene López Santiago
- add `hf_model_commit_message` to `LocusToGeneStep` (#905) @Irene López Santiago
- pre-commit autoupdate (#885) @pre-commit-ci[bot]
- add chromosome validation (#869) @Yakov
- pre-commit autoupdate (#866) @pre-commit-ci[bot]
- [coloc] changing the content of `numberColocalisingVariants` field (#857) @Daniel Suveges
- adding logging even when no CS in locus (#848) @Yakov
- remove h4/h3 ratio (#829) @Yakov
- adding priors to coloc step (#830) @Yakov
- make the lb clumping ingest the partitioned data (#806) @Szymon Szyszkowski
- drop redundant parameter (#802) @Szymon Szyszkowski
- pre-commit autoupdate (#724) @pre-commit-ci[bot]
- pre-commit autoupdate (#715) @pre-commit-ci[bot]