feat: new Loss-of-Function variant data from OTAR2075 #991

Open
wants to merge 10 commits into
base: dev


@vivienho vivienho commented Feb 7, 2025

✨ Context

The LOF variant curation from OTAR2075 provides LOF assessments in the form of five ordinal verdicts. These are converted to scores to populate the variantEffect (previously inSilicoPredictors) field and used to generate additional variant descriptions (variantDescription).

See issue #3385 for further details.
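The verdict-to-score conversion described above can be sketched roughly as follows. The verdict labels are the ones that appear in the assessment breakdown later in this thread; the numeric values and the names `ASSUMED_VERDICT_SCORES` and `verdict_to_score` are assumptions made purely for illustration and are not taken from the PR.

```python
from typing import Optional

# Verdict labels as listed in the assessment breakdown in this thread.
# The numeric scores are ASSUMED (evenly spaced for illustration);
# the PR's actual mapping may differ.
ASSUMED_VERDICT_SCORES: dict[str, float] = {
    "lof": 1.0,
    "likely_lof": 0.5,
    "uncertain": 0.0,
    "likely_not_lof": -0.5,
    "not_lof": -1.0,
}


def verdict_to_score(verdict: str) -> Optional[float]:
    """Return the assumed score for a verdict, or None if it is unrecognised."""
    return ASSUMED_VERDICT_SCORES.get(verdict.strip().lower())
```

Returning `None` for an unrecognised verdict, rather than raising, keeps unexpected labels visible as missing scores instead of aborting a whole run; whether that is the desired behaviour here is a design choice for the parser.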

🛠 What does this PR implement

  • parser to extract relevant data as a VariantIndex object
  • new step to ingest the data, which is then provided as an input to the variant_index step to annotate an existing variant index
  • inSilicoPredictors is renamed to variantEffect

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. uv run pre-commit run --all-files)?

@ireneisdoomed

1981/11172 curated variants are found in our variant index

@vivienho How many of the variants not present in the variant index are predicted/suspected to be LOF?

@vivienho

@ireneisdoomed here is the breakdown:

+--------------+-----------+----------------------+--------------------------+
|    assessment|count_total|count_in_variant_index|count_not_in_variant_index|
+--------------+-----------+----------------------+--------------------------+
|           lof|       3887|                   915|                      2972|
|    likely_lof|       2473|                   414|                      2059|
|     uncertain|       3810|                   571|                      3239|
|likely_not_lof|        595|                    50|                       545|
|       not_lof|        407|                    31|                       376|
+--------------+-----------+----------------------+--------------------------+
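As a quick consistency check on the table above, the counts can be re-added in plain Python. This is only a sketch: the numbers are copied from the table, while the variable names are made up for illustration.

```python
# (count_total, count_in_variant_index) per assessment, copied from the table.
breakdown: dict[str, tuple[int, int]] = {
    "lof": (3887, 915),
    "likely_lof": (2473, 414),
    "uncertain": (3810, 571),
    "likely_not_lof": (595, 50),
    "not_lof": (407, 31),
}

# Variants missing from the variant index, per assessment.
not_in_index = {name: total - found for name, (total, found) in breakdown.items()}

total_curated = sum(total for total, _ in breakdown.values())
total_in_index = sum(found for _, found in breakdown.values())

# Variants assessed as lof or likely_lof that are absent from the index.
lof_like_missing = not_in_index["lof"] + not_in_index["likely_lof"]
```

The totals agree with the 1981/11172 figure quoted in the thread, and 5031 of the variants not in the variant index carry a `lof` or `likely_lof` assessment.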

_df = (
    lof_dataset
    .select(
        f.from_csv(f.col("Variant ID GRCh37"), "chr string, pos string, ref string, alt string", {"sep": "-"}).alias("h37"),

Cool, I didn't know about the from_csv() method!
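For readers unfamiliar with the call, here is a plain-Python analogue (not PySpark itself) of what the `from_csv` expression in the snippet does to each row: it splits the hyphen-separated GRCh37 ID into four string fields named `chr`, `pos`, `ref` and `alt`. The `Variant` type, the `parse_variant_id` helper and the example ID are hypothetical, and returning `None` for malformed input is a simplification of how Spark handles rows that do not match the schema.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    """Mirrors the schema 'chr string, pos string, ref string, alt string'."""

    chr: str
    pos: str
    ref: str
    alt: str


def parse_variant_id(variant_id: str, sep: str = "-") -> Optional[Variant]:
    """Split an ID such as '1-12345-A-T' into its four string fields.

    Returns None when the ID does not have exactly four fields
    (a simplification chosen for this sketch).
    """
    parts = variant_id.split(sep)
    if len(parts) != 4:
        return None
    return Variant(*parts)


# Hypothetical ID, used for illustration only.
example = parse_variant_id("1-12345-A-T")
```

As in the schema passed to `from_csv`, every field stays a string here, including `pos`; casting the position to an integer would be a separate step.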

Successfully merging this pull request may close these issues.

Scoping integration of LoF project data into the Variant Page