Np add hla to warp #1643

Open: wants to merge 10 commits into base: develop

nikellepetrillo
Contributor

Description

Give your PR a concise yet descriptive title.
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed, or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.


Checklist

If you can answer "yes" to the following items, please add a checkmark next to the appropriate checklist item(s) and notify our WARP team by tagging @broadinstitute/warp-admins in a comment on this PR.

  • Did you add inputs, outputs, or tasks to a workflow?
  • Did you modify, delete or move: file paths, file names, input names, output names, or task names?
  • If you made a changelog update, did you update the pipeline version number?

github-actions bot commented Aug 1, 2025

Remember to squash merge!

github-actions bot commented Aug 1, 2025

🔍Changelog Validation Results:

Comparing changelogs for pipelines that differ from the versions on 'origin/develop':
All changelog files are valid for this release.

github-actions bot commented Aug 1, 2025

🔍Version Validation Results:

Comparing versions and changelogs for pipelines that differ from the versions on 'origin/staging':
All WDLs and changelog files appear to be valid for this release.

## HLAGenotyping
#### Background

This WDL workflow performs HLA genotyping using three separate tools—HLA-HD, Polysolver, and OptiType—and generates a consensus genotype call. It supports both BAM and CRAM input formats and is designed to isolate HLA regions and call genotypes at high accuracy.

"is designed to isolate HLA regions and call genotypes at high accuracy" repeats the previous sentence.

Contributor Author

👍 removed this sentence as it's redundant


Key characteristics:
- Uses GATK to extract and prepare HLA-specific reads.
- Runs HLA-HD, and conditionally runs Polysolver and OptiType if sufficient allele information is detected.

replace "if sufficient allele information is detected" with "when HLA-HD fails to emit a three-field allele".
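The condition suggested here can be sketched in a few lines. This is only an illustration, not the workflow's actual code: the function names `field_count` and `needs_fallback` are hypothetical, alleles are assumed to be written in the usual `GENE*field:field:field` nomenclature, and "fails to emit a three-field allele" is assumed to mean that any allele in the list has fewer than three fields.

```python
def field_count(allele: str) -> int:
    """Count the colon-separated fields after the '*' in an allele name.

    Strings without a '*' (for example an untyped placeholder) count as 0 fields.
    """
    _, sep, fields = allele.partition("*")
    if not sep or not fields:
        return 0
    return len(fields.split(":"))


def needs_fallback(alleles: list[str]) -> bool:
    """Return True when at least one allele has fewer than three fields.

    Assumption: this is the trigger for running Polysolver and OptiType.
    """
    return any(field_count(a) < 3 for a in alleles)
```

Under these assumptions, `needs_fallback(["HLA-A*02:01:01", "HLA-B*07:02"])` is true because the second allele has only two fields.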

- Generates harmonized, two-field genotype calls from all tools.

It generates three-field genotypes, and most of the time uses only HLA-HD.

- Outputs a consensus genotype combining results from all callers.
- Designed for hg38 reference genome.
- Includes support for Terra-specific input quirks (e.g., `EMPTY_STRING_HACK` for optional fields).

This is just a hack needed to work around the outdated version of Cromwell that Terra is tied to. It doesn't need to be listed as a characteristic of the workflow.

Contributor Author

removed!

- `File ref_fasta` – Reference FASTA file
- `File ref_fai` – FASTA index file
- `File ref_dict` – Reference dictionary file
- `File hla_intervals` – Interval list specifying HLA regions

Perhaps say "specifying the HLA region of chromosome 6"


#### Step 4. Optitype (conditional)
- If HLA-HD emits at least one two-field genotype, runs Optitype.
- Genotypes A/B/C loci using paired-end FASTQs.

I would say A/B/C genes rather than loci


#### Step 5. Consensus (conditional)
- If both Polysolver and Optitype ran, generates a consensus genotype:
- For A/B/C genes, overrides HLA-HD with Polysolver results if Polysolver and Optitype agree.

To be precise, HLA-HD and Polysolver emit three-field resolution, while Optitype emits two-field. Optitype and Polysolver override HLA-HD if the two-field Optitype genotype is consistent with the three-field Polysolver genotype.
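A minimal sketch of that override rule for a single gene, under stated assumptions: "consistent" is taken to mean that the Polysolver alleles, truncated to two fields, match the Optitype alleles regardless of order, and all three callers' outputs are assumed to be already normalized to the same `GENE*field:field` naming. The helper names `truncate`, `consistent`, and `consensus` are hypothetical and not taken from the workflow.

```python
def truncate(allele: str, n_fields: int = 2) -> str:
    """Keep only the first n_fields colon-separated fields of an allele name."""
    gene, sep, fields = allele.partition("*")
    return gene + sep + ":".join(fields.split(":")[:n_fields])


def consistent(optitype: tuple[str, str], polysolver: tuple[str, str]) -> bool:
    """Compare the two genotypes at two-field resolution, ignoring allele order."""
    return sorted(truncate(a) for a in optitype) == sorted(
        truncate(a) for a in polysolver
    )


def consensus(
    hlahd: tuple[str, str],
    polysolver: tuple[str, str],
    optitype: tuple[str, str],
) -> tuple[str, str]:
    """Use the Polysolver call when Optitype agrees with it, else keep HLA-HD."""
    return polysolver if consistent(optitype, polysolver) else hlahd
```

With this reading, the three-field Polysolver genotype is what ends up in the consensus; Optitype only acts as a two-field confirmation.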

- For other loci, retains HLA-HD results.

loci --> genes

## MakeTable
#### Background

This WDL workflow consolidates HLA consensus calls from multiple samples into a single summary table. It generates a tabular output where each row represents a sample and each column represents one HLA allele call.

Specify that diploid genotypes are represented by giving each gene two consecutive columns, one for each allele. Homozygous genotypes simply repeat the same allele, once in each column.
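The layout described here could look roughly like the following sketch. The column names (`sample_id`, `A_1`, `A_2`, and so on), the tab-separated format, and the function names `build_table` and `to_tsv` are assumptions for illustration; the actual MakeTable output may differ.

```python
import csv
import io


def build_table(
    genotypes: dict[str, dict[str, tuple[str, str]]], genes: list[str]
) -> list[list[str]]:
    """Build one header row plus one row per sample.

    Each gene gets two consecutive columns, one per allele. A homozygous
    genotype is passed as the same allele twice and so fills both columns.
    """
    header = ["sample_id"]
    for gene in genes:
        header += [f"{gene}_1", f"{gene}_2"]
    rows = [header]
    for sample_id, calls in genotypes.items():
        row = [sample_id]
        for gene in genes:
            allele_1, allele_2 = calls[gene]
            row += [allele_1, allele_2]
        rows.append(row)
    return rows


def to_tsv(rows: list[list[str]]) -> str:
    """Serialize the rows as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

For example, a sample homozygous at gene A would have the same allele in both `A_1` and `A_2`, followed by its two B alleles in `B_1` and `B_2`.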

- `Array[File] consensus_calls` – List of consensus result files, one per sample
- `Array[String] sample_ids` – List of sample IDs (must be the same length and order as `consensus_calls`)
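Because the two input arrays are matched by position, a mismatch in length or order would silently mislabel samples. The sketch below shows one way a caller might guard against the length case before submitting inputs; `pair_inputs` is a hypothetical helper, not part of the workflow.

```python
def pair_inputs(consensus_calls: list[str], sample_ids: list[str]) -> list[tuple[str, str]]:
    """Pair each sample ID with its consensus file by position.

    Raises ValueError if the two lists differ in length. Order cannot be
    checked here and remains the caller's responsibility.
    """
    if len(consensus_calls) != len(sample_ids):
        raise ValueError(
            f"{len(consensus_calls)} consensus files but {len(sample_ids)} sample IDs"
        )
    return list(zip(sample_ids, consensus_calls))
```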

#### Step 1. Combine

There's only one step.

Contributor Author

True! I may keep this as is just to keep formatting consistent between AoU ReadMes.

4 participants