Understanding setGT plugin - allelic balance for heterozygotes #2517

@zx8754

Could anyone kindly explain the differences between the three bcftools +setGT commands below? As expected, the number of "Filled" alleles changes significantly between them, but we are not sure which command actually does what we are aiming for.

The intended QC step is to set to no-call any genotype called as heterozygous whose ref:alt read ratio (or alt:ref) is more skewed than 3:1, as such calls are more likely to be artefacts.
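To make the intended rule concrete, here is a minimal Python sketch of the per-genotype decision we are after. It is only an illustration of the criterion, not of how bcftools evaluates anything; the function names `is_het` and `is_skewed_het` are made up for this example. It mirrors the thresholds in the commands below (`<= 0.25` and `>= 0.75`), which means a ratio of exactly 3:1 is also flagged, and it treats genotypes with a missing allele as not heterozygous, which is our assumption.

```python
import re


def is_het(gt):
    """True for a called diploid genotype whose alleles differ, e.g. 0/1 or 1|2."""
    alleles = re.split(r"[/|]", gt)
    if len(alleles) < 2 or "." in alleles:
        # Haploid or (partly) missing genotypes are treated as not het here.
        return False
    return len(set(alleles)) > 1


def is_skewed_het(gt, ad):
    """True if a het call has an alt read fraction <= 0.25 or >= 0.75.

    ad is (ref_reads, alt_reads), i.e. AD[0] and AD[1] for one sample.
    """
    ref, alt = ad
    depth = ref + alt
    if not is_het(gt) or depth == 0:
        return False  # nothing to judge without a het call and some reads
    alt_fraction = alt / depth
    return alt_fraction <= 0.25 or alt_fraction >= 0.75


# Hypothetical genotypes with (ref, alt) read counts
for gt, ad in [("0/1", (5, 5)), ("0/1", (9, 1)), ("0/1", (3, 1)), ("1/1", (0, 10))]:
    print(gt, ad, "no-call" if is_skewed_het(gt, ad) else "keep")
```

The key point is that the decision is made per sample: both the "het" test and the read-ratio test refer to the same genotype.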

Where we're tripping up is whether we should be using & or && and | or || in that step (we think it should be & and |).
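Our current understanding, which may well be wrong, comes from the examples in the bcftools expressions documentation: with FORMAT fields, `A & B` requires both conditions to be satisfied within one sample, whereas `A && B` allows them to be satisfied in different samples. The Python sketch below is a simplified model of that site-level distinction only. It is not bcftools code, the function names are hypothetical, and it does not model which samples setGT then modifies.

```python
def site_passes_per_sample_and(cond_a, cond_b):
    """Model of '&': some single sample satisfies both conditions."""
    return any(a and b for a, b in zip(cond_a, cond_b))


def site_passes_across_samples_and(cond_a, cond_b):
    """Model of '&&': each condition holds in at least one sample,
    but not necessarily in the same one."""
    return any(cond_a) and any(cond_b)


# Hypothetical site with three samples:
# sample 1 is a balanced het, sample 2 is a non-het with skewed reads.
is_het = [True, False, False]
is_skewed = [False, True, False]

print(site_passes_per_sample_and(is_het, is_skewed))      # False
print(site_passes_across_samples_and(is_het, is_skewed))  # True
```

Under this model, `&&` can pass a site at which no individual genotype is both het and skewed, which is not what we want. For OR the two site-level readings coincide logically (a site where some sample satisfies A or some sample satisfies B is the same as a site where some sample satisfies A or B), so any difference between `|` and `||` would have to lie in which samples are selected, and that is the part we have not been able to confirm.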

The input is a test VCF for a single chromosome with 100 samples and 98,000 variants; it is germline WES data.

# 1. using && and ||
bcftools +setGT tmp.100.qc2.vcf.gz \
  -- -t q -n . \
  -i 'GT="het" && (FMT/AD[*:0] + FMT/AD[*:1] > 0) && (
        FMT/AD[*:1] / (FMT/AD[*:0] + FMT/AD[*:1]) <= 0.25 ||
        FMT/AD[*:1] / (FMT/AD[*:0] + FMT/AD[*:1]) >= 0.75
      )' 
#Filled 686012 alleles

# 2. using & and ||
bcftools +setGT tmp.100.qc2.vcf.gz \
  -- -t q -n . \
  -i 'GT="het" & (FMT/AD[*:0] + FMT/AD[*:1] > 0) & (
        FMT/AD[*:1] / (FMT/AD[*:0] + FMT/AD[*:1]) <= 0.25 ||
        FMT/AD[*:1] / (FMT/AD[*:0] + FMT/AD[*:1]) >= 0.75
      )' 
#Filled 94496 alleles

# 3. using & and |
bcftools +setGT tmp.100.qc2.vcf.gz \
  -- -t q -n . \
  -i 'GT="het" & (FMT/AD[*:0] + FMT/AD[*:1] > 0) & (
        FMT/AD[*:1] / (FMT/AD[*:0] + FMT/AD[*:1]) <= 0.25 |
        FMT/AD[*:1] / (FMT/AD[*:0] + FMT/AD[*:1]) >= 0.75
      )' 
#Filled 4410 alleles
