Fix vcfstats per-sample statistics using wrong FORMAT column index #2534

Closed
sirus20x6 wants to merge 1 commit into samtools:develop from sirus20x6:fix/stats-sample-ordering

Conversation

@sirus20x6

Summary

  • When --samples-file is used and the sample order differs from the VCF, the FORMAT field accessors (calc_sample_depth, get_ad, get_iad) were passed the subset index (is) instead of the VCF column index (reader->samples[is])
  • Introduced ismpl = reader->samples[is] and used it consistently for all FORMAT data access

Fixes #2469

Test plan

  • Existing test suite passes (1920/1920)
  • Verify bcftools stats -s sample_list with reordered sample list

When --samples-file/-S is used and the sample order differs from the
VCF, calc_sample_depth(), get_ad(), and get_iad() were called with the
sample-list index (is) instead of the VCF column index
(reader->samples[is]). This caused depth and allele depth values to be
read from the wrong sample's FORMAT fields, producing incorrect
per-sample statistics (PSC, PSI, VAF) while the sample names followed
the file order.

bcf_gt_type() was already correctly using reader->samples[is]. Now all
FORMAT field accesses consistently use the VCF column index, while stats
arrays remain indexed by the sample-list position.
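
The two index spaces described above can be illustrated with a minimal, self-contained sketch. It does not use htslib or the actual bcftools code: `subset_t`, `accumulate_depth`, and the array names are hypothetical stand-ins, and only the `is` / `ismpl` naming follows this pull request.

```c
#include <assert.h>

/* Hypothetical stand-in for a reader's sample subset:
 * samples[is] is the VCF column of the is-th requested sample. */
typedef struct {
    int nsamples;        /* number of requested samples */
    const int *samples;  /* sample-list position -> VCF column index */
} subset_t;

/* Add each requested sample's depth to a stats array.
 * dp_by_column is indexed by VCF column (like FORMAT data);
 * depth_by_list_pos is indexed by sample-list position (like the stats arrays). */
static void accumulate_depth(const subset_t *sub,
                             const int *dp_by_column, int ncolumns,
                             long *depth_by_list_pos)
{
    for (int is = 0; is < sub->nsamples; is++) {
        int ismpl = sub->samples[is];            /* translate to VCF column */
        assert(ismpl >= 0 && ismpl < ncolumns);  /* guard against a bad mapping */
        depth_by_list_pos[is] += dp_by_column[ismpl];  /* not dp_by_column[is] */
    }
}
```

For example, if the VCF columns A, B, C carry depths 10, 20, 30 and the sample list requests C then A, the mapping is {2, 0} and the stats array ends up as {30, 10}. Indexing the depth array with `is` directly would give {10, 20}, which is the kind of mismatch this pull request describes.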

Fixes samtools#2469
@sirus20x6 sirus20x6 force-pushed the fix/stats-sample-ordering branch from 5eafaa9 to 637ce89 March 26, 2026 00:54
pd3 added a commit that referenced this pull request Mar 27, 2026
Some of the per-sample stats were assigned to the wrong samples
when `bcftools stats -s/-S` was used, depending on the order and
the sample count.

Resolves #2469, pull request #2534 with a test added
@pd3
Member

pd3 commented Mar 27, 2026

This is now added, with a test. Thank you

@pd3 pd3 closed this Mar 27, 2026
Linked issue: bcftools stats --samples-file give wrong results if sample order is different than vcf