Fix literal string matching for external annotation filter values #2526

Open
sirus20x6 wants to merge 1 commit into samtools:develop from sirus20x6:fix/annotate-pipe-literal

Conversation

@sirus20x6

Summary

  • When bcftools annotate uses a filter expression with external annotation values ({VARIABLE} syntax), the string comparison incorrectly split both values on commas and reported a match if any component was shared. For example, a value like chr5|123456|C|A,CT,G|2 could match a different value solely because both contained the comma-separated component CT
  • Use direct whole-string comparison when either operand is an external value

Fixes #2506

Test plan

  • Existing test suite passes (1920/1920)
  • Verify annotation matching with INFO values containing commas and pipes

…ixes samtools#2506)

When bcftools annotate uses a filter expression with external values
(e.g., -i 'SOURCE_RECORD={SOURCE_RECORD}') to match annotation records,
the string comparison incorrectly split values on commas and performed
cross-product matching. This meant that two different INFO field values
could falsely match if they shared any comma-separated component.

External values from annotation file columns are single literal strings
where commas are part of the value, not VCF multi-value separators. This
change makes cmp_vector_strings() perform a direct string comparison
when either operand is an external value (iext > 0), instead of using
_match_vector_strings() which splits on commas.

Also removes a stray debug fprintf left in the regex comparison path.
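
To illustrate the difference between the two comparison modes described above, here is a minimal, self-contained sketch. It is not the bcftools source: match_any_component(), values_match(), and the is_external flag are hypothetical stand-ins for the comma-splitting behaviour of _match_vector_strings(), the dispatch in cmp_vector_strings(), and the iext > 0 check, respectively.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not bcftools code): returns true if any
 * comma-separated component of `a` equals any comma-separated
 * component of `b`. This models the cross-product matching that
 * caused the false positives described in the commit message. */
static bool match_any_component(const char *a, const char *b)
{
    for (const char *pa = a; ; ) {
        size_t la = strcspn(pa, ",");          /* length of current component of a */
        for (const char *pb = b; ; ) {
            size_t lb = strcspn(pb, ",");      /* length of current component of b */
            if (la == lb && strncmp(pa, pb, la) == 0) return true;
            if (pb[lb] == '\0') break;         /* last component of b */
            pb += lb + 1;                      /* skip past the comma */
        }
        if (pa[la] == '\0') break;             /* last component of a */
        pa += la + 1;
    }
    return false;
}

/* Hypothetical wrapper (not bcftools code): when either operand is an
 * external value, commas are part of the literal, so the strings are
 * compared whole; otherwise the comma-splitting match is used. */
static bool values_match(const char *a, const char *b, bool is_external)
{
    if (is_external) return strcmp(a, b) == 0;
    return match_any_component(a, b);
}
```

With the two distinct values chr5|123456|C|A,CT,G|2 and chr9|42|T|AA,CT,G|7 (the second is made up for illustration), match_any_component() reports a match because both contain the component CT, whereas values_match() with is_external set to true reports no match.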
Development

Successfully merging this pull request may close these issues.

Bcftools annotate does not treat INFO field containing pipes (|) as literal string when used as additional matching key