Description
I have a VCF of structural variants which has been annotated with overlapping gnomAD v4.1 structural variants using VEP custom annotations, and I'm using split-vep to convert this data into a tab-delimited format for further analysis.
Where my variants overlap with multiple gnomAD SVs, split-vep prints all the IDs delimited by &, but only the first allele frequency. For example:
bcftools +split-vep -f "[%gnomAD %gnomAD_AF]"
gives
gnomAD-SV_v3_DEL_chr1_90cb8e69&gnomAD-SV_v3_DEL_chr1_48a6a36f&gnomAD-SV_v3_DEL_chr1_58a8b87a&gnomAD-SV_v3_DEL_chr1_c80d6f1c 4.1e-05
However, if I explicitly declare the AF field as a string, I get all the values as expected. This surprises me, because the fields are already declared as strings in the header (and I previously raised an issue here when I was struggling to override that!):
bcftools +split-vep -f "[%gnomAD %gnomAD_AF]" -c "gnomAD_AF:Str"
gives
gnomAD-SV_v3_DEL_chr1_90cb8e69&gnomAD-SV_v3_DEL_chr1_48a6a36f&gnomAD-SV_v3_DEL_chr1_58a8b87a&gnomAD-SV_v3_DEL_chr1_c80d6f1c 4.1e-05&8.7e-05&0.027837&0.003762
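As a stopgap for downstream analysis, the paired lists from the workaround can be split into one row per overlapping gnomAD SV. This is a minimal sketch, not part of bcftools: `explode_pairs` is a hypothetical helper name, and it assumes each input line holds exactly one tab-separated ID list and one AF list, both &-delimited and of equal length.

```shell
# Hypothetical helper (not a bcftools feature): turn one line of
# "<ID1&ID2&...>\t<AF1&AF2&...>" into one "<ID>\t<AF>" line per SV.
# Assumes both columns are &-delimited lists of the same length.
explode_pairs() {
  awk -F'\t' -v OFS='\t' '{
    n = split($1, ids, "&")   # gnomAD SV IDs
    m = split($2, afs, "&")   # matching allele frequencies
    if (n != m) {
      # Lists do not pair up (e.g. AF truncated to the first value): report and skip
      print "length mismatch on line " NR > "/dev/stderr"
      next
    }
    for (i = 1; i <= n; i++) print ids[i], afs[i]
  }'
}
```

Usage, with `input.vcf.gz` as a placeholder file name: `bcftools +split-vep -f '%gnomAD\t%gnomAD_AF\n' -c "gnomAD_AF:Str" input.vcf.gz | explode_pairs`. If a record carries several CSQ blocks, the lines may not have the simple two-column shape assumed here. The mismatch check also makes the truncation described above easy to spot, since an untyped run yields four IDs but only one AF.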
I'm working in a restricted research environment, so I can't easily share an example file, but if more information is needed to replicate the issue, I can try to figure something out.