Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when using sort_vcf_same_as_bam.sh #11

Open
bednarsky opened this issue Nov 6, 2023 · 3 comments
Open

Error when using sort_vcf_same_as_bam.sh #11

bednarsky opened this issue Nov 6, 2023 · 3 comments

Comments

@bednarsky
Copy link

bednarsky commented Nov 6, 2023

Thank you for these tools!

I am encountering this error [E:%s] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has %s before %s, but VCF/BCF file has %s after %s" and thus try to use sort_vcf_same_as_bam.sh.

I download a vcf of common snps like so:

wget https://master.dl.sourceforge.net/project/cellsnp/SNPlist/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz
gunzip genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz

For convenience I include the first 300 lines of the resulting vcf file: vcf_head300.txt

Then I try to reorder it like:

# reorder vcf like bam
echo "reordering vcf file"
popscle_helper_tools/sort_vcf_same_as_bam.sh \
    tmp_dir_example_bam/example_bam.bam \
     genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf > genome1K.phase3.SNP_AF5e2.chr1toX.hg38.reordered.vcf

This leads to the following error:

[W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
Error encountered while parsing the input at 1:11008

This position refers to the first line in the table in the vcf file.

My bam file looks like this, as shown by samtools view -H tmp_dir_example_bam/example_bam.bam

@hd VN:1.4 SO:coordinate
@sq SN:chr1 LN:248956422
@sq SN:chr10 LN:133797422
@sq SN:chr11 LN:135086622
@sq SN:chr12 LN:133275309
@sq SN:chr13 LN:114364328
@sq SN:chr14 LN:107043718
@sq SN:chr15 LN:101991189
@sq SN:chr16 LN:90338345
@sq SN:chr17 LN:83257441
@sq SN:chr18 LN:80373285
@sq SN:chr19 LN:58617616
@sq SN:chr2 LN:242193529
@sq SN:chr20 LN:64444167
@sq SN:chr21 LN:46709983
@sq SN:chr22 LN:50818468
@sq SN:chr3 LN:198295559
@sq SN:chr4 LN:190214555
@sq SN:chr5 LN:181538259
@sq SN:chr6 LN:170805979
@sq SN:chr7 LN:159345973
@sq SN:chr8 LN:145138636
@sq SN:chr9 LN:138394717
@sq SN:chrM LN:16569
@sq SN:chrX LN:156040895
@sq SN:chrY LN:57227415
@sq SN:KI270728.1 LN:1872759
@sq SN:KI270727.1 LN:448248
@sq SN:KI270442.1 LN:392061
@sq SN:KI270729.1 LN:280839
@sq SN:GL000225.1 LN:211173
@sq SN:KI270743.1 LN:210658
@sq SN:GL000008.2 LN:209709
@sq SN:GL000009.2 LN:201709
@sq SN:KI270747.1 LN:198735
@sq SN:KI270722.1 LN:194050
@sq SN:GL000194.1 LN:191469
@sq SN:KI270742.1 LN:186739
@sq SN:GL000205.2 LN:185591
@sq SN:GL000195.1 LN:182896
@sq SN:KI270736.1 LN:181920
@sq SN:KI270733.1 LN:179772
@sq SN:GL000224.1 LN:179693
@sq SN:GL000219.1 LN:179198
@sq SN:KI270719.1 LN:176845
@sq SN:GL000216.2 LN:176608
@sq SN:KI270712.1 LN:176043
@sq SN:KI270706.1 LN:175055
@sq SN:KI270725.1 LN:172810
@sq SN:KI270744.1 LN:168472
@sq SN:KI270734.1 LN:165050
@sq SN:GL000213.1 LN:164239
@sq SN:GL000220.1 LN:161802
@sq SN:KI270715.1 LN:161471
@sq SN:GL000218.1 LN:161147
@sq SN:KI270749.1 LN:158759
@sq SN:KI270741.1 LN:157432
@sq SN:GL000221.1 LN:155397
@sq SN:KI270716.1 LN:153799
@sq SN:KI270731.1 LN:150754
@sq SN:KI270751.1 LN:150742
@sq SN:KI270750.1 LN:148850
@sq SN:KI270519.1 LN:138126
@sq SN:GL000214.1 LN:137718
@sq SN:KI270708.1 LN:127682
@sq SN:KI270730.1 LN:112551
@sq SN:KI270438.1 LN:112505
@sq SN:KI270737.1 LN:103838
@sq SN:KI270721.1 LN:100316
@sq SN:KI270738.1 LN:99375
@sq SN:KI270748.1 LN:93321
@sq SN:KI270435.1 LN:92983
@sq SN:GL000208.1 LN:92689
@sq SN:KI270538.1 LN:91309
@sq SN:KI270756.1 LN:79590
@sq SN:KI270739.1 LN:73985
@sq SN:KI270757.1 LN:71251
@sq SN:KI270709.1 LN:66860
@sq SN:KI270746.1 LN:66486
@sq SN:KI270753.1 LN:62944
@sq SN:KI270589.1 LN:44474
@sq SN:KI270726.1 LN:43739
@sq SN:KI270735.1 LN:42811
@sq SN:KI270711.1 LN:42210
@sq SN:KI270745.1 LN:41891
@sq SN:KI270714.1 LN:41717
@sq SN:KI270732.1 LN:41543
@sq SN:KI270713.1 LN:40745
@sq SN:KI270754.1 LN:40191
@sq SN:KI270710.1 LN:40176
@sq SN:KI270717.1 LN:40062
@sq SN:KI270724.1 LN:39555
@sq SN:KI270720.1 LN:39050
@sq SN:KI270723.1 LN:38115
@sq SN:KI270718.1 LN:38054
@sq SN:KI270317.1 LN:37690
@sq SN:KI270740.1 LN:37240
@sq SN:KI270755.1 LN:36723
@sq SN:KI270707.1 LN:32032
@sq SN:KI270579.1 LN:31033
@sq SN:KI270752.1 LN:27745
@sq SN:KI270512.1 LN:22689
@sq SN:KI270322.1 LN:21476
@sq SN:GL000226.1 LN:15008
@sq SN:KI270311.1 LN:12399
@sq SN:KI270366.1 LN:8320
@sq SN:KI270511.1 LN:8127
@sq SN:KI270448.1 LN:7992
@sq SN:KI270521.1 LN:7642
@sq SN:KI270581.1 LN:7046
@sq SN:KI270582.1 LN:6504
@sq SN:KI270515.1 LN:6361
@sq SN:KI270588.1 LN:6158
@sq SN:KI270591.1 LN:5796
@sq SN:KI270522.1 LN:5674
@sq SN:KI270507.1 LN:5353
@sq SN:KI270590.1 LN:4685
@sq SN:KI270584.1 LN:4513
@sq SN:KI270320.1 LN:4416
@sq SN:KI270382.1 LN:4215
@sq SN:KI270468.1 LN:4055
@sq SN:KI270467.1 LN:3920
@sq SN:KI270362.1 LN:3530
@sq SN:KI270517.1 LN:3253
@sq SN:KI270593.1 LN:3041
@sq SN:KI270528.1 LN:2983
@sq SN:KI270587.1 LN:2969
@sq SN:KI270364.1 LN:2855
@sq SN:KI270371.1 LN:2805
@sq SN:KI270333.1 LN:2699
@sq SN:KI270374.1 LN:2656
@sq SN:KI270411.1 LN:2646
@sq SN:KI270414.1 LN:2489
@sq SN:KI270510.1 LN:2415
@sq SN:KI270390.1 LN:2387
@sq SN:KI270375.1 LN:2378
@sq SN:KI270420.1 LN:2321
@sq SN:KI270509.1 LN:2318
@sq SN:KI270315.1 LN:2276
@sq SN:KI270302.1 LN:2274
@sq SN:KI270518.1 LN:2186
@sq SN:KI270530.1 LN:2168
@sq SN:KI270304.1 LN:2165
@sq SN:KI270418.1 LN:2145
@sq SN:KI270424.1 LN:2140
@sq SN:KI270417.1 LN:2043
@sq SN:KI270508.1 LN:1951
@sq SN:KI270303.1 LN:1942
@sq SN:KI270381.1 LN:1930
@sq SN:KI270529.1 LN:1899
@sq SN:KI270425.1 LN:1884
@sq SN:KI270396.1 LN:1880
@sq SN:KI270363.1 LN:1803
@sq SN:KI270386.1 LN:1788
@sq SN:KI270465.1 LN:1774
@sq SN:KI270383.1 LN:1750
@sq SN:KI270384.1 LN:1658
@sq SN:KI270330.1 LN:1652
@sq SN:KI270372.1 LN:1650
@sq SN:KI270548.1 LN:1599
@sq SN:KI270580.1 LN:1553
@sq SN:KI270387.1 LN:1537
@sq SN:KI270391.1 LN:1484
@sq SN:KI270305.1 LN:1472
@sq SN:KI270373.1 LN:1451
@sq SN:KI270422.1 LN:1445
@sq SN:KI270316.1 LN:1444
@sq SN:KI270340.1 LN:1428
@sq SN:KI270338.1 LN:1428
@sq SN:KI270583.1 LN:1400
@sq SN:KI270334.1 LN:1368
@sq SN:KI270429.1 LN:1361
@sq SN:KI270393.1 LN:1308
@sq SN:KI270516.1 LN:1300
@sq SN:KI270389.1 LN:1298
@sq SN:KI270466.1 LN:1233
@sq SN:KI270388.1 LN:1216
@sq SN:KI270544.1 LN:1202
@sq SN:KI270310.1 LN:1201
@sq SN:KI270412.1 LN:1179
@sq SN:KI270395.1 LN:1143
@sq SN:KI270376.1 LN:1136
@sq SN:KI270337.1 LN:1121
@sq SN:KI270335.1 LN:1048
@sq SN:KI270378.1 LN:1048
@sq SN:KI270379.1 LN:1045
@sq SN:KI270329.1 LN:1040
@sq SN:KI270419.1 LN:1029
@sq SN:KI270336.1 LN:1026
@sq SN:KI270312.1 LN:998
@sq SN:KI270539.1 LN:993
@sq SN:KI270385.1 LN:990
@sq SN:KI270423.1 LN:981
@sq SN:KI270392.1 LN:971
@sq SN:KI270394.1 LN:970
@rg ID:xy323__:0:1:HG3M7DRXY:1 SM:xy323__ LB:0.1 PU:xy323__:0:1:HG3M7DRXY:1 PL:ILLUMINA
@rg ID:xy323__:0:1:HG3M7DRXY:2 SM:xy323__ LB:0.1 PU:xy323__:0:1:HG3M7DRXY:2 PL:ILLUMINA
@co 10x_bam_to_fastq:R1(CR:CY,UR:UY)
@co 10x_bam_to_fastq:R2(SEQ:QUAL)
@co library_info:{"library_id":0,"library_type":"Gene Expression","gem_group":1,"target_set_name":null}

Thank you for your help and let me know if I should provide any more information please!

@bednarsky
Copy link
Author

bednarsky commented Nov 6, 2023

Also tried to create index for file but did not solve the issue, just referred to different position 1:15

bcftools sort -o genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf -O z genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf
# tabix needs gz file
bgzip -k genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf
tabix -p vcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz

@bednarsky
Copy link
Author

Fixed my issue by adapting the bcf file to have the same naming pattern than my bam file

for i in {1..22} X Y; do echo -e "$i\tchr$i"; done > rename_chrs.txt
bcftools annotate --rename-chrs rename_chrs.txt -o genome1K.phase3.SNP_AF5e2.chr1toX.hg38.chr.vcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf

Thanks again for the amazing tool!

@ghuls
Copy link
Member

ghuls commented Dec 7, 2023

Also your BAM header looks a bit weird. Normally the tags should be in upper case (at least for the standard ones).

samtools view -H ${bam} | sed -e 's/^@\([a-z][a-z]\)/@\U\1/' > fixed_bam_header.sam

samtools reheader -P fixed_bam_header.sam in.bam > out.bam

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants