Separating .vcf's by individual on pooled call? #12

Open
lokeyCEU opened this issue Apr 20, 2017 · 3 comments

lokeyCEU commented Apr 20, 2017

I have merged .bam files from the 1000 Genomes Project (1kGP) with samtools merge -r and run the RetroSeq discovery phase on the merged .bam.

But when I run the call phase on the merged .bam, I get only one .vcf as output. How do I create a .vcf for each individual in the merged .bam?

This is similar to what Wildschutte did in a 2015 study. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666360/

Thank you.

EDIT (May 2017): I was mistaken; the merged (pooled) .bam is used during the calling phase, NOT discovery.

tk2 (Owner) commented Apr 21, 2017 via email

@lokeyCEU (Author)

Absolutely.

I ran
samtools merge -r
on all individuals from the CEU population to produce a single pooled .bam.
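
A quick sanity check before calling is to confirm that the pooled .bam still distinguishes individuals. samtools merge -r attaches an RG tag to each alignment, with the value inferred from the input file names, so counting reads per RG tag should list one entry per individual. The following is a minimal sketch only: `rg_counts` is an illustrative helper name, and the inline records are made-up stand-ins for `samtools view` output.

```shell
# Count reads per RG tag in SAM records read from stdin.
# SAM has 11 mandatory fields, so optional tags start at field 12.
rg_counts() {
  awk -F'\t' '{ for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) n[substr($i, 6)]++ }
              END { for (rg in n) print rg "\t" n[rg] }' | sort
}

# Two made-up records with hypothetical sample IDs, for illustration only.
printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:NA00001\nr2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:NA00002\n' | rg_counts
```

On real data this would be used as `samtools view TestCEU-r.bam | rg_counts`. If only one RG value (or none) shows up, the reads can no longer be traced back to individuals.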

Then I ran the discovery phase:
perl /path/to/software/Retroseq/RetroSeq-master/bin/retroseq.pl -discover -bam TestCEU-r.bam -output CEU-r.HERVK.tab -eref HERVKfa.tab -refTEs HERVKbed.tab -align

Then the call phase:
perl /path/to/software/Retroseq/RetroSeq-master/bin/retroseq.pl -call -bam TestCEU-r.bam -input CEU-r.HERVK.tab -ref hg19.refFIX.fa -output HERVK.TEST-r.vcf -reads 2 -depth 10000

But the resulting .vcf combines all the pooled individuals, and I want the calls separated by individual.
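
One possible workaround, not confirmed by the maintainers in this thread: because samtools merge -r tags each read with an RG value taken from its source file name, the pooled .bam could be split back per individual with `samtools view -r`, and the call phase run once per split file. The sketch below only prints the commands rather than running them. The sample IDs and the per-sample output names are hypothetical, the other file names and flags are copied from the call command above, and whether per-individual calling matches the sensitivity of the pooled call is an open assumption.

```shell
# Dry-run sketch: print (do not execute) the split, index and call commands
# for each individual. Sample IDs passed in are hypothetical placeholders.
plan_per_sample() {
  pooled=$1   # pooled BAM made with `samtools merge -r`
  cands=$2    # discovery-phase output, passed to -input
  ref=$3      # reference FASTA
  shift 3
  for rg in "$@"; do
    # Keep only the reads whose RG tag equals this individual's ID.
    printf 'samtools view -b -r %s -o %s.bam %s\n' "$rg" "$rg" "$pooled"
    printf 'samtools index %s.bam\n' "$rg"
    # Same call-phase flags as in the thread, writing one VCF per individual.
    printf 'perl retroseq.pl -call -bam %s.bam -input %s -ref %s -output %s.vcf -reads 2 -depth 10000\n' "$rg" "$cands" "$ref" "$rg"
  done
}

plan_per_sample TestCEU-r.bam CEU-r.HERVK.tab hg19.refFIX.fa NA00001 NA00002
```

Printing the plan first makes it easy to review the commands before piping them to a shell.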

Thanks!

@lokeyCEU (Author)

UPDATE:

The Wildschutte 2016 paper took these steps (simplified):

  1. -discover phase on individual .bam's from 1kGP, to produce candidates
  2. merge .bam's by population, with samtools merge
  3. -call phase on merged .bam to produce .vcf

The problem is that the output .vcf gives the insertion presence of all individuals in ONE column. If each individual's insertion presence were in a separate column, one could simply use bcftools to separate them.
Is there something I am missing that will produce a .vcf for each individual, or at least one column per individual, from the merged .bam?
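
To check how many per-individual columns a .vcf actually carries, one can inspect its #CHROM header line: the VCF format fixes the first nine fields (CHROM through FORMAT), and every field after that is one sample. This is a small sketch; `vcf_samples` is an illustrative name and the inline header uses made-up sample IDs.

```shell
# Print the sample column names of a VCF read from stdin, one per line.
# Fields 1-9 of the #CHROM line are fixed by the VCF format
# (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT); any later field is a sample.
vcf_samples() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Tiny inline header for illustration; the sample names are made up.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002\n' | vcf_samples
```

On a real file this would be `vcf_samples < TestPooledCall.CEU-r.vcf`; `bcftools query -l` reports the same list. A single name in the output means all individuals share one column, so splitting by sample with bcftools has nothing to separate.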

Here is the command I used:
nohup perl retroseq.pl -call -bam TestCEU-r.bam -input HERVK_*.tab -ref hg19.refFIX.fa -output TestPooledCall.CEU-r.vcf -reads 2 -depth 10000 &
NOTE: the -input is a prefix for a series of files, all named HERVK_(insert individual's name here).tab. Is this where things have gone awry?
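
One thing worth checking here is shell expansion. An unquoted HERVK_*.tab is expanded by the shell before retroseq.pl starts, so the script receives every matching file name as a separate argument instead of a single prefix. The sketch below shows the difference using placeholder files in a temporary directory; the file names are hypothetical and `count_args` merely stands in for the script. How RetroSeq treats the extra arguments is not established in this thread.

```shell
# Report how many arguments a command receives; stands in for retroseq.pl.
count_args() { echo "$#"; }

tmp=$(mktemp -d)
# Hypothetical per-individual discovery outputs.
touch "$tmp/HERVK_NA00001.tab" "$tmp/HERVK_NA00002.tab" "$tmp/HERVK_NA00003.tab"

count_args -input "$tmp"/HERVK_*.tab   # glob expands: -input plus three file names
count_args -input "$tmp/HERVK_"        # bare prefix: -input plus one string

rm -r "$tmp"
```

If -input is meant to take a prefix, passing HERVK_ without the wildcard would be the form to try first.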

Thanks!

lokeyCEU changed the title from "Separating .vcf's by individual on pooled discovery?" to "Separating .vcf's by individual on pooled call?" on Jun 14, 2017