cannot run phASER to my aligned data #57
Comments
Hi Scarlett, I answered you in the other thread, but just to clarify for other people who may see this: it looks like your VCF and BAM use discordant chromosome naming. Please make sure that both the VCF and BAM have the same naming (e.g. "1" vs "chr1"). Once the naming is consistent between the two, make sure to use the matching annotation files with phASER (with or without "chr"), which can be downloaded from here: https://stephanecastel.wordpress.com/2017/02/15/how-to-generate-ase-data-with-phaser/
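For anyone who wants to check this before running phASER, here is a minimal sketch of a naming check. It is not part of phASER; the function names are hypothetical, and it assumes you pass in the lines of an uncompressed VCF and the header text of the BAM (for example, the output of `samtools view -H`).

```python
def vcf_contigs(vcf_lines):
    """Collect the contig names used in the data lines of a VCF."""
    names = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_contigs(header_text):
    """Collect contig names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def naming_style(names):
    """Return 'chr', 'no_chr', 'mixed', or 'empty' for a set of contig names."""
    if not names:
        return "empty"
    prefixed = sum(1 for name in names if name.startswith("chr"))
    if prefixed == len(names):
        return "chr"
    if prefixed == 0:
        return "no_chr"
    return "mixed"


def same_naming(vcf_names, bam_names):
    """True only if both sets use the same, unambiguous naming style."""
    vcf_style = naming_style(vcf_names)
    bam_style = naming_style(bam_names)
    return vcf_style == bam_style and vcf_style in ("chr", "no_chr")
```

If `same_naming` returns False, one of the two files needs its contigs renamed before phASER will find any overlap between reads and variants.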
@secastel Thank you! I just figured it out! But in my gene_ae.txt result, the columns (totalCount log2_aFC n_variants variants gw_phased) are all zeros. I am wondering whether this is because my input data is unphased: the VCF only has 0/0, 1/1, 0/1, and 1/0 genotypes. Would that explain the "0" variants below? Does phASER work with unphased data?
phASER should work with an unphased VCF, although it will work much, much better if you start with a phased VCF. If you are working with human samples, I would strongly recommend phasing your VCF using a tool like the Sanger Imputation Server (https://imputation.sanger.ac.uk). As for why you are seeing all zeros, I expect it has to do with a contig naming problem. When running phaser_gene_ae you need to use the features file that matches your chromosome naming. For example (assuming hg19), if your chromosomes are named “chr1” then you need to use this file: https://www.dropbox.com/s/am09zwpjhs01k8u/gencode.v19.GRCh37.genes.chr.bed.gz?dl=0 otherwise, if they are named “1”, you need to use this file: https://www.dropbox.com/s/1u9zo1kx61zx6ca/gencode.v19.GRCh37.genes.bed.gz?dl=0 Please let me know if this helps.
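The choice between the two features files can be written down as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and the file names are the ones from the two links above (hg19 / GENCODE v19 assumed).

```python
# File names taken from the two download links above (hg19 assumed).
FEATURES_FILES = {
    "chr": "gencode.v19.GRCh37.genes.chr.bed.gz",
    "no_chr": "gencode.v19.GRCh37.genes.bed.gz",
}


def pick_features_file(contig_names):
    """Pick the features file whose naming matches the given contig names.

    contig_names: iterable of contig names seen in the VCF/BAM,
    e.g. {"chr1", "chr2"} or {"1", "2"}.
    """
    names = set(contig_names)
    if not names:
        raise ValueError("no contig names given")
    prefixed = {name for name in names if name.startswith("chr")}
    if prefixed == names:
        return FEATURES_FILES["chr"]
    if not prefixed:
        return FEATURES_FILES["no_chr"]
    # A mix of "chr1" and "1" means the inputs are inconsistent themselves.
    raise ValueError("mixed contig naming: fix the inputs first")
```

A mismatch here does not raise an error in practice; it just leaves every gene with no overlapping variants, which is consistent with the all-zero output described above.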
@secastel Thank you! I just have another question regarding the "unphased" genotypes in the phased VCF file output by phASER: I do not see a phased genotype for each variant.
I just checked haplotypic_counts.txt and found that, instead of giving the haplotype for each variant, it lists the alleles on each haplotype (A/B) for each chromosome-position range. Please let me know if I am missing anything! Thank you!
The phase will be updated in the "GT" field of the outputted VCF if you have included the argument "--gw_phase_vcf 1". In the case of the first line that you've shown: first, that variant is not covered by any reads, so phASER will not phase it, and second, it is a homozygous variant, so there is no phase to speak of. phASER runs best when your input VCF has been phased using population phasing. If you are working with human data, I would suggest using a tool like the Sanger Imputation Server.
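To see how many genotypes in a VCF are actually phased, you can tally the GT separators: per the VCF format, "|" marks a phased genotype and "/" an unphased one, and homozygous calls carry no phase information either way. The following is a rough sketch, not phASER code; the function names are hypothetical, and it assumes a single-sample VCF with the sample in the tenth column.

```python
from collections import Counter


def classify_gt(gt):
    """Classify one GT string, e.g. '0|1', '0/1', '1/1', './.'."""
    if "|" in gt:
        alleles, phased = gt.split("|"), True
    elif "/" in gt:
        alleles, phased = gt.split("/"), False
    else:
        return "other"  # haploid or malformed genotype
    if "." in alleles:
        return "missing"
    if len(set(alleles)) == 1:
        return "homozygous"  # no phase to speak of
    return "het_phased" if phased else "het_unphased"


def summarize_genotypes(vcf_lines, sample_column=9):
    """Count genotype classes for one sample (0-based column index)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= sample_column:
            continue  # no sample columns on this line
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        sample_values = fields[sample_column].split(":")
        counts[classify_gt(sample_values[format_keys.index("GT")])] += 1
    return counts
```

Running this on the input VCF and on the VCF written by phASER gives a quick view of how many heterozygous sites moved from "het_unphased" to "het_phased".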
Hi!
I aligned HG00096 from 1000GP with STAR and TopHat separately, and then ran phASER using the same parameters you gave in the tutorial.
The error message for running phASER with the BAM aligned with STAR is:
The problem with running phASER on the BAM aligned with TopHat is that it gets stuck at the first step forever ...
Does phASER work with TopHat-aligned BAM files? And what parameters do you specify for STAR alignment? I wonder whether that is what is causing these errors.
Could you help me with this?
I'd appreciate your help!
Thanks,
Scarlett