How does EPIC work ? #88

Open
nservant opened this issue Sep 20, 2018 · 6 comments

Comments

@nservant

Hi,
A short question about the EPIC output file.
For some samples, EPIC seems to work pretty well. However, for others (same histone marks), almost all regions are significant, so I am trying to set up my own filter on the results.out file.

Here is an example:
  Chromosome   Start     End ChIP Input     Score    Log2FC            P
1       chr1   29600   77199  682  1024 2693.9908 0.3391119 1.478957e-09
2       chr1   78200   86199   63    77  373.0064 0.6359773 4.019591e-04
3       chr1  437000  474999  457   635 2127.5436 0.4509215 8.612973e-11
4       chr1 2480800 2494399   96   114  676.9330 0.6775564 7.038296e-06
5       chr1 2606400 2631799  217   256 1367.6917 0.6870352 2.883622e-11
6       chr1 2746400 2863199 1544  2279 6962.8744 0.3637558 8.185924e-22

The manual says:
"The log2_fold change is the number of ChIP reads divided by the number of Input reads in the region (where a pseudocount is computed for regions with no input-reads.)"
But in row 1, for instance, the input has more reads than the ChIP.
I would therefore expect Log2FC = log2(682/1024) = -0.586, which is not what is reported.
Could you explain why, please?
Thanks

@endrebak
Member

I am probably not using raw counts, but RPKM. If the input library has more reads overall than the ChIP library, a region can have more ChIP than Input relatively speaking, even though its raw Input count is higher.
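One way to check this explanation against the table in the question is to compare each reported Log2FC with the raw log2(ChIP/Input). The sketch below uses only the six quoted rows; the idea that the difference reflects a library-size scaling factor is an assumption drawn from the RPKM remark, not a confirmed description of how epic computes the column.

```python
import math

# (ChIP reads, Input reads, reported Log2FC) for the six regions quoted in the question.
regions = [
    (682, 1024, 0.3391119),
    (63, 77, 0.6359773),
    (457, 635, 0.4509215),
    (96, 114, 0.6775564),
    (217, 256, 0.6870352),
    (1544, 2279, 0.3637558),
]


def raw_log2fc(chip, inp):
    """log2 ratio of the raw read counts, with no normalization."""
    return math.log2(chip / inp)


# Difference between the reported value and the raw ratio, per region.
offsets = [reported - raw_log2fc(chip, inp) for chip, inp, reported in regions]

# If the offset is constant, it corresponds to a single multiplicative factor
# (assumed here to be the ratio of library sizes, Input / ChIP).
implied_scale = 2 ** (sum(offsets) / len(offsets))

for (chip, inp, reported), offset in zip(regions, offsets):
    print(f"raw={raw_log2fc(chip, inp):+.4f}  reported={reported:+.4f}  offset={offset:.4f}")
print(f"implied scaling factor: {implied_scale:.3f}")
```

The offset is the same for every row (about 0.925), which corresponds to a constant factor of roughly 1.9. That is what one would expect if the input library were about 1.9 times larger than the ChIP library and counts were normalized by library size.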

If you could share the data (with me, not the world), I could look into it if you think something might be off. :) Thanks for reporting :)

@endrebak
Member

If you find a large difference between SICER and epic, I'd love to hear about it. They are supposed to give nearly identical results (and guarantee the same ordering of the enriched regions); however, the FDR cutoff might differ slightly due to numerics.

@nservant
Author

Thanks for your feedback. I have not run SICER yet.
However, I have another, bigger issue.
EPIC does not seem to be reproducible!
I ran EPIC 3 times with the same parameters and the same inputs, and I got completely different results!
Is this expected?
Of note, I'm running many samples in parallel. Is any file written where one sample could overwrite another? That's very strange.
Thanks

@endrebak
Member

endrebak commented Sep 21, 2018 via email

@nservant
Author

Good news! This was my fault.
I wrote a small script that does the BAM to BED conversion for both ChIP and control and then runs EPIC.
But several of my samples share the same input, and when I ran them in parallel, the control BAM to BED conversions of the different samples overwrote each other.
Sorry for the mistake. I have now fixed it, ran it 4 times, and got exactly the same results each time.
Many thanks!!
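For anyone hitting the same problem, one way to avoid the overwrite is to plan the conversions up front so that each distinct BAM file, including a control shared by several samples, is converted exactly once before any EPIC run starts. This is only a sketch of that idea, not the script used above; the function name `plan_conversions`, the file names, and the `bed` output directory are all hypothetical.

```python
from pathlib import Path


def plan_conversions(samples, out_dir):
    """Map each distinct BAM file to a single BED output path.

    samples: list of (chip_bam, control_bam) pairs (hypothetical file names).
    """
    out_dir = Path(out_dir)
    plan = {}
    for chip_bam, control_bam in samples:
        for bam in (chip_bam, control_bam):
            bam = Path(bam)
            # Keyed by the BAM path, so a control shared by several
            # samples is scheduled for conversion only once.
            plan.setdefault(bam, out_dir / (bam.stem + ".bed"))
    return plan


samples = [
    ("chipA.bam", "input1.bam"),
    ("chipB.bam", "input1.bam"),  # same control as chipA
    ("chipC.bam", "input2.bam"),
]

plan = plan_conversions(samples, "bed")
for bam, bed in plan.items():
    print(f"{bam} -> {bed}")
```

Running the planned conversions first and launching the parallel EPIC jobs only afterwards means no two jobs write the same BED file. Note that this sketch names outputs by file stem, so two BAM files with the same name in different directories would still collide and would need a more specific naming scheme.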

@endrebak
Member

endrebak commented Sep 21, 2018

Ah, I have often done similar things and reported it as a bug. No worries.

Also, to prioritize results, you can choose the 1,000 regions with the best FDR, for example :) It depends on what you want to do.
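A minimal sketch of that kind of prioritization, using the six rows quoted in the question. Those rows only show a P column, so the sketch ranks by P as a stand-in; the assumption is that an FDR column would be used the same way. The helper name `top_regions` is made up for illustration.

```python
# (Chromosome, Start, End, Log2FC, P) for the six regions quoted in the question.
regions = [
    ("chr1", 29600, 77199, 0.3391119, 1.478957e-09),
    ("chr1", 78200, 86199, 0.6359773, 4.019591e-04),
    ("chr1", 437000, 474999, 0.4509215, 8.612973e-11),
    ("chr1", 2480800, 2494399, 0.6775564, 7.038296e-06),
    ("chr1", 2606400, 2631799, 0.6870352, 2.883622e-11),
    ("chr1", 2746400, 2863199, 0.3637558, 8.185924e-22),
]


def top_regions(rows, n):
    """Return the n rows with the smallest P value, most significant first."""
    return sorted(rows, key=lambda row: row[4])[:n]


best = top_regions(regions, 3)
for chrom, start, end, log2fc, p in best:
    print(chrom, start, end, log2fc, p)
```

With a real results file, `n` would be 1000 rather than 3, and an extra condition on Log2FC could be added to the same function if a fold-change filter is also wanted.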

Also, I wrote epic because SICER seemed like the best fit for H3K27me3. I have not tested it extensively on other marks, apart from PolII and H3K4me3, where MACS2 seemed like a better fit.

MACS2 claims to work on H3K27me3 and SICER on shorter histone marks, but I think that is bad advice. A cynical person might think it was to get more citations ;)
