Best choice parameters #78

Open
JoseCorCab opened this issue May 10, 2018 · 1 comment
@JoseCorCab

Hello,
I'm using epic 0.2.9 to compare two mapped samples against a mapped reference, and I have some questions.
I'm looking for enriched regions of unknown size by comparing two experimental groups (GROUP1: sample1 and sample2; GROUP2: reference). It is essential that no reads from one of the groups map to the enriched sequence.
First, I mapped the reads of the three samples (sample1, sample2, and reference) against the same reference genome using bowtie2. Then I used epic to compare sample1_mapping/reference_mapping and sample2_mapping/reference_mapping. I also created a chromosome-size file.

When I compared the samples with the default parameters, I got large enriched regions and, as expected, many reads from both groups mapped to each enriched region.

I extracted a small test sample from each sample. Specifically, I extracted a known enriched region from the genome and then repeated the same steps.
I ran epic many times with high and low FDR cutoffs, testing combinations of window size and gaps allowed in loops.
When I checked the epic results, I saw that in the sample1/reference comparison the program returns exactly the expected region.
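
For reference, the sweep over window size, gaps allowed, and FDR cutoff described above can be sketched roughly as follows. This is only a minimal sketch that builds the command lines without running them. The file names (`sample2.bed`, `reference.bed`, `chrom.sizes`) and the output naming scheme are hypothetical, and the long option names are written as I understand them from epic's help output, so please check them against `epic --help` for your version.

```python
import itertools
import shlex


def build_epic_commands(treatment, control, chromsizes,
                        window_sizes, gaps_allowed, fdr_cutoffs):
    """Return one epic command line per parameter combination.

    Nothing is executed here; the commands are only assembled as strings.
    Option names are assumed from epic's help text and should be verified.
    """
    commands = []
    for window, gaps, fdr in itertools.product(window_sizes, gaps_allowed, fdr_cutoffs):
        # Hypothetical output naming scheme: one result file per combination.
        outfile = "epic_w{}_g{}_fdr{}.txt".format(window, gaps, fdr)
        args = [
            "epic",
            "-t", treatment,
            "-c", control,
            "--chromsizes", chromsizes,
            "--window-size", str(window),
            "--gaps-allowed", str(gaps),
            "--false-discovery-rate-cutoff", str(fdr),
            "--outfile", outfile,
        ]
        # Quote each argument so the line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    # Hypothetical inputs and grid values, chosen only for illustration.
    for cmd in build_epic_commands("sample2.bed", "reference.bed", "chrom.sizes",
                                   window_sizes=[50, 200],
                                   gaps_allowed=[1, 3],
                                   fdr_cutoffs=[0.05]):
        print(cmd)
```

Writing each combination to its own output file makes it easier to compare afterwards which settings recover the known region.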

But in the sample2/reference comparison it returns one of the following:
- One very large region with a high FDR and, as expected, reads from both samples mapped to it.
- Many very short regions with no reads mapped from one of the samples and a very low FDR, but with many small gaps between them.
- No enriched regions.
When I plot the coverage for these comparisons, I can see a clear enriched region in both.
Here are the plots:
[Image: sample1_mapping-against-reference_mapping]
[Image: sample2_mapping-against-reference_mapping]
The most notable difference between the sample1 and sample2 mappings is the coverage depth, as you can see in the plots.
I could test combinations of parameters to tune the program for the test files, but I don't think that would work for the real samples, because I'm looking for sequences of unknown size, from 3 bp up to the largest possible.
FIRST QUESTION:
Which combination of parameters do you recommend for this kind of experiment?
SECOND QUESTION:
To what degree does the difference in coverage depth between the mappings affect the results?
THIRD QUESTION:
To what degree does defining the samples as control (-c) or as treatment (-t) affect the results?

I look forward to your answer.
Thank you very much,
Jose.

@endrebak
Member

  1. The recommendations are for specific histone/protein marks.

  2. The sample differences should not matter much as the data are pooled if analyzed together. Library differences can be due to one poor quality experiment though. Perhaps you should investigate the data with deeptools?

  3. You should only use ChIP samples as -t. I guess using ChIP as -c can work, but it is much better to use actual input data.

Thanks for trying epic. I’d love to be of help, but I suspect you’d get much better answers at biostars.
