salmo salar genome request #76

Open
etarisal opened this issue Apr 11, 2018 · 11 comments

Comments

@etarisal

Hello,
I am working with ChIP-seq data (histone modifications) from Salmo salar, and I would like to ask whether it is possible to request the genome and genome size file for this organism. Alternatively, could you help me include the annotation I used for the mapping in the epic run?

The genome annotation I used for the mapping is the one found at NCBI (https://www.ncbi.nlm.nih.gov/genome/369?genome_assembly_id=248466); I also included the "unplaced contigs" in this process.
Thanks for your help,
Cheers,

Estefania

@endrebak
Member

Thanks for your interest.

I'd love to make epic usable on less common builds/genomes, as I know that is a pain point with many other callers.

All you need is a file with the chromosome/unplaced contig names in one column and the sizes in another.

For UCSC this might look like:

chr1    248956422
chr2    242193529
chr3    198295559
chr4    190214555
chr5    181538259
chr6    170805979
chr7    159345973
chrX    156040895
chr8    145138636
chr9    138394717
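
If the genome is available as a FASTA file, a two-column file of this shape could be derived from it. The following is a minimal Python sketch, not part of epic: the function names `fasta_chrom_sizes` and `write_chrom_sizes` are made up for illustration, and it assumes an uncompressed FASTA whose sequence names are the first whitespace-delimited token of each header line.

```python
def fasta_chrom_sizes(fasta_path):
    """Yield (name, length) for each record of a plain-text FASTA, in file order."""
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, length
                # Assumption: the sequence name is the first token of the header.
                tokens = line[1:].split()
                name, length = (tokens[0] if tokens else ""), 0
            elif name is not None:
                # Sequence line: count its residues.
                length += len(line)
    if name is not None:
        yield name, length


def write_chrom_sizes(fasta_path, out_path):
    """Write one 'name<TAB>size' line per FASTA record."""
    with open(out_path, "w") as out:
        for name, length in fasta_chrom_sizes(fasta_path):
            out.write(f"{name}\t{length}\n")
```

It might be called as `write_chrom_sizes("genome.fna", "genome.chromsizes")`, where both file names are placeholders. The sequence names in the output need to match the names used in the mapped reads.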

Then you can invoke epic with -cs <chromsizes_file> and set -egf to a number like 0.8. The exact egf value only affects the number of regions considered enriched; it will not find different regions or change the rank order of the results. So if you are interested in the top 1k scoring regions, this will work.

The egf suggestion is just a hack until I am able to get the egf info, which is computationally expensive to compute. Do you have a link to a fasta genome of your organism?

Endre

@endrebak
Member

endrebak commented Apr 12, 2018

Also, do you have input/background files? epic needs those to run - just telling you upfront so you do not waste your time :)

@etarisal
Author

etarisal commented Apr 12, 2018 via email

@endrebak
Member

But epic should be pretty fast. If it has been running for a long time there is something strange going on :/

@etarisal
Author

etarisal commented Apr 13, 2018 via email

@endrebak
Member

endrebak commented Apr 13, 2018 via email

@etarisal
Author

etarisal commented Apr 13, 2018 via email

@endrebak
Member

endrebak commented Apr 13, 2018 via email

@etarisal
Author

etarisal commented Apr 13, 2018 via email

@endrebak
Member

Hmm, typical genomes have ~25 chromosomes; with your contigs you have 232155. This might be why it takes so long. Is it possible to only run it on the canonical chromosomes?
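
One way to restrict a run to the canonical chromosomes is to filter the chromsizes file by sequence name. The sketch below is an illustration only, under stated assumptions: `keep_sequences` is a made-up helper, and the default `NC_` prefix relies on the RefSeq convention that chromosome-level molecules (which can include the mitochondrion) carry `NC_` accessions while unplaced scaffolds carry `NW_` accessions. Check the actual names in your assembly before relying on it. Whether epic skips reads on sequences missing from the chromsizes file is not confirmed here, so the same filter may also need to be applied to the BED files.

```python
def keep_sequences(lines, wanted_prefixes=("NC_",)):
    """Yield lines whose first whitespace-delimited column starts with a wanted prefix.

    Works for chromsizes lines and for BED lines alike, because both
    carry the sequence name in the first column.
    """
    prefixes = tuple(wanted_prefixes)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name = line.split(None, 1)[0]
        if name.startswith(prefixes):
            yield line


# Made-up names and sizes, for illustration only.
example = ["NC_000001.1\t1000\n", "NW_000002.1\t500\n", "NC_000003.1\t800\n"]
filtered = list(keep_sequences(example))
```

With the made-up input above, `filtered` keeps the two `NC_` lines and drops the `NW_` one.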

I will think more about it, I promise.

@etarisal
Author

etarisal commented Apr 16, 2018 via email
