Epic effective for #87

Open
parida007 opened this issue Sep 7, 2018 · 11 comments

Comments

@parida007

I want to run epic-effective on the human genome hg38. My machine has 16 GB of RAM, but the command fails with a memory error.
I am using the following command:
epic-effective -r 36 GRCh38.p10.genome.fa
and getting the following error:
File analyzed: GRCh38.p10.genome.fa (File: effective_genome_size, Log level: INFO, Time: Thu, 06 Sep 2018 17:28:46 )
Genome length: 3236815040 (File: effective_genome_size, Log level: INFO, Time: Thu, 06 Sep 2018 17:28:46 )
File analyzed: GRCh38.p10.genome.fa
Genome length: 3236815040
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::unbounded_array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 20443042440 bytes of memory
Aborted (core dumped)
Failed to open input file '/tmp/GRCh38.p10.genome.fa.jf'
Traceback (most recent call last):
  File "/usr/local/bin/epic-effective", line 38, in <module>
    effective_genome_size(fasta, read_length, nb_cpu, tmpdir)
  File "/usr/local/lib/python2.7/dist-packages/epic/scripts/effective_genome_size.py", line 56, in effective_genome_size
    shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'jellyfish stats /tmp/GRCh38.p10.genome.fa.jf' returned non-zero exit status 1
rm: cannot remove '/tmp/GRCh38.p10.genome.fa.jf': No such file or directory

But when I use a single chromosome, it works fine:
epic-effective -r 36 gencode_chrY.fa
Temporary directory: /tmp/ (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
File analyzed: gencode_chrY.fa (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
Genome length: 57227415 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
File analyzed: gencode_chrY.fa
Genome length: 57227415
Number unique 36-mers: 21608143 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:39 )
Effective genome size: 0.377583768199 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:39 )
Number unique 36-mers: 21608143
Effective genome size: 0.377583768199

So how can I overcome this issue and calculate the effective genome size for the entire genome?
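The numbers in the logs above can be checked by hand. The following is a minimal sketch, not epic's own code: the function names `effective_genome_fraction` and `bytes_to_gib` are hypothetical, and it assumes the reported "Effective genome size" is simply the number of unique 36-mers divided by the genome length, which is what the chrY output suggests. It also converts the allocation that jellyfish requested into GiB for comparison with the 16 GB of RAM mentioned in the post.

```python
def effective_genome_fraction(unique_kmers, genome_length):
    """Ratio of unique k-mers to genome length (assumed definition)."""
    return unique_kmers / float(genome_length)


def bytes_to_gib(n_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return n_bytes / float(2 ** 30)


# Values copied from the chrY log in the post above.
chr_y_fraction = effective_genome_fraction(21608143, 57227415)
print("chrY effective genome fraction: %.12f" % chr_y_fraction)

# Allocation size copied from the jellyfish error message above.
requested_gib = bytes_to_gib(20443042440)
print("jellyfish requested about %.1f GiB" % requested_gib)
```

Under these assumptions the chrY ratio agrees with the logged value, and the requested allocation is larger than 16 GB, which is consistent with the allocation failure on the whole genome.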

@endrebak
Member

endrebak commented Sep 7, 2018

I would recommend that you do not run epic-effective for hg38; the value is already included in epic. While the included version was calculated for the UCSC assembly, that is not going to make a difference :)

Thanks for reporting :) Why do you want to calculate the value?

@parida007
Author

I want to calculate the EFFECTIVE_GENOME_FRACTION for the hg38 genome. The help text for that option says: "Use a different effective genome fraction than the one included in epic. The default value depends on the genome and readlength, but is a number between 0 and 1." The value changes with the read length, so I want to compute the effective genome fraction for a particular read length.

@endrebak
Member

endrebak commented Sep 10, 2018 via email

@parida007
Author

Thanks for the information. No, I don't have a large cluster to run it on. I ran the command with the default parameter settings and got the enrichment result. Can you please help me understand what the following means:
2553959578.0 effective_genome_fraction (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 10 Sep 2018 17:12:44 )
As I understand it, the effective genome fraction must be between 0 and 1. Earlier in the same run, though, the log shows:
Used first 10000 reads of SRR524941.bam.bed to estimate a median read length of 36.0
Mean readlength: 36.0004, max readlength: 40, min readlength: 32. (File: find_readlength, Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 )
Using an effective genome fraction of 0.8269827491300733. (File: genomes, Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 )
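The maintainer's replies are not preserved in this thread, so the following is only an arithmetic check, not a statement about how epic labels its DEBUG output. It assumes the large value might be the effective genome size in bases (fraction times genome length) and divides it by the fraction reported in the INFO line to see what genome length that would imply. The variable names are illustrative.

```python
# Values copied from the two log lines in the post above.
reported_debug_value = 2553959578.0
reported_fraction = 0.8269827491300733

# If debug_value == fraction * genome_length, this recovers the genome length.
implied_genome_length = reported_debug_value / reported_fraction
print("Implied genome length: %d bases" % round(implied_genome_length))
```

The implied length is on the order of three billion bases, the same scale as the human genome length that epic-effective reported for the FASTA file earlier in this thread, so the two logged numbers are consistent with one being a fraction and the other a size in bases.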

@endrebak
Member

endrebak commented Sep 11, 2018 via email

@parida007
Author

That I understood, but can the EGF really be 2553959578.0?

@endrebak
Member

endrebak commented Sep 11, 2018 via email

@parida007
Author

So it should be 0.8269827491300733, if I am not wrong.

@endrebak
Member

endrebak commented Sep 11, 2018 via email

@parida007
Author

epic -t SRR524944.bed SRR524945.bed SRR524946.bed -c ../GSM945859-input/SRR504936.bed ../GSM945859-input/SRR504937.bed --genome hg38 --false-discovery-rate-cutoff 0.01 -o enriched_regions_GSM970218.csv -bw bigwigs -sm matrix.gz # epic_version: 0.2.12, pandas_version: 0.23.4 (File: epic, Log level: INFO, Time: Tue, 11 Sep 2018 15:01:20 )

cat: write error: Broken pipe

Can you tell me whether this error affects the final output matrix?

@endrebak
Member

endrebak commented Sep 11, 2018 via email
