Epic effective for #87

parida007 · 2018-09-07T05:29:25Z

I want to run the epic-effective for human genome hg38. I am using a 16 gb ram for it but after using the command it is showing memory error.
using the following command:
epic-effective -r 36 GRCh38.p10.genome.fa
getting the following error:
File analyzed: GRCh38.p10.genome.fa (File: effective_genome_size, Log level: INFO, Time: Thu, 06 Sep 2018 17:28:46 )
Genome length: 3236815040 (File: effective_genome_size, Log level: INFO, Time: Thu, 06 Sep 2018 17:28:46 )
File analyzed: GRCh38.p10.genome.fa
Genome length: 3236815040
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::unbounded_array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 20443042440 bytes of memory
Aborted (core dumped)
Failed to open input file '/tmp/GRCh38.p10.genome.fa.jf'
Traceback (most recent call last):
File "/usr/local/bin/epic-effective", line 38, in
effective_genome_size(fasta, read_length, nb_cpu, tmpdir)
File "/usr/local/lib/python2.7/dist-packages/epic/scripts/effective_genome_size.py", line 56, in effective_genome_size
shell=True)
File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'jellyfish stats /tmp/GRCh38.p10.genome.fa.jf' returned non-zero exit status 1
rm: cannot remove '/tmp/GRCh38.p10.genome.fa.jf': No such file or directory

But when I use a single chromosome it works fine.
epic-effective -r 36 gencode_chrY.fa
Temporary directory: /tmp/ (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
File analyzed: gencode_chrY.fa (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
Genome length: 57227415 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:26 )
File analyzed: gencode_chrY.fa
Genome length: 57227415
Number unique 36-mers: 21608143 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:39 )
Effective genome size: 0.377583768199 (File: effective_genome_size, Log level: INFO, Time: Fri, 07 Sep 2018 11:00:39 )
Number unique 36-mers: 21608143
Effective genome size: 0.377583768199

So how can overcome this issue for effective genome size calculation for entire genome.

endrebak · 2018-09-07T08:31:14Z

I would rather recommend that you do not compute epic-effective for hg38, it is included in epic. While the version included is calculated for UCSC, it is not going to make a difference :)

Thanks for reporting :) Why do you want to calculate the value?

parida007 · 2018-09-07T12:39:22Z

I want to calculate the EFFECTIVE_GENOME_FRACTION for hg38 genome. As it is mentioned in the command -"Use a different effective genome fraction than the one included in epic. The default value depends on the genome and readlength, but is a number between 0 and 1." The value will change based on the read length. So for a particular read length I want to compute the effective genome fraction.

endrebak · 2018-09-10T06:16:32Z

I can guarantee you that it will not matter much for epic at least. The difference between 36 and 100 bp for hg38 is < 10% which does not make a difference to the statistics. And I cannot help you run it unfortunately. Do you have a cluster/larger machine to run this on? The problem is that the another piece of software (jellyfish2) uses too much memory.

…

On Fri, Sep 7, 2018 at 2:39 PM parida007 ***@***.***> wrote: I want to calculate the EFFECTIVE_GENOME_FRACTION for hg38 genome. As it is mentioned in the command -"Use a different effective genome fraction than the one included in epic. The default value depends on the genome and readlength, but is a number between 0 and 1." The value will change based on the read length. So for a particular read length I want to compute the effective genome fraction. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#87 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AQ9I0r_wb62nHVtxJliQY72YiRMQUOk3ks5uYmj6gaJpZM4WeM-J> .

parida007 · 2018-09-11T05:39:27Z

Thanks for your information. No I don't have a large cluster to run it. I have run the command with default parameter setting and getting the enrichment result. Can you please help me to understand what does the following means:
2553959578.0 effective_genome_fraction (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 10 Sep 2018 17:12:44 )
What I have understood is effective genome fraction must be in between 0 and 1. Though initially it shows something like:
Used first 10000 reads of SRR524941.bam.bed to estimate a median read length of 36.0
Mean readlength: 36.0004, max readlength: 40, min readlength: 32. (File: find_readlength, Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 )
Using an effective genome fraction of 0.8269827491300733. (File: genomes, Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 )

endrebak · 2018-09-11T06:53:29Z

It is just output of the software. It says your readlength is 36 and therefore it chooses a certain egf. :)

…

On Tue, Sep 11, 2018 at 7:39 AM parida007 ***@***.***> wrote: Thanks for your information. No I don't have a large cluster to run it. I have run the command with default parameter setting and getting the enrichment result. Can you please help me to understand what does the following means: 2553959578.0 effective_genome_fraction (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 10 Sep 2018 17:12:44 ) What I have understood is effective genome fraction must be in between 0 and 1. Though initially it shows something like: Used first 10000 reads of SRR524941.bam.bed to estimate a median read length of 36.0 Mean readlength: 36.0004, max readlength: 40, min readlength: 32. (File: find_readlength, Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 ) Using an effective genome fraction of 0.8269827491300733. (File: genomes, Log level: INFO, Time: Mon, 10 Sep 2018 16:54:29 ) — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#87 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AQ9I0uAk7dZKU3N0hH-7Ij526SS7-GVSks5uZ0yQgaJpZM4WeM-J> .

parida007 · 2018-09-11T08:11:52Z

That I understood.. but the thing is that can the egf be 2553959578.0 ??

endrebak · 2018-09-11T08:52:40Z

No, that is the effective genome size. Good catch. Should update that :)

…

On Tue, Sep 11, 2018 at 10:11 AM parida007 ***@***.***> wrote: That I understood.. but the thing is that can the egf be 2553959578.0 ?? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#87 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AQ9I0on3UNYX6iZ0sGNzgh7xNrftWcwEks5uZ3BJgaJpZM4WeM-J> .

parida007 · 2018-09-11T08:53:41Z

So it should be 0.8269827491300733... If I am not wrong..

endrebak · 2018-09-11T08:55:14Z

Yes, but the egs is egf * genome size. The code is correct, the logging info not.

…

On Tue, Sep 11, 2018 at 10:53 AM parida007 ***@***.***> wrote: So it should be 0.8269827491300733... If I am not wrong.. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#87 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AQ9I0lNvY684ig89TcR8CqWzk94uLHVcks5uZ3oXgaJpZM4WeM-J> .

parida007 · 2018-09-11T09:29:48Z

epic -t SRR524944.bed SRR524945.bed SRR524946.bed -c ../GSM945859-input/SRR504936.bed ../GSM945859-input/SRR504937.bed --genome hg38 --false-discovery-rate-cutoff 0.01 -o enriched_regions_GSM970218.csv -bw bigwigs -sm matrix.gz # epic_version: 0.2.12, pandas_version: 0.23.4 (File: epic, Log level: INFO, Time: Tue, 11 Sep 2018 15:01:20 )

cat: write error: Broken pipe

Can you tell me whether this error is having an impact with the final output matrix

endrebak · 2018-09-11T10:40:24Z

No, it is not :)

…

On Tue, Sep 11, 2018 at 11:29 AM parida007 ***@***.***> wrote: epic -t SRR524944.bed SRR524945.bed SRR524946.bed -c ../GSM945859-input/SRR504936.bed ../GSM945859-input/SRR504937.bed --genome hg38 --false-discovery-rate-cutoff 0.01 -o enriched_regions_GSM970218.csv -bw bigwigs -sm matrix.gz # epic_version: 0.2.12, pandas_version: 0.23.4 (File: epic, Log level: INFO, Time: Tue, 11 Sep 2018 15:01:20 ) cat: write error: Broken pipe Can you tell me whether this error is having an impact with the final output matrix — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#87 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AQ9I0ssoBafu4G872Zd1VxHyNTeWDIRYks5uZ4KMgaJpZM4WeM-J> .

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Epic effective for #87

Epic effective for #87

parida007 commented Sep 7, 2018

endrebak commented Sep 7, 2018

parida007 commented Sep 7, 2018

endrebak commented Sep 10, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email

Epic effective for #87

Epic effective for #87

Comments

parida007 commented Sep 7, 2018

endrebak commented Sep 7, 2018

parida007 commented Sep 7, 2018

endrebak commented Sep 10, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email

parida007 commented Sep 11, 2018

endrebak commented Sep 11, 2018 via email