khmer #1844

khuch123 · 2018-02-23T14:51:57Z

I ran digital normalization on a metagenome file of 71.3 GB and got a file of 47.7 GB afterwards. Is that possible?

Shall I post my commands so you can understand the problem better? Any help in this matter would be appreciated.

khuch123 · 2018-02-23T14:52:37Z

Not 47.7 GB, sorry, that was a typing error.
It is 4.7 GB.

ctb · 2018-02-23T14:53:23Z

Hi @khuch123, this would certainly be reasonable for RNA-seq, a not-very-diverse metagenome, or a very high-coverage genome. If you could post the command (in particular the -M parameter), that would help us take a look. What are you sequencing?

khuch123 · 2018-02-23T14:59:27Z

./normalize-by-median.py -k 20 -C 20 -N 4 -x 5e8 -p --savegraph normC20k20.kh final.fq
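For context on why the output can be much smaller than the input: normalize-by-median keeps a read only while the median abundance of its k-mers (length set by -k) is still below the cutoff (-C), so reads from regions that are already well covered are dropped. Below is a minimal sketch of that rule. It assumes an exact Python dict for the counts instead of khmer's fixed-size probabilistic table, ignores reverse complements, and uses hypothetical function names (`kmers`, `normalize_by_median`). It is not khmer's implementation.

```python
from statistics import median


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers (among reads kept
    so far) is below `cutoff`; only kept reads add to the counts."""
    counts = {}  # exact counts here; khmer uses a fixed-size probabilistic table
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:  # read shorter than k: this sketch simply drops it
            continue
        med = median(counts.get(km, 0) for km in read_kmers)
        if med < cutoff:
            kept.append(read)
            for km in read_kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept


# 50 copies of one read plus one distinct read, with k=5 and cutoff=20
reads = ["AAAAACCCCC"] * 50 + ["TTTTGGGGCC"]
kept = normalize_by_median(reads, k=5, cutoff=20)
print(len(kept))  # 21: only 20 of the redundant copies survive
```

Highly redundant input, such as deep RNA-seq or a low-diversity metagenome, collapses sharply under this rule. A large size reduction is therefore not by itself a sign of an error.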

khuch123 · 2018-02-23T15:01:38Z

I tried the -M parameter, but it did not work together with -x. I took this command from the Kalamazoo metagenome assembly protocol.

ctb · 2018-02-23T15:16:35Z

Ahh, yes. OK, I think the main problem is that you are using very
little memory. I would suggest replacing -x 5e8 with:

-M 20e9

to use 20 GB of memory (or more: use as much as you have).

This should result in many more reads being kept :)
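Why does more memory keep more reads? khmer stores k-mer counts in a fixed-size probabilistic table in the count-min-sketch family, made up of -N tables sized by -x (or derived from -M). When the tables are small relative to the number of distinct k-mers, unrelated k-mers collide and their counts are overestimated. More reads then appear to have already reached the -C cutoff and get discarded. The toy count-min sketch below illustrates the effect. The class name, the SHA-256-based hashing, and the sizes are illustrative assumptions and do not reflect khmer's internals.

```python
import hashlib


class CountMinSketch:
    """Toy count-min sketch: n_tables rows of table_size counters each."""

    def __init__(self, n_tables, table_size):
        self.n_tables = n_tables
        self.table_size = table_size
        self.tables = [[0] * table_size for _ in range(n_tables)]

    def _index(self, item, row):
        # Derive a different hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.table_size

    def add(self, item):
        for row in range(self.n_tables):
            self.tables[row][self._index(item, row)] += 1

    def get(self, item):
        # Collisions only ever add, so the minimum over rows is never
        # below the true count; it can only overestimate.
        return min(self.tables[row][self._index(item, row)]
                   for row in range(self.n_tables))


def total_overcount(n_tables, table_size, n_items=1000):
    """Insert n_items distinct items once each; sum how far estimates exceed 1."""
    cms = CountMinSketch(n_tables, table_size)
    items = [f"kmer{i}" for i in range(n_items)]
    for item in items:
        cms.add(item)
    return sum(cms.get(item) - 1 for item in items)


print(total_overcount(4, 100))      # tables far smaller than the item count: large overcount
print(total_overcount(4, 100_000))  # ample space: overcount at or near zero
```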

There should have been a warning about "use more memory" at the bottom of
the output of normalize-by-median. Was there?

best,
--titus

khuch123 · 2018-02-23T17:02:57Z

Okay, sir, I will do as directed and get back to you.
But no warning message was given; it did, however, report that it was using 2 GB of memory.

You have been really very helpful.
Thank you so much.
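The reported 2 GB is consistent with the original settings, assuming one byte per counter in khmer's count tables: -N 4 tables × -x 5e8 entries ≈ 2 × 10⁹ bytes. A quick check of that arithmetic, with the one-byte-per-entry figure labeled as an assumption:

```python
n_tables = 4          # -N 4
table_size = 5e8      # -x 5e8 entries per table
bytes_per_entry = 1   # assumption: one byte per counter

total_bytes = n_tables * table_size * bytes_per_entry
print(total_bytes / 1e9)   # 2.0 (GB, decimal)

# The suggested -M 20e9 is a ten-times-larger memory budget.
print(20e9 / total_bytes)  # 10.0
```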

khuch123 · 2018-02-23T17:04:44Z

If I have to use more than 20 GB of memory,
what should the command be?
Will it still be:

-M 20e9

standage · 2018-02-23T17:07:56Z

The -x and -N parameters used to be the only way to set memory. They are still an option, but the -M parameter is much easier. It accepts human-readable suffixes, so -M 20G is valid for using 20 GB of memory, and if you want to increase to 36 GB you can use -M 36G.
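A sketch of how human-readable values like these can be turned into byte counts follows. It assumes decimal, case-insensitive multipliers (K = 10³, M = 10⁶, G = 10⁹, T = 10¹²) and treats a bare number as a byte count. `parse_memory` is a hypothetical helper, not khmer's own parser; the exact suffixes khmer accepts should be checked in its documentation.

```python
# Hypothetical helper: decimal, case-insensitive suffixes are an assumption,
# not a description of khmer's actual argument parsing.
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(value: str) -> int:
    """Convert '20G', '200M', '20e9' or '2000000000' to a byte count."""
    value = value.strip()
    multiplier = 1  # no suffix: the number is already in bytes
    if value and value[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[value[-1].upper()]
        value = value[:-1]
    return int(float(value) * multiplier)


print(parse_memory("20G"))   # 20000000000
print(parse_memory("36G"))   # 36000000000
print(parse_memory("20e9"))  # 20000000000
print(parse_memory("200M"))  # 200000000
```

Under these assumptions, -M 20G and -M 20e9 request the same amount of memory.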

standage · 2018-02-23T17:08:52Z

See the khmer docs for a more thorough discussion.

khuch123 · 2018-02-23T17:20:09Z

I read through the docs,
but why is there
an e in -M 20e9?

standage · 2018-02-23T17:22:35Z

This is shorthand for 20 × 10⁹. http://python-reference.readthedocs.io/en/latest/docs/float/scientific.html

It's a convenient notation that saves you from typing out tons of 0s, from back when -x and -N were the only way to set memory usage.
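Python reads this notation directly, so a value such as 20e9 or 5e8 is simply a number written without counting zeros. A quick interpreter check:

```python
# "20e9" is Python's scientific notation for 20 × 10**9 (a float).
print(20e9 == 20 * 10**9)  # True
print(int(20e9))           # 20000000000
print(float("5e8"))        # 500000000.0
print(f"{int(5e8):,}")     # 500,000,000
```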

standage · 2018-02-23T17:23:28Z

If you don't specify a suffix like 200M or 20G, then by default the number represents the number of bytes you want to use.

standage · 2018-04-02T20:43:06Z

Did this answer your question(s) @khuch123?
