khmer #1844

Open
khuch123 opened this issue Feb 23, 2018 · 13 comments

@khuch123

I ran digital normalization on a metagenome file of size 71.3 Gb and got a file of 47.7 Gb afterwards. Is that possible?

Shall I post my commands so you can take a better look? Any help in this matter would be appreciated.

@khuch123
Author

Sorry, typing error: the output was not 47.7 Gb but 4.7 Gb.

@ctb
Member

ctb commented Feb 23, 2018

Hi @khuch123, this would certainly be reasonable for RNAseq, a not-very-diverse metagenome, or a very high-coverage genome. If you could post the command (in particular the -M parameter), that would help us take a look. What are you sequencing?

@khuch123
Author

./normalize-by-median.py -k 20 -C 20 -N 4 -x 5e8 -p --savegraph normC20k20.kh final.fq

@khuch123
Author

I tried the -M parameter, but it did not work together with the -x option. I took this command from the Kalamazoo metagenome assembly protocol.

@ctb
Member

ctb commented Feb 23, 2018

Ahh, yes. OK, I think the main problem I see is that you are using a very
low amount of memory. I would suggest replacing -x 5e8 with:

-M 20e9

to use 20 GB of memory (or more - use as much as you have).

This should result in many more reads being kept :)

There should have been a warning about "use more memory" at the bottom of
the output of normalize-by-median - was there?

best,
--titus

@khuch123
Author

Okay, sir.
I will do as directed and get back to you.
However, no warning message was given.
It did report that it was using 2 GB of memory.

You are really very helpful.
Thank you so much.

@khuch123
Author

If I have to use more than 20 GB of memory, what should the command be? Will it still be:

-M 20e9

@standage
Member

The -x and -N parameters used to be the only way to set memory. They are still an option, but the -M parameter is much easier. It accepts human-readable suffixes, so something like -M 20G is valid for using 20 GB of memory, and if you want to increase to 36 GB you can use -M 36G.
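
To make the comparison concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that each of the -N tables holds -x entries of one byte each, which is consistent with the 2 GB reported for the original command above. The function name is made up for illustration and is not part of khmer.

```python
# Rough arithmetic only. Assumption: memory is about N tables x table size,
# with one byte per entry. table_memory_bytes is a hypothetical helper.
def table_memory_bytes(n_tables: int, table_size: float) -> int:
    """Approximate bytes used by n_tables tables of table_size one-byte entries."""
    return int(n_tables * table_size)

old = table_memory_bytes(4, 5e8)  # original command: -N 4 -x 5e8
new = int(20e9)                   # suggested setting: -M 20e9

print(old / 1e9, "GB with -N 4 -x 5e8")
print(new / 1e9, "GB with -M 20e9")
print(new // old, "times more memory")
```

Under that assumption, the suggested -M 20e9 gives the tables roughly ten times the memory of the original -N 4 -x 5e8 settings.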

@standage
Member

See the khmer docs for a more thorough discussion.

@khuch123
Author

I read through the docs, but why is there an e in -M 20e9?

@standage
Member

standage commented Feb 23, 2018

This is shorthand for 20 × 10^9. http://python-reference.readthedocs.io/en/latest/docs/float/scientific.html

It is a convenient notation so that you don't have to type out tons of 0s, dating from when -x and -N were the only way to set memory usage.

@standage
Member

If you don't specify a suffix like 200M or 20G, then by default the number represents the number of bytes you want to use.
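
As an illustration of how these spellings relate, here is a minimal Python sketch. It is not khmer's own parsing code: parse_memory and SUFFIXES are hypothetical names, and treating the suffixes as decimal multiples (1G = 10^9 bytes) is an assumption.

```python
# Hypothetical helper, not khmer's implementation.
# Assumption: suffixes are decimal multiples (K = 1e3, M = 1e6, G = 1e9, T = 1e12).
SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

def parse_memory(label: str) -> float:
    """Convert a string such as '20e9', '20G' or '200M' to a number of bytes."""
    try:
        # Plain numbers, including scientific notation, are read as bytes.
        return float(label)
    except ValueError:
        number, suffix = label[:-1], label[-1:].upper()
        if suffix not in SUFFIXES:
            raise ValueError(f"cannot parse memory setting {label!r}")
        return float(number) * SUFFIXES[suffix]

for text in ("20e9", "20G", "36G", "200M"):
    print(text, "->", parse_memory(text), "bytes")
```

In this sketch, 20e9 and 20G come out as the same number of bytes, which is why either spelling can be used for 20 GB.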

@standage
Member

standage commented Apr 2, 2018

Did this answer your question(s) @khuch123?
