Building revindex databases at scaled=10_000 is fast & (reasonably) low-memory! #564

Open
ctb opened this issue Jan 2, 2025 · 1 comment

ctb (Collaborator) commented Jan 2, 2025

Building a combined revindex of host + plants + GTDB RS220 at k=21:

        Command being timed: "snakemake -j 1"
        User time (seconds): 8456.80
        System time (seconds): 3853.24
        Percent of CPU this job got: 494%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 41:31.08
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 24972504
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 167019
        Minor (reclaiming a frame) page faults: 11090432
        Voluntary context switches: 246040798
        Involuntary context switches: 283432
        Swaps: 0
        File system inputs: 50183808
        File system outputs: 51949592
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1
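As a quick sanity check, the headline numbers can be re-derived from the raw fields of the report above. This is a minimal Python sketch, assuming GNU time's "kbytes" means KiB (1024 bytes), which is how Linux reports `ru_maxrss`. It shows the run averaged about 4.9 cores and peaked at roughly 23.8 GiB resident.

```python
# Re-derive headline numbers from the `/usr/bin/time -v` report above.
# Only the fields used below are copied in; values are verbatim from the log.
REPORT = """\
        User time (seconds): 8456.80
        System time (seconds): 3853.24
        Elapsed (wall clock) time (h:mm:ss or m:ss): 41:31.08
        Maximum resident set size (kbytes): 24972504
"""


def parse_time_report(text):
    """Map each 'Label: value' line to {label: value} (both stripped strings)."""
    fields = {}
    for line in text.splitlines():
        # rpartition on ": " keeps labels like "(h:mm:ss or m:ss)" intact.
        label, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[label] = value
    return fields


def to_seconds(clock):
    """Convert 'h:mm:ss' or 'm:ss' (optionally with fractional seconds) to seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


fields = parse_time_report(REPORT)
cpu_s = float(fields["User time (seconds)"]) + float(fields["System time (seconds)"])
wall_s = to_seconds(fields["Elapsed (wall clock) time (h:mm:ss or m:ss)"])
cpu_pct = 100 * cpu_s / wall_s
# Assumption: "kbytes" means KiB (1024 bytes), as Linux reports ru_maxrss.
peak_gib = int(fields["Maximum resident set size (kbytes)"]) * 1024 / 2**30

print(f"wall clock: {wall_s:.2f} s")       # wall clock: 2491.08 s
print(f"CPU utilization: {cpu_pct:.0f}%")  # CPU utilization: 494%
print(f"peak RSS: {peak_gib:.1f} GiB")     # peak RSS: 23.8 GiB
```

The 494% matches the "Percent of CPU this job got" line, i.e. (user + system time) / wall-clock time.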

The peak memory usage is probably almost entirely due to indexing off a manifest: all the signatures have to be loaded into memory at once.
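To illustrate why loading everything up front dominates peak memory, here is a toy Python comparison. It is not the plugin's actual code: `load_signature`, `index_signature`, and the sizes are hypothetical, and the index is assumed to live on disk, so the "sink" below only counts entries. Loading all signatures before indexing keeps every one of them in RAM at the same time, while loading, indexing, and dropping one at a time keeps only about one; `tracemalloc` reports the peak for each strategy.

```python
import tracemalloc

N_SIGS = 20              # hypothetical number of signatures in the manifest
HASHES_PER_SIG = 10_000  # hypothetical hashes per signature


def load_signature(i):
    """Hypothetical loader: returns one signature's hashes as a list of ints."""
    start = i * HASHES_PER_SIG
    return list(range(start, start + HASHES_PER_SIG))


def index_signature(sink, sig_id, hashes):
    """Stand-in for writing hash -> sig_id entries to an on-disk index.
    Here we only count entries, so the sink itself uses almost no memory."""
    sink["entries"] += len(hashes)


def build_all_at_once():
    # Load every signature first, then index: peak ~ all signatures together.
    sink = {"entries": 0}
    sigs = [load_signature(i) for i in range(N_SIGS)]
    for sig_id, hashes in enumerate(sigs):
        index_signature(sink, sig_id, hashes)
    return sink["entries"]


def build_streaming():
    # Load, index, and drop one signature at a time: peak ~ one signature.
    sink = {"entries": 0}
    for sig_id in range(N_SIGS):
        hashes = load_signature(sig_id)
        index_signature(sink, sig_id, hashes)
        del hashes  # release before loading the next signature
    return sink["entries"]


def peak_bytes(fn):
    """Run fn under tracemalloc; return (fn's result, peak traced bytes)."""
    tracemalloc.start()
    result = fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


all_entries, all_peak = peak_bytes(build_all_at_once)
stream_entries, stream_peak = peak_bytes(build_streaming)
print(f"all-at-once peak: {all_peak / 1e6:.1f} MB, streaming peak: {stream_peak / 1e6:.1f} MB")
```

Both strategies index the same entries; only the peak differs, and in the all-at-once case it grows with the number of signatures in the manifest.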

See /group/ctbrowngrp5/sourmash-db/gtdb+host-2025.01.02 for the Snakefile.

ctb (Collaborator, Author) commented Jan 2, 2025

Database link: sourmash-bio/sourmash#3467 (comment)
