GroupReadsByUmi long runtime #944

Open
ghost opened this issue Oct 19, 2023 · 4 comments

ghost commented Oct 19, 2023

Running this command with v2.1.0:

    fgbio -Xmx32g -Djava.io.tmpdir=tmp --async-io GroupReadsByUmi \
        --input=input.bam \
        --output=output.bam \
        --strategy=Adjacency \
        --edits=1 \
        --min-map-q=10 \
        --family-size-histogram=metrics.txt

[screenshot: fgbio GroupReadsByUmi progress log]

Why did the first 2M reads take ~8 hours to group?

I have several samples, each around 100M reads. Some process quickly as expected, while others hang like this one. I have no idea why this is happening.

nh13 added the question label on Oct 19, 2023

nh13 commented Oct 19, 2023

It may be that you have extremely high coverage of each template and/or genomic coordinate. Can you check whether you have provided enough memory by looking at the memory usage of the process?
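A minimal sketch of both checks, assuming a Unix-like system with samtools on the PATH (the fgbio PID below is a placeholder):

    # Resident memory of the running fgbio JVM (substitute the real PID).
    ps -o pid,rss,vsz,comm -p <fgbio-pid>

    # Reads per (chromosome, position), using the same mapping-quality
    # cutoff as the GroupReadsByUmi command, to spot coverage hotspots.
    samtools view -q 10 input.bam | cut -f 3,4 | sort | uniq -c | sort -rn | head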


ghost commented Oct 19, 2023

I have tried using a large amount of memory (up to 100GB). Would adding multithreading to this step be an option for future development, similar to what is available in CallMolecularConsensusReads?


nh13 commented Oct 19, 2023

It definitely looks like you have high coverage in that region, which makes it tough. Without knowing your UMI length(s), I can't say for sure, but you may also have very high per-molecule coverage.
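A quick way to eyeball UMI length and diversity is sketched below; it assumes raw UMIs are stored in the RX tag (fgbio's default raw tag):

    # Most common raw UMIs; a handful of UMIs dominating suggests low diversity
    # and therefore very large UMI families.
    samtools view input.bam | grep -o 'RX:Z:[ACGTN-]*' | sort | uniq -c | sort -rn | head

    # Distribution of UMI lengths ($3 is the sequence after the 'RX:Z:' prefix).
    samtools view input.bam | grep -o 'RX:Z:[ACGTN-]*' | awk -F':' '{ print length($3) }' | sort | uniq -c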

It's not too much code, so I think both porting this to Rust (as we have done for other tools) and incorporating advances made since we originally wrote the tool could dramatically speed things up and perhaps reduce memory use. We would be glad for folks to sponsor that work.


ghost commented Oct 19, 2023

I understand. Thank you for the reply.
