
quantms failing with a big dataset 15K files #339

Closed · ypriverol opened this issue Jan 12, 2024 · 6 comments · Fixed by #341
Labels: bug (Something isn't working), high-priority

Comments
@ypriverol (Member) commented Jan 12, 2024

Description of the bug

I'm running a big dataset and the DIA-NN assembly step fails (vdemichev/DiaNN#899). It looks like a memory issue; I have given it more than 1.8 TB of memory and 48 CPUs. Vadim has suggested using only a random subset of the files for the library creation. @daichengxin, is this related to the previous PR #335?

Command used and terminal output

No response

Relevant files

No response

System information

No response

@daichengxin (Collaborator)

Yes, it's related to the library creation. How do we determine the number of randomly selected files? Or should it be a ratio?

@ypriverol (Member, Author)

I think it should be a parameter in the first implementation. In fact, we only have a few massive datasets like this; with a previous dataset of 6k files this step works. So, for this first implementation, I suggest adding a parameter, e.g. empirical_assembly_ms_n = 200, that the user can easily configure on the command line. What do you think?
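For illustration, a minimal shell sketch of that idea, in the spirit of the pipeline's existing bash steps (a hypothetical sketch, not the actual quantms implementation; the file pattern and output list name are assumptions):

# Hypothetical sketch: randomly select empirical_assembly_ms_n mzML files
# so that empirical library generation runs on a subset of the dataset.
empirical_assembly_ms_n=200
shuf -e *.mzML -n "$empirical_assembly_ms_n" > assembly_subset.txt
# assembly_subset.txt now lists the randomly chosen files for the library step.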

daichengxin linked a pull request (#341) Jan 13, 2024 that will close this issue
@ypriverol (Member, Author) commented Feb 11, 2024

@daichengxin I'm reopening this issue, because the solution still doesn't work. In the current implementation you select a certain number of raw files for the empirical assembly (👌), but in the assembly step you pass all the mzMLs, which fails no matter what value of empirical_assembly_ms_n I try. I have tried 10 files, but then in the assembly step all the raw files are still used, and even if I give it 1 TB of memory the tool fails. What are the options here, @daichengxin @vdemichev?

#!/bin/bash -euo pipefail
# Precursor Tolerance value was: 20.0
# Fragment Tolerance value was: 50.0
# Precursor Tolerance unit was: ppm
# Fragment Tolerance unit was: ppm

ls -lcth

diann -f {all_mzml} \
        --lib lib.predicted.speclib \
        --threads 24 \
        --out-lib empirical_library.tsv \
        --verbose 3 \
        --rt-profiling \
        --temp ./quant/ \
        --use-quant \
        --quick-mass-acc --individual-mass-acc \
        --individual-windows \
        --gen-spec-lib \
        2>&1 | tee assemble_empirical_library.log
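For comparison, a hedged sketch of what the assembly command could look like once only the subsampled files are passed ({subset_mzml} is a hypothetical placeholder for the subset selected via empirical_assembly_ms_n; the real template variable in quantms may differ):

# Hypothetical fix sketch: pass only the subsampled mzMLs instead of {all_mzml}
# ({subset_mzml} is a placeholder name, not the actual quantms variable).
diann -f {subset_mzml} \
        --lib lib.predicted.speclib \
        --threads 24 \
        --out-lib empirical_library.tsv \
        --verbose 3 \
        --rt-profiling \
        --temp ./quant/ \
        --use-quant \
        --quick-mass-acc --individual-mass-acc \
        --individual-windows \
        --gen-spec-lib \
        2>&1 | tee assemble_empirical_library.log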

ypriverol reopened this Feb 11, 2024
@vdemichev

"but then in the assembly step all the raw files are used" - the idea would be to not use all raw files for empirical library generation.

@jspaezp (Contributor) commented Apr 16, 2024

@ypriverol (Member, Author)

Yes, this issue is fixed. Let me close it to remove confusion.
