Quicker mzid parser #119

Open · wants to merge 4 commits into base: main

julianu
Contributor

@julianu julianu commented Mar 17, 2025

Hej,

I rewrote the mzIdentML reader and made it much faster, albeit perhaps a bit less complete.
For now, I added the new reader alongside the old one. It does not use Pyteomics, but parses the file structure more directly. Hence, it is less complete for complicated files, but it should work for most "normal" ones that originate from a single search engine and contain only one search run.
I tested the conversion to TSV on some bigger files from MS-GF+ and Comet (2-20 GB), and the output was exactly identical to the files created by the original reader, but the conversion took only about a tenth of the time (with equal memory consumption).
It would be great if you could add this new reader, if you like it, as converting bigger files (like a combination of TimsTOF files and proteogenomics databases) otherwise takes days :)

Cheers,
Julian
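The comment does not include the new reader's code. As a rough illustration of what "parsing the structure more directly" (without Pyteomics) can look like, here is a minimal sketch that streams an mzIdentML file with the standard library's `xml.etree.ElementTree.iterparse`. The function `iter_psms`, its output keys and the tiny `SAMPLE` file are hypothetical and are not part of psm_utils or this PR. It assumes a simple single-run file of the kind described above, ignores modifications, protein evidence and multiple runs, and makes no claim about speed.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_psms(source) -> Iterator[Dict[str, object]]:
    """Stream PSMs from a simple single-run mzIdentML file.

    Hypothetical helper: yields one dict per SpectrumIdentificationItem.
    Modifications, protein evidence and multiple runs are ignored.
    """
    peptides: Dict[str, Optional[str]] = {}
    # Only "end" events are used: an element is complete once its end tag is read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = _local(elem.tag)
        if tag == "Peptide":
            # SequenceCollection precedes AnalysisData, so every peptide_ref
            # can be resolved by the time the results are reached.
            peptides[elem.get("id")] = next(
                (c.text for c in elem if _local(c.tag) == "PeptideSequence"), None
            )
            elem.clear()
        elif tag == "SpectrumIdentificationResult":
            spectrum_id = elem.get("spectrumID")
            for item in elem:
                if _local(item.tag) != "SpectrumIdentificationItem":
                    continue
                # Score cvParams (name -> value), kept as strings.
                scores = {
                    p.get("name"): p.get("value")
                    for p in item
                    if _local(p.tag) == "cvParam"
                }
                yield {
                    "spectrum_id": spectrum_id,
                    "rank": int(item.get("rank")),
                    "charge": int(item.get("chargeState")),
                    "sequence": peptides.get(item.get("peptide_ref")),
                    "scores": scores,
                }
            # Release the processed result's children to keep memory use low.
            elem.clear()


# Hypothetical, heavily trimmed mzIdentML file used only for demonstration.
SAMPLE = b"""<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <Peptide id="pep1"><PeptideSequence>PEPTIDE</PeptideSequence></Peptide>
  </SequenceCollection>
  <DataCollection><AnalysisData>
    <SpectrumIdentificationList id="SIL_1">
      <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0">
        <SpectrumIdentificationItem id="SII_1" rank="1" chargeState="2" peptide_ref="pep1">
          <cvParam name="MS-GF:RawScore" value="42"/>
        </SpectrumIdentificationItem>
      </SpectrumIdentificationResult>
    </SpectrumIdentificationList>
  </AnalysisData></DataCollection>
</MzIdentML>"""

if __name__ == "__main__":
    for psm in iter_psms(io.BytesIO(SAMPLE)):
        print(psm)
```

A streaming parser like this never holds the whole document tree in memory, which is one reason such readers can be fast on multi-gigabyte files; the trade-off, as noted in the comment above, is that less common structures (several runs, multiple search engines) are not handled.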

@RalfG RalfG requested a review from paretje April 15, 2025 16:14
@RalfG RalfG added the enhancement Improvement of an existing feature label Apr 15, 2025
@paretje
Contributor

paretje commented Apr 16, 2025

How much work would it be for you to list any limitations of your parser, especially those that are actually relevant in the context of psm_utils? It would probably be interesting to have this as part of the documentation so people can make an informed decision when selecting the parser. On top of that, it might also be interesting to see how computationally expensive it would be to implement any relevant missing features, and use your parser as the default.

@julianu
Contributor Author

julianu commented Apr 17, 2025

I will look over it and check what information is actually missing or could be missing.
This might take some time due to other things on my list; I will come back to you when I am done.

Labels
enhancement Improvement of an existing feature

3 participants