Rust DAT parser #39

Merged
merged 32 commits into from
Feb 13, 2024


stijndcl
Contributor

I created my own parser for UniProt's custom DAT file format. This format is completely undocumented, so I had to reverse-engineer it by comparing a DAT file to the corresponding XML file. I wrote down my findings in a Markdown file that you can reference in the future.

My parser is very heavily based on the uniprot-rs library we're using for the XML parser, as writing threaded parsers is quite difficult and I had never done it before.

The PR includes both a simple single-threaded parser and a (far) more complex multi-threaded parser. The multi-threaded parser is only a few hundredths of a second faster than the single-threaded one, so I have left the single-threaded parser in as well, because it is easier to work with and to expand upon.
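For readers unfamiliar with the format, here is a minimal sketch of what a single-threaded pass over a DAT file can look like. It is not the code in this PR. It assumes the usual UniProtKB flat-file layout, where each line starts with a two-character line code (such as `ID` or `AC`) and every entry ends with a `//` line. The names `RawEntry` and `read_entries` are hypothetical and exist only for this illustration.

```rust
use std::io::BufRead;

/// One raw entry: the (line code, content) pairs found before a `//` terminator.
/// Hypothetical type, used only in this sketch.
#[derive(Debug, Default, PartialEq)]
pub struct RawEntry {
    pub lines: Vec<(String, String)>,
}

/// Reads all entries from `reader` in a single pass on the calling thread.
/// Assumption: entries are terminated by a line starting with `//`.
/// Lines after the last terminator are ignored.
pub fn read_entries<R: BufRead>(reader: R) -> std::io::Result<Vec<RawEntry>> {
    let mut entries = Vec::new();
    let mut current = RawEntry::default();

    for line in reader.lines() {
        let line = line?;

        // A terminator closes the entry that is currently being collected.
        if line.starts_with("//") {
            entries.push(std::mem::take(&mut current));
            continue;
        }
        if line.trim().is_empty() {
            continue;
        }

        // The first two characters are the line code, the rest is the content.
        // Lines shorter than two characters are kept as a bare code.
        let (code, rest) = match line.get(..2) {
            Some(code) => (code, line.get(2..).unwrap_or("")),
            None => (line.as_str(), ""),
        };
        current
            .lines
            .push((code.trim().to_string(), rest.trim().to_string()));
    }

    Ok(entries)
}
```

Calling `read_entries(text.as_bytes())` on a string that holds two entries returns a vector of two `RawEntry` values. A real parser would go on to interpret each line code instead of storing the raw strings.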

As a bonus, I introduced unit tests into the project :) There were none before, and I felt like the parser warranted it because it heavily exploits the structure of the DAT file. Testing was added to the CI as well.

@stijndcl stijndcl added the enhancement New feature or request label Jan 29, 2024
@stijndcl stijndcl requested a review from tibvdm January 29, 2024 12:38
@stijndcl stijndcl self-assigned this Jan 29, 2024
@stijndcl stijndcl requested a review from rien February 3, 2024 15:05
@stijndcl stijndcl merged commit 253afe6 into unipept:master Feb 13, 2024
5 checks passed
@stijndcl stijndcl deleted the feature/rust-dat-parser branch February 13, 2024 14:49

2 participants