-
Notifications
You must be signed in to change notification settings - Fork 27
Updated parsing of combined modifications #240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
WalkthroughModified UNIMOD modification name normalization in the OpenMS converter to handle combined PTM names. When a modification name contains '+' delimiters, each segment is capitalized individually before joining; otherwise, the name is capitalized as a whole. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
sdrf_pipelines/openms/openms.py (1)
246-251: Now let me examine the test files to understand what's currently being tested and check the actual implementation context:Based on my verification analysis, I found:
- No existing test cases for combined modifications in the test files
- SDRF format validation includes whitespace validators, supporting the need for defensive
.strip()handling- Single method call site at line 312, meaning impact is localized
- UNIMOD database lookup provides a safety net for canonical modification names
- No SDRF test data with combined modifications found in the repository
The original review comment makes valid suggestions about edge cases, but lacks context about test coverage and UNIMOD lookup behavior. The suggested improvements are defensive programming measures that are reasonable but may not be strictly necessary for well-formed SDRF input.
Enhance robustness for combined modifications with whitespace and empty segment handling.
The logic correctly addresses combined modifications, with UNIMOD database lookup (lines 253-256) providing a fallback for canonical names. However, two defensive measures improve robustness:
Whitespace handling: SDRF validation includes whitespace validators, so user input might have spaces around delimiters. Stripping each segment ensures proper matching.
Empty segments: While unlikely with valid UNIMOD format, filtering prevents issues from consecutive '+' characters.
Apply this enhancement:
# Check for combined modifications like "Phospho+Oxidation" if '+' in name: - name = [it.capitalize() for it in name.split('+')] + name = [it.strip().capitalize() for it in name.split('+') if it.strip()] name = '+'.join(name) else: name = name.capitalize()Note: The
capitalize()limitation (e.g., "DiMethyl" → "Dimethyl") acknowledged in the PR is mitigated by UNIMOD database lookup, which retrieves canonical names when available.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
sdrf_pipelines/openms/openms.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
User description
Hi all,
here are some proposed changes to parse UNIMOD combined modifications.
Rational: Comet in the bigbio/quantms pipeline breaks when
Met-loss+Acetylis set as a variable modification. The current version ofparse_sdrf convert-openmsconverts it toMet-loss+acetylBy UNIMOD convention, combined modifications should be delimited with '+' (see: https://www.unimod.org/names.html, Rule - 11)
I propose a minimal implementation. It is likely not solving all UNIMOD combined modifications, since some the
it.capitalize()might still break some, but I believe the proposed changes should cover the majority.PR Type
Bug fix
Description
Fix parsing of combined UNIMOD modifications delimited with '+'
Preserve capitalization of individual components in combined modifications
Prevent incorrect lowercasing that broke Comet pipeline compatibility
Diagram Walkthrough
File Walkthrough
openms.py
Handle combined UNIMOD modifications with proper capitalizationsdrf_pipelines/openms/openms.py
individually
Summary by CodeRabbit