Updated parsing of combined modifications #240

Kevin-Brockers · 2025-11-03T13:49:35Z

User description

Hi all,

here are some proposed changes to parse UNIMOD combined modifications.

Rational: Comet in the bigbio/quantms pipeline breaks when Met-loss+Acetyl is set as a variable modification. The current version of parse_sdrf convert-openms converts it to Met-loss+acetyl
By UNIMOD convention, combined modifications should be delimited with '+' (see: https://www.unimod.org/names.html, Rule - 11)
I propose a minimal implementation. It is likely not solving all UNIMOD combined modifications, since some the it.capitalize() might still break some, but I believe the proposed changes should cover the majority.

PR Type

Bug fix

Description

Fix parsing of combined UNIMOD modifications delimited with '+'
Preserve capitalization of individual components in combined modifications
Prevent incorrect lowercasing that broke Comet pipeline compatibility

Diagram Walkthrough

flowchart LR
  A["Combined Modification<br/>e.g. Met-loss+Acetyl"] -->|Extract Name| B["Split by '+'"]
  B -->|Capitalize Each| C["Met-loss+Acetyl"]
  C -->|Lookup PTM| D["Valid UNIMOD Entry"]

File Walkthrough

Relevant files

Bug fix

openms.py `Handle combined UNIMOD modifications with proper capitalization` sdrf_pipelines/openms/openms.py Added detection for combined modifications containing '+' delimiter Split combined modifications and capitalize each component individually Preserve '+' as separator between modification components Maintain backward compatibility for single modifications	+6/-1

Summary by CodeRabbit

Bug Fixes
- Improved normalization of combined protein modification names in OpenMS converter to preserve proper formatting (e.g., "Phospho+Oxidation"), ensuring more accurate downstream identification and name resolution of post-translational modifications from the database.

coderabbitai · 2025-11-03T13:49:47Z

Walkthrough

Modified UNIMOD modification name normalization in the OpenMS converter to handle combined PTM names. When a modification name contains '+' delimiters, each segment is capitalized individually before joining; otherwise, the name is capitalized as a whole.

Changes

Cohort / File(s)	Summary
OpenMS modification normalization `sdrf_pipelines/openms/openms.py`	Updated NT name capitalization logic to segment-wise capitalization for combined modification names (containing '+') while preserving existing capitalization behavior for single names

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Focus area: The conditional logic for modification name segmentation and capitalization to ensure it handles both combined and single modification names correctly
Verify edge cases: modification names with multiple '+' delimiters, empty segments, and interaction with existing unimod database lookup

Poem

🐰 A modification split in twain,
With '+' marks and careful strain,
Each segment gleams with caps so bright,
Combined names now normalize right!

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title 'Updated parsing of combined modifications' directly aligns with the main change described in the PR objectives. The core modification involves updating how UNIMOD combined modifications (delimited by '+') are parsed and normalized in the OpenMS converter, specifically to preserve proper capitalization of each segment in combined modifications like 'Met-loss+Acetyl'. The title accurately reflects this primary objective.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

qodo-merge-pro · 2025-11-03T13:50:04Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: No audit logs: The new parsing logic for UNIMOD names adds decision paths without any logging of actions or outcomes, making it unclear how modifications are transformed during processing. Referred Code # Check for combined modifications like "Phospho+Oxidation" if '+' in name: name = [it.capitalize() for it in name.split('+')] name = '+'.join(name) else: name = name.capitalize() accession = re.search("AC=(.+?)(;\|$)", mod_string).group(1) ptm = self._unimod_database.get_by_accession(accession) if ptm is not None: name = ptm.get_name()
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Edge cases unhandled: The split-and-capitalize path for combined modifications does not handle empty segments or malformed names and provides no contextual error/logging if parsing fails. Referred Code # Check for combined modifications like "Phospho+Oxidation" if '+' in name: name = [it.capitalize() for it in name.split('+')] name = '+'.join(name) else: name = name.capitalize()
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Input not validated: The new '+'-based parsing trusts the extracted name without additional validation or sanitization of combined components, which may allow unexpected tokens to pass through. Referred Code # Check for combined modifications like "Phospho+Oxidation" if '+' in name: name = [it.capitalize() for it in name.split('+')] name = '+'.join(name) else: name = name.capitalize()

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-merge-pro · 2025-11-03T13:50:59Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category

Suggestion

Impact

Possible issue

Fix incorrect combined modification parsing

Move the UniMod accession lookup to only execute for single modifications,
preventing the overwriting of combined modification names.

sdrf_pipelines/openms/openms.py [245-258]

 name = re.search("NT=(.+?)(;|$)", mod_string).group(1)
 # Check for combined modifications like "Phospho+Oxidation"
 if '+' in name:
-    name = [it.capitalize() for it in name.split('+')]
-    name = '+'.join(name)
+    name = '+'.join([it.capitalize() for it in name.split('+')])
 else:
     name = name.capitalize()
-
-accession = re.search("AC=(.+?)(;|$)", mod_string).group(1)
-ptm = self._unimod_database.get_by_accession(accession)
-if ptm is not None:
-    name = ptm.get_name()
+    accession = re.search("AC=(.+?)(;|$)", mod_string).group(1)
+    ptm = self._unimod_database.get_by_accession(accession)
+    if ptm is not None:
+        name = ptm.get_name()
 
 return name

Apply / Chat

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a bug in the PR's logic where combined modification names are overwritten, and the proposed fix correctly resolves this issue.

High

More

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

sdrf_pipelines/openms/openms.py (1)
246-251: Now let me examine the test files to understand what's currently being tested and check the actual implementation context:

Based on my verification analysis, I found:

No existing test cases for combined modifications in the test files

SDRF format validation includes whitespace validators, supporting the need for defensive .strip() handling

Single method call site at line 312, meaning impact is localized

UNIMOD database lookup provides a safety net for canonical modification names

No SDRF test data with combined modifications found in the repository

The original review comment makes valid suggestions about edge cases, but lacks context about test coverage and UNIMOD lookup behavior. The suggested improvements are defensive programming measures that are reasonable but may not be strictly necessary for well-formed SDRF input.

Enhance robustness for combined modifications with whitespace and empty segment handling.

The logic correctly addresses combined modifications, with UNIMOD database lookup (lines 253-256) providing a fallback for canonical names. However, two defensive measures improve robustness:

Whitespace handling: SDRF validation includes whitespace validators, so user input might have spaces around delimiters. Stripping each segment ensures proper matching.

Empty segments: While unlikely with valid UNIMOD format, filtering prevents issues from consecutive '+' characters.

Apply this enhancement:
 # Check for combined modifications like "Phospho+Oxidation"
 if '+' in name:
-    name = [it.capitalize() for it in name.split('+')]
+    name = [it.strip().capitalize() for it in name.split('+') if it.strip()]
     name = '+'.join(name)
 else:
     name = name.capitalize()
Note: The capitalize() limitation (e.g., "DiMethyl" → "Dimethyl") acknowledged in the PR is mitigated by UNIMOD database lookup, which retrieves canonical names when available.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b7dd36 and d0a8f13.

📒 Files selected for processing (1)

sdrf_pipelines/openms/openms.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Codacy Static Code Analysis

Updated parsing of combined modifications

d0a8f13

qodo-merge-pro bot added the Review effort 1/5 label Nov 3, 2025

coderabbitai bot reviewed Nov 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Updated parsing of combined modifications #240

Updated parsing of combined modifications #240

Uh oh!

Kevin-Brockers commented Nov 3, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 3, 2025 •

edited

Loading

Uh oh!

qodo-merge-pro bot commented Nov 3, 2025

Uh oh!

qodo-merge-pro bot commented Nov 3, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Updated parsing of combined modifications #240

Are you sure you want to change the base?

Updated parsing of combined modifications #240

Uh oh!

Conversation

Kevin-Brockers commented Nov 3, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Diagram Walkthrough

File Walkthrough

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

qodo-merge-pro bot commented Nov 3, 2025

PR Compliance Guide 🔍

Uh oh!

qodo-merge-pro bot commented Nov 3, 2025

PR Code Suggestions ✨

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Kevin-Brockers commented Nov 3, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 3, 2025 •

edited

Loading