Optional Digestion Count Output #2460

Open
wants to merge 17 commits into
base: master

nbollis
Member

@nbollis nbollis commented Jan 29, 2025

PR Summary

Adds a feature that tracks and outputs the number of digestion products generated from each protein during a classic search.
The number of digestion products generated per protein accession is a function of MaxModifiedIsoforms, MaxModsPerPeptide, initiator methionine behavior, and splice variants.

Detailed Changes

Introduced functionality to track and write digestion product counts, including GUI updates and new methods for output files.

  • ClassicSearchEngine: Added tracking of digestion counts and methods to retrieve and increment counts.
  • SearchTaskWindow.xaml: Added checkbox for enabling digestion count tracking.
  • PostSearchAnalysisTask: Added methods to write digestion counts and histograms to .tsv files.
  • DictionaryExtensions: Created extension methods for dictionary operations.
  • Added unit tests in PostSearchAnalysisTaskTests.cs and SearchEngineTests.cs for new features.

Figures

These output files are generated only when the option is checked in the advanced parameters of the search task window.

image

image

image

Added a generic Increment method to DictionaryExtensions.cs that increments the value of a specified key or initializes it to one if the key does not exist. Included XML documentation for the method. Updated DictionaryExtensionsTests.cs with unit tests covering various scenarios for the Increment method.
Added IsNullOrEmpty method to DictionaryExtensions.cs to check if a dictionary is null or empty. Included XML documentation for the method. Added unit tests in DictionaryExtensionsTests.cs to verify the method's behavior for null, empty, and non-empty dictionaries.
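
The two helpers described in the commit messages above could look roughly like the following. This is a minimal sketch, not the code from this PR: the class name `DictionaryExtensionsSketch` is hypothetical, the signatures are assumed from the descriptions, and the `INumber<TValue>` constraint (which requires .NET 7 or later) is inferred from the `TValue.One` usage visible in the review snippet.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical class name; the PR's own file is DictionaryExtensions.cs.
public static class DictionaryExtensionsSketch
{
    // Increments the value stored under key, or initializes it to one
    // if the key does not exist yet.
    public static void Increment<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, TKey key)
        where TValue : INumber<TValue>
    {
        if (dictionary.TryGetValue(key, out TValue value))
        {
            dictionary[key] = value + TValue.One;
        }
        else
        {
            dictionary[key] = TValue.One;
        }
    }

    // Returns true when the dictionary is null or contains no entries.
    public static bool IsNullOrEmpty<TKey, TValue>(this IDictionary<TKey, TValue> dictionary)
    {
        return dictionary is null || dictionary.Count == 0;
    }
}
```

Calling `counts.Increment("P12345")` twice on an empty `Dictionary<string, int>` leaves `counts["P12345"]` equal to 2. This version is only safe for single-threaded use.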
Introduced functionality to track and write digestion product counts for proteins during a search task. Key changes include:

- Made `DigestionCountDictionary` a public readonly field in `ClassicSearchEngine.cs` and adjusted the constructor accordingly.
- Added an internal property `DigestionCountDictionary` in `PostSearchAnalysisTask.cs` and implemented methods to write counts to .tsv files.
- Modified `SearchTask.cs` to initialize and pass `digestionCountDictionary` to `PostSearchAnalysisTask`.
- Added tests in `PostSearchAnalysisTaskTests.cs` to verify the correct writing of digestion counts and histograms.
@nbollis nbollis force-pushed the DigestionCountsInOutput branch from 2f089fc to 4429039 on January 29, 2025 01:26

codecov bot commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 95.72650% with 5 lines in your changes missing coverage. Please review.

Project coverage is 93.90%. Comparing base (e6cf8e7) to head (7c58288).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...eus/TaskLayer/SearchTask/PostSearchAnalysisTask.cs | 91.11% | 3 Missing and 1 partial ⚠️ |
| ...aMorpheus/EngineLayer/Util/DictionaryExtensions.cs | 98.14% | 0 Missing and 1 partial ⚠️ |

@@           Coverage Diff            @@
##           master    #2460    +/-   ##
========================================
  Coverage   93.89%   93.90%            
========================================
  Files         146      147     +1     
  Lines       22206    22320   +114     
  Branches     3059     3076    +17     
========================================
+ Hits        20851    20960   +109     
- Misses        906      909     +3     
- Partials      449      451     +2     

| Files with missing lines | Coverage Δ |
|---|---|
| ...s/EngineLayer/ClassicSearch/ClassicSearchEngine.cs | 97.93% <100.00%> (+0.05%) ⬆️ |
| MetaMorpheus/EngineLayer/Util/AnalyteType.cs | 100.00% <ø> (ø) |
| ...aMorpheus/TaskLayer/SearchTask/SearchParameters.cs | 100.00% <100.00%> (ø) |
| MetaMorpheus/TaskLayer/SearchTask/SearchTask.cs | 95.91% <100.00%> (+0.08%) ⬆️ |
| ...aMorpheus/EngineLayer/Util/DictionaryExtensions.cs | 98.14% <98.14%> (ø) |
| ...eus/TaskLayer/SearchTask/PostSearchAnalysisTask.cs | 93.88% <91.11%> (-0.09%) ⬇️ |

@nbollis nbollis marked this pull request as ready for review January 29, 2025 03:20
@Alexander-Sol
Contributor

I don't understand the figures shown. If the max is 1024, how does the x-axis go out to 2000 or 3000?

{
    if (dictionary.TryGetValue(key, out TValue value))
    {
        dictionary[key] = value + TValue.One;
Contributor

Directly accessing a dictionary element through dictionary[key] can cause race conditions with concurrent dictionaries. Adding/updating a concurrent dict should use the AddOrUpdate method.
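
To illustrate the suggestion, here is a minimal sketch of an increment built on `ConcurrentDictionary.AddOrUpdate`. The class and method names are hypothetical and the value type is fixed to `int` for brevity; it is not the change that was made in this PR.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical names, for illustration only.
public static class ConcurrentIncrementSketch
{
    // Adds the key with a count of 1, or increments the existing count.
    // AddOrUpdate retries internally if another thread changed the value
    // in the meantime, so no increment is lost.
    public static int Increment<TKey>(this ConcurrentDictionary<TKey, int> dictionary, TKey key)
        where TKey : notnull
    {
        return dictionary.AddOrUpdate(key, 1, (_, current) => current + 1);
    }

    // Increments one key from many threads and returns the final count.
    public static int CountInParallel()
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.For(0, 10_000, _ => counts.Increment("P12345"));
        return counts["P12345"]; // 10000
    }
}
```

By contrast, a separate `TryGetValue` followed by `dictionary[key] = value + 1` is a read and a write with a gap between them, so two threads can read the same value and one increment is overwritten.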

Member Author

Added handling for ConcurrentDictionary so that these methods can be used in both thread-safe and single-threaded code.

concurrentDictionary.AddOrUpdate(key, new List<TValues> { value }, (k, v) =>
{
    lock (AddOrCreateLock)
    {
Contributor

I don't think you should take the lock inside the AddOrUpdate call. Locking should be handled internally by the AddOrUpdate method.

Member Author

This ensures the inner lists are thread safe. Without it, the unit test AddOrCreate_ThreadSafeWithDictionary fails.
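
The point in this reply can be sketched as follows. `ConcurrentDictionary` runs the `updateValueFactory` delegate outside its own internal locks, and `List<T>.Add` is not thread safe, so mutating a shared inner list from that delegate needs its own synchronization. The code below is an assumption-laden reconstruction from the snippet under review: the class name and the single static lock object are illustrative, not the PR's actual implementation.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical reconstruction, for illustration only.
public static class AddOrCreateSketch
{
    // One shared lock guarding every inner list (an assumption made for brevity).
    private static readonly object AddOrCreateLock = new();

    // Appends value to the list stored under key, creating the list if needed.
    public static void AddOrCreate<TKey, TValue>(
        this ConcurrentDictionary<TKey, List<TValue>> dictionary, TKey key, TValue value)
        where TKey : notnull
    {
        dictionary.AddOrUpdate(key, new List<TValue> { value }, (_, list) =>
        {
            // The dictionary does not hold a lock while this delegate runs,
            // so the list mutation is guarded explicitly.
            lock (AddOrCreateLock)
            {
                list.Add(value);
            }
            return list;
        });
    }
}
```

Readers of the inner lists would need to take the same lock while enumerating. A thread-safe collection as the inner value type is a possible alternative design.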

Updated `DigestionCountDictionary` to track by protein accession and base sequence. Modified `PostSearchAnalysisTask` and `SearchTask` to use the new type. Updated file output logic to include primary sequence and added checks for `WriteDecoys` parameter. Enhanced unit tests to reflect these changes and added new tests for decoy handling.
2 participants