How to Set Up Processing Modules
- User Manual
- Example Configuration Files
- High-level Overview
- Installation Instructions
- Processing Script
- Common Errors
- Download the User Manual as a PDF: MSconnect_UserManual_Lav_1213_clear.pdf
- Download example configuration files: WorkerSettingFile-Samples.zip
Tip: For the most up-to-date information, please refer to the user manual linked above.
The Proteomics Data Processor is a Windows-based application that integrates with MSConnect and automates the processing of mass spectrometry data using third-party identification software such as FragPipe, DIA-NN, and ProteomeDiscoverer. It gives users real-time feedback on the quality of an MS analysis, such as the number of protein and peptide identifications, to aid decision making. The tool works in tandem with MSConnect to process MS-based proteomics data. The program is built in C# on .NET Framework 4.5.1 (the same framework used by the Thermo Orbitrap Tribrid) for the Windows platform.
Key features of the Proteomics Data Processor include:
- Seamless communication with MSConnect.
- Automation of data processing through desktop tools.
- Designed for Windows systems with .NET Framework 4.5.1.
- Command-line configuration for admins and user-friendly GUI for researchers.
The processing workflow connects MSConnect with third-party tools such as FragPipe and DIA-NN for proteomics data analysis. Three components are critical to the workflow:
- Wrappers: Tools installed on the MSConnect server that bridge MSConnect, third-party applications, and processors/workers.
- Processors/workers: The Proteomics Data Processor, downloaded and installed on a computer, which executes the workflows configured on the MSConnect server.
- Processing Workflows: Define the structure and logic for processing MS data files with the processor/worker.
The Proteomics Data Processor can be installed on multiple PCs to create more nodes for parallel processing. This section is a guide to setting up the processing workflow after the MSConnect server has been initialized. Note that the Proteomics Data Processor is also referred to as the processing application or processing worker.
The installation process involves downloading and configuring wrappers, setting up workflows for processors, and preparing the Proteomics Data Processor application.
Before proceeding with installation, ensure the following requirements are met:
- Hardware:
  - Windows OS (Windows 10 or higher).
  - Administrator privileges on the machine.
- Software:
  - .NET Framework 4.5.1 ([Download here](https://dotnet.microsoft.com/download)).
  - Processing tools with valid licenses:
    - [FragPipe](https://fragpipe.nesvilab.org/)
    - [DIA-NN](https://github.com/vdemichev/DiaNN)
    - [ProteomeDiscoverer](https://www.thermofisher.com/)
- Network:
  - Ensure internet connectivity for MSConnect communication.
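The .NET Framework requirement can be checked before installation. The following is a minimal sketch, not part of the Proteomics Data Processor itself: it assumes Microsoft's documented `Release` registry value under `NDP\v4\Full`, where 378675 is the lowest value reported for .NET Framework 4.5.1. The function names are hypothetical.

```python
from __future__ import annotations

import sys

# Lowest "Release" DWORD that Microsoft documents for .NET Framework 4.5.1.
NET_451_MIN_RELEASE = 378675


def meets_net_451(release: int) -> bool:
    """Return True if a registry Release value indicates 4.5.1 or newer."""
    return release >= NET_451_MIN_RELEASE


def read_installed_release() -> int | None:
    """Read the Release value from the registry, or None if unavailable."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg  # imported lazily so the module loads on any platform

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "Release")
            return int(value)
    except OSError:
        return None  # key or value missing: .NET 4.5+ is not installed
```

On a worker PC, `read_installed_release()` returns the installed value and `meets_net_451(...)` reports whether it satisfies the prerequisite. A `None` result means the value could not be read.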
I. Log in to MSConnect as admin.
II. Download wrappers:
- Navigate to the Processing tab.
- Click "Get More" at the top of the page to access available wrappers.
- In the "Apps from the remote marketplace" section, click "Download to my PDM" in the action column of the desired wrapper (e.g., FragPipe, DIANN, or PD).
- Scroll to the bottom of the page and, under 'Currently Downloaded Apps in Local Proteomics Data Manager', click Install on the app you want to install.
- Once the app is installed, click enable under the same section where you
III. Restart the server:
- Go to the Advanced Settings page under the Settings tab.
- Press ‘Restart Data System’ to initialize changes.
- Wait a few minutes for the restart to complete.
IV. After the server restarts, the processing wrappers should appear in the Processing tab at the top.
MSConnect website: remote marketplace for downloading wrappers
The processing workflow is called the Automatic QC setting preset on the MSConnect Settings page. The selected processing app/wrapper and preset are uploaded to the processing queue along with the MS files via the raw file uploader.
Installing a wrapper creates a ProcessingApp object associated with that wrapper on the admin page. It is preloaded with a couple of presets that users can use as templates and modify to change search parameters such as the FASTA file and acquisition mode. The following instructions show how to create a new preset with different search parameters, such as a different FASTA file, from an existing search result produced by a third-party program such as DIA-NN or FragPipe.
- Go to the Admin page under the Help tab.
- Go to the ‘Processing Apps’ section under FILE_MANAGER.
- Select the desired ‘ProcessingApp Object’. The ProcessingApp object associated with a specific processing wrapper is created on the admin page after the wrapper is installed on the MSConnect website. By default, each app is named ProcessingApp object (#). The name of the processing app/wrapper is shown once you click into it. For example, on the demo site:
- ProcessingApp object (2) : PD 3 Processor
- ProcessingApp object (3) : FragPipe Processor
- ProcessingApp object (4) : DIA NN Processor
- Edit Preset. There are 8 preset slots and 2 user preset slots. The preset slots are replaced when the wrapper is updated to a newer version; the user preset slots are not. Each preset is a zip folder containing all the input files that define the parameters for a specific processing app/wrapper. The required contents of the zip folder differ between wrappers.
  - Create a new preset: upload the zip folder, then click save at the bottom of the page.
  - Delete an old preset: check the clear box to the right of the preset, then click save at the bottom of the page.
  - Edit an existing preset: replace the zip folder by uploading a new one next to change, then click save at the bottom of the page.
- Build a zip folder: the input files in the folder define the search parameters and are generally output files from a search run in the third-party identification software.
  - Run the search in the local identification software.
  - Export the required output files from the identification software to use as input file sources.
  - Rename the output files to exactly match the names given in the preset template, which are also listed in the table below.
  - Put all the files into a zip folder. The folder can be given any name that makes it easy to recognize.
| Identification Software | Input file name in .zip folder | Input file source |
|---|---|---|
| PD 2.5/PD3 | input_file_1.pdProcessingWF | pdProcessingWF |
| | input_file_2.pdConsensusWF | pdConsensusWF |
| FragPipe | input_file_1.workflow | fragpipe.workflow |
| | parameters.json | Manually change the data_type to DDA, DDA+, or DIA |
| DIA NN | input_file_1.txt | From ‘report.log.txt’, copy the parameters (starting after the input file name and ending at the end of that section of the log) and save them as a new .txt file. |

Note: Make sure the paths and files for the FASTA and speclib on the computer that houses the processor/worker are the same as in the input files, and are identical across all computers that run parallel processing, so that the program functions correctly and produces consistent results.
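The rename-and-zip steps can be scripted. The following is a minimal sketch, assuming the preset files sit at the top level of the zip; the function name `build_preset_zip` and the example paths are hypothetical and not part of MSConnect.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def build_preset_zip(sources: dict[str, Path], zip_path: Path) -> list[str]:
    """Write each exported file into zip_path under its required preset name.

    sources maps the file name required inside the zip (for example
    "input_file_1.workflow") to the file exported from the search software.
    Returns the names stored in the finished archive.
    """
    missing = [str(src) for src in sources.values() if not Path(src).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name_in_zip, src in sources.items():
            # arcname performs the rename, so the source file is left untouched
            archive.write(src, arcname=name_in_zip)
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For a FragPipe preset, for example, `sources` would map `input_file_1.workflow` to the exported `fragpipe.workflow` and `parameters.json` to the edited parameters file. The resulting zip is then uploaded to a preset slot as described above.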
The Proteomics Data Processor is the application installed on the computer that houses the third-party identification software, such as DIA-NN, FragPipe, or Proteome Discoverer. It is also referred to as the worker in this documentation. The worker downloads the MS files and the processing preset from the MSConnect server and passes them to the identification software via the command line. It can be installed on multiple computers for parallel processing to get through the queue faster.
- Click on the most recent release on the right panel.
- Download `Proteomics_Data_Processor.zip`.
- Extract the zip folder.
- Run the `.exe` file from the extracted folder.
- Create a desktop shortcut for easy access.
Upon launch, the application automatically loads `default.xml`.
This file contains parameters that:
- Connect the application to the MSConnect server.
- Link the application to the local identification software.
Settings are specific to each wrapper. The following section explains the three tabs in the application.
At the bottom, sample setting files for DIA NN, Fragpipe, and PD wrappers are provided.
Connect the application to the MSConnect server by entering the following details:
- Server IP/Hostname.
- Login credentials: user `search_worker`, password `searchadmin`.
- Computer name and IP address.
- Assign a worker number for identification.
- Click "pull list from server".
- Select the process app from the dropdown menu (only installed wrappers will appear).
- Select checkboxes for the settings to apply:
  - Start app with Windows start.
  - Start process when app starts.
  - Ignore tasks with start time (for parallel processing).
  - Reverse order (queues data to process the newest runs first).
- Timestamp of queue start and finish.
- Queue number for the current analysis.
- Total time spent on the analysis.
- Worker status.
- Last server check-in time.
Proteomics Data Processor: Main Settings tab
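If "pull list from server" returns nothing, it can help to confirm that the worker PC reaches the server at all. The following is a minimal sketch using only a plain TCP connection. The port number is an assumption to be replaced with whatever your MSConnect server listens on, and the function name is hypothetical.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the hostname and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name-resolution failures
        return False
```

A call such as `server_reachable("msconnect.example.org", 80)` (hypothetical host and port) only shows that the network path is open. It does not verify the login credentials or the worker settings.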
Create a temporary folder to store the MS files to be processed and the output files generated by the identification software (e.g., `D:\QC_DIA`). Files in this folder are cleared once the analysis has finished and all output files have been uploaded to MSConnect (shown on the Processing page).
In each of the output slots (1-6), enter the path and name, under the temporary folder, of an output file you would like to upload to MSConnect. The file names must match exactly for uploads to succeed. For example, `report.pr_matrix.tsv`, `report.pg_matrix.tsv`, `report.stats.tsv`, and `report.log.txt` are uploaded as output files 1 to 4 in the example screenshot for the DIA NN wrapper.
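Because an upload depends on an exact file name match, it is worth checking the configured names against what a test search actually produced. The following is a minimal sketch. The function name `missing_outputs` is hypothetical, and it assumes each slot holds a path relative to the temporary folder.

```python
from pathlib import Path


def missing_outputs(temp_folder, expected_names):
    """Return the configured output names that are not files in temp_folder."""
    folder = Path(temp_folder)
    return [name for name in expected_names if not (folder / name).is_file()]
```

For example, `missing_outputs(r"D:\QC_DIA", ["report.pr_matrix.tsv", "report.pg_matrix.tsv"])` returns an empty list when both files exist. Run it after a manual test search and before the worker clears the folder.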
The processing script is the command line that sends the search request and parameters to the identification software. The script varies between wrappers. Below are example scripts for different wrappers. The most important difference from user to user is the path to the executable of the chosen identification software.
- FragPipe: `/c C:\Fragpipe_22\fragpipe\bin\fragpipe.bat --headless --workflow &&input_1&& --manifest &&input_2&& --workdir D:\QC_fragpipe\ --config-tools-folder C:\Fragpipe_22\fragpipe\tools\`
- DIA-NN: `/c C:\\DIA-NN\\1.9\\DiaNN.exe --out D:\\QC_DIA\\report.tsv &&loop&& --f &&raw_file_name&& &&loop&&`
- Proteome Discoverer: `/c DiscovererDaemon.exe -c custom &&loop&& -a custom &&raw_file_name&& &&loop&& -r &&output&&.msf -b -e custom ANY &&input_1&&;&&input_2&&`
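The `&&...&&` tokens in these scripts are placeholders that the worker fills in at run time. How the worker expands them is not documented here, so the sketch below is one assumed reading, useful only for previewing a script before deploying it. It assumes a pair of `&&loop&&` markers encloses a segment that is repeated once per raw file, and that other tokens such as `&&input_1&&` are replaced by single values. The function name `expand_script` is hypothetical.

```python
LOOP = "&&loop&&"


def expand_script(template, raw_files, values):
    """Preview a processing script under the assumed placeholder semantics.

    template  -- script text containing &&...&& placeholders
    raw_files -- raw file names substituted inside each loop segment
    values    -- mapping such as {"input_1": "...", "output": "..."}
    """
    parts = template.split(LOOP)
    # Balanced marker pairs always produce an odd number of parts.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced &&loop&& markers")
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd-indexed parts sit between a marker pair: repeat per raw file.
            pieces.append(
                "".join(part.replace("&&raw_file_name&&", name) for name in raw_files)
            )
        else:
            pieces.append(part)
    text = "".join(pieces)
    for key, value in values.items():
        text = text.replace(f"&&{key}&&", value)
    return text
```

Calling `expand_script` on the DIA-NN script with two raw file names yields one `--f <file>` argument per file, which makes it easy to spot a wrong executable path or a missing placeholder before the worker runs the command.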
d. Setting file examples. Follow the instructions above to tailor the settings to your needs, then save the file as `default.xml` in the application's folder.
- Default_DIANN.xml
- Default_Fragpipe.xml
- Default_PD.xml
Set up the application for different wrappers. There are two approaches:
- Duplicate the zip folder (recommended). This approach is more straightforward for end users: the shortcut can be renamed after the specific wrapper, so users don't have to worry about choosing the correct settings. It also allows parallel processing for different wrappers on the same PC.
  - Duplicate the zip folder.
  - Rename the folder and .exe for easier recognition.
  - Edit the default setting file to meet your needs, following the instructions above.
  - Save the new setting file as `default.xml` in the application's folder.
  - Create a shortcut on the Desktop.
- Load a wrapper-specific setting file after starting the application. This requires more communication between the admin and end users.
  - Create a setting file for each wrapper.
  - Save the files in the application's folder with a clear naming scheme that differentiates between wrappers.
  - Each time you open the application, click File/Loading settings from, then select the desired setting file.
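The recommended duplicate-folder approach can be scripted when several wrappers need their own copy. The following is a minimal sketch. The function name `clone_worker` and all paths are hypothetical; renaming the `.exe` and creating the desktop shortcut are left as manual steps.

```python
import shutil
from pathlib import Path


def clone_worker(app_folder, settings_file, target_folder):
    """Copy the extracted application folder and install a settings file.

    The wrapper-specific settings file is written as default.xml in the copy,
    which is the file the application loads on launch.
    """
    target = Path(target_folder)
    # copytree raises FileExistsError if the target folder already exists,
    # which protects an existing worker copy from being overwritten.
    shutil.copytree(app_folder, target)
    shutil.copyfile(settings_file, target / "default.xml")
    return target
```

For instance, calling it once with `Default_DIANN.xml` and once with `Default_Fragpipe.xml`, each with its own target folder, gives two independent copies that can run side by side.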
Use the application for parallel processing. The application can be installed on different PCs or opened multiple times on the same PC for parallel processing. To install it on a different PC, follow the steps above. To run parallel processing on the same PC, double-click the application again and it will open a new window. The following two points are critical for successful parallel processing.
- Make sure the paths and files for the FASTA and speclib are identical to those in the preset's input files across all computers that house the processor application/worker.
- Each processing application/worker must have a different worker number.
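Both conditions can be checked from a simple inventory of your workers. The following is a minimal sketch. The inventory format (plain dictionaries keyed by a worker label) and the function names are assumptions for illustration; the application does not export such a list.

```python
from collections import Counter


def duplicate_worker_numbers(workers):
    """Return worker numbers assigned to more than one worker.

    workers maps a worker label (for example "PC1-DIANN") to its number.
    """
    counts = Counter(workers.values())
    return sorted(number for number, count in counts.items() if count > 1)


def inconsistent_paths(reference_paths):
    """Return, per worker, the FASTA/speclib paths it lacks.

    reference_paths maps a worker label to the paths present on that machine.
    A path is reported for a worker if any other worker has it.
    """
    all_paths = set()
    for paths in reference_paths.values():
        all_paths.update(paths)
    report = {}
    for name, paths in reference_paths.items():
        lacking = sorted(all_paths - set(paths))
        if lacking:
            report[name] = lacking
    return report
```

An empty list from `duplicate_worker_numbers` and an empty dictionary from `inconsistent_paths` indicate that the inventory satisfies both points above.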
- The workflow was not verified after setup. Always test the worker and preset after setting up or making edits.
- Naming conventions and folder structures don't match the preset on MSConnect.
- The application/worker was not started. This often happens when starting a new set of data collection on the MS after an idle period, or when switching to a different wrapper. It also happens when the PC is restarted but the start app/process with Windows start boxes aren't checked. The best solution is to always leave the application open and idle on your machines so that it picks up the next available task in the queue.
- The license for the identification program has expired, or its version doesn't match the processing script.