Licensed under the MIT License.
CsoDIAq (Cosine Similarity Optimization for DIA qualitative and quantitative analysis, pronounced "Zodiac") is a software tool for the analysis of direct infusion shotgun proteome analysis (DISPA) data and data independent acquisition (DIA) data.
The accompanying paper is published in Analytical Chemistry.
All installation steps are run from the command line. Downloading and using CsoDIAq requires git and pip. For system-specific instructions on using the command line or installing these tools, see the wiki page.
From the command line, enter "csodiaq gui" to start the GUI. Apart from MGF library preparation, all instructions in this README pertain to use of the GUI.
Instructions for use from the command line can be accessed by entering "csodiaq -h".
Peptide/Protein Identification
Files:
- DIA Data File: This is a required field. Choose at least one data file from a DISPA or DIA data run.
- Library File: This is a required field. Choose a reference library file in TraML (.tsv or .csv) or MGF format. CsoDIAq treats .tsv files as tab-delimited and .csv files as comma-delimited. Some pan-human .csv files are actually tab-delimited and must be adapted before use.
- Outfile Directory: This is a required field. Choose the folder where the output files will be written.
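Because CsoDIAq picks the delimiter from the file extension, a pan-human .csv library that is really tab-delimited has to be fixed first. The Python sketch below shows one way to do that. The helper names (looks_tab_delimited, tab_to_comma, fix_library_file) are hypothetical and not part of CsoDIAq, and the tab-versus-comma check is a simple heuristic applied only to the header line.

```python
import csv
import io


def looks_tab_delimited(header_line: str) -> bool:
    """Heuristic: a header with more tabs than commas is treated as tab-delimited."""
    return header_line.count("\t") > header_line.count(",")


def tab_to_comma(text: str) -> str:
    """Re-serialize tab-delimited text as comma-delimited CSV (fields containing commas get quoted)."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def fix_library_file(src: str, dst: str) -> bool:
    """Write a comma-delimited copy of src to dst if src is a tab-delimited .csv file.

    Returns True if a converted copy was written, False if src was left alone.
    """
    with open(src, newline="") as handle:
        text = handle.read()
    first_line = text.split("\n", 1)[0]
    if not (src.endswith(".csv") and looks_tab_delimited(first_line)):
        return False
    with open(dst, "w", newline="") as handle:
        handle.write(tab_to_comma(text))
    return True
```

For example, fix_library_file("library.csv", "library_fixed.csv") (hypothetical file names) writes a converted copy only when needed. Renaming such a file to .tsv is a simpler alternative, since CsoDIAq then reads it as tab-delimited.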
Settings:
The default value is recommended for each of these settings.
- Initial Fragment Mass Tolerance (in PPM): Default value is 30. If you are unsure of an ideal setting, leave this blank and check the Create Histogram box below. If the resulting histogram does not show a roughly normal distribution (a clear peak), consider using a wider tolerance.
- Correction: Default is checked. When checked, the PPM tolerance range is recalculated from the results of an initial search, which reduces the likelihood of false-positive identifications.
- Corrective Standard Deviation: Only available if the Correction box is checked. By default, the corrected tolerance is fitted to the distribution shown in the histogram, excluding the noise around the peak. Entering a value here instead sets the tolerance to that number of standard deviations of the distribution.
- Create Histogram: Only available if the Correction box is checked. Default is checked. Generates a histogram to visualize the corrected PPM tolerance used in the corrected analysis.
- Number of Target Peptides per Protein: Only available if the protein inference box is checked. Default is to use one target peptide to represent the protein in the output.
- Maximum Number of Query Spectra to Pool: Default is to pool all query spectra. If your DISPA or DIA data file is particularly large and has many scans with overlapping m/z windows, setting a value limits the number of scans pooled together for analysis so that the program does not exceed your computer's memory. Setting this value can slow down the program.
- Permit Heavy Targets in Re-Analysis File: Default is checked. If checked, files generated for targeted re-analysis also include targets for heavy peptides (peptides containing heavy lysine or arginine).
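The PPM tolerance and the standard-deviation form of the correction can be illustrated with a short Python sketch. It assumes a signed error of (query m/z - library m/z) / library m/z x 10^6, and it assumes that a Corrective Standard Deviation value of n means a window of mean +/- n standard deviations of the errors from the initial search. The function names are hypothetical, and CsoDIAq's default histogram-based correction (which excludes noise around the peak) is not reproduced here.

```python
import statistics


def ppm_difference(library_mz: float, query_mz: float) -> float:
    """Signed mass error of a query peak relative to a library peak, in parts per million."""
    return (query_mz - library_mz) / library_mz * 1e6


def within_tolerance(library_mz: float, query_mz: float, tol_ppm: float = 30.0) -> bool:
    """Initial search: accept a match within +/- tol_ppm (30 mirrors the GUI default)."""
    return abs(ppm_difference(library_mz, query_mz)) <= tol_ppm


def corrected_window(ppm_errors: list[float], n_std: float) -> tuple[float, float]:
    """Assumed correction: center on the mean error and extend n_std standard deviations each way."""
    mean = statistics.fmean(ppm_errors)
    std = statistics.stdev(ppm_errors)  # sample standard deviation; needs at least 2 errors
    return (mean - n_std * std, mean + n_std * std)


def within_window(library_mz: float, query_mz: float, window: tuple[float, float]) -> bool:
    """Corrected search: accept a match only if its PPM error falls inside the window."""
    low, high = window
    return low <= ppm_difference(library_mz, query_mz) <= high
```

In this sketch, the errors of matches accepted by within_tolerance feed corrected_window, and the second pass uses within_window. Because the window is re-centered on the mean error, a systematic calibration offset no longer eats into the tolerance on one side.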
Peptide/Protein Quantification
Files:
- DIA Data File: This is a required field. Choose at least one data file from a DISPA or DIA targeted re-analysis run.
- Library File: This is a required field. Choose a reference library file in TraML (.tsv or .csv) or MGF format. CsoDIAq treats .tsv files as tab-delimited and .csv files as comma-delimited. Some pan-human .csv files are actually tab-delimited and must be adapted before use.
- Outfile Directory: This is a required field. Choose the folder where the output files will be written.
- CsoDIAq ID Output File: This is a required field. Choose the output file from the Peptide/Protein Identification step. The file name should end with "allCVs".
Settings:
- Initial Fragment Mass Tolerance (in PPM): Default value is 30. If you are unsure of an ideal setting, leave this blank and check the Create Histogram box below. If the resulting histogram does not show a roughly normal distribution (a clear peak), consider using a wider tolerance.
- Correction: Default is checked. When checked, the PPM tolerance range is recalculated from the results of an initial search, which reduces the likelihood of false-positive identifications.
- Corrective Standard Deviation: Only available if the Correction box is checked. By default, the corrected tolerance is fitted to the distribution shown in the histogram, excluding the noise around the peak. Entering a value here instead sets the tolerance to that number of standard deviations of the distribution.
- Create Histogram: Only available if the Correction box is checked. Default is checked. Generates a histogram to visualize the corrected PPM tolerance used in the corrected analysis.
- Number of Max Peaks per Library Spectra: Default is to use all library peaks. Setting a value sorts the library peaks by intensity and keeps only that many of the most intense peaks. Setting this value is recommended whenever Number of Min Peak Matches Required is changed from its default.
- Number of Min Peak Matches Required: By default, a match to at least one of the three most intense library peaks is required; once that match is found, all other matched peaks are included as well. Only the default includes all matched peaks: setting a value instead restricts the matched peaks used to the number given here.
- Ratio Selection Method: Default is median. When determining the ratio between a library and a query spectrum, a ratio is calculated for each matched peak. This setting determines whether the mean or the median of those per-peak ratios is used as the representative ratio for the spectrum match.
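The peak-limit and ratio settings above can be sketched in Python. This is an illustration under stated assumptions, not CsoDIAq's implementation: peaks are (m/z, intensity) tuples, each library peak is paired with the most intense query peak within the PPM tolerance, the per-peak ratio is taken as query intensity over library intensity, and all function names are hypothetical. The top-3 anchoring rule of Number of Min Peak Matches Required is not modeled.

```python
import statistics
from typing import Optional

Peak = tuple[float, float]  # (m/z, intensity)


def top_peaks(peaks: list[Peak], max_peaks: Optional[int] = None) -> list[Peak]:
    """Keep the max_peaks most intense peaks; None keeps every peak (the GUI default)."""
    if max_peaks is None:
        return list(peaks)
    return sorted(peaks, key=lambda peak: peak[1], reverse=True)[:max_peaks]


def match_peaks(library: list[Peak], query: list[Peak], tol_ppm: float = 30.0) -> list[tuple[float, float]]:
    """Pair each library peak with the most intense query peak within tol_ppm.

    Returns (library intensity, query intensity) pairs.
    """
    pairs = []
    for lib_mz, lib_int in library:
        hits = [q for q in query if abs(q[0] - lib_mz) / lib_mz * 1e6 <= tol_ppm]
        if hits:
            best = max(hits, key=lambda q: q[1])
            pairs.append((lib_int, best[1]))
    return pairs


def spectrum_ratio(pairs: list[tuple[float, float]], method: str = "median") -> float:
    """Collapse per-peak query/library intensity ratios into one representative ratio."""
    ratios = [query_int / lib_int for lib_int, query_int in pairs]
    if method == "median":
        return statistics.median(ratios)
    if method == "mean":
        return statistics.fmean(ratios)
    raise ValueError(f"unknown ratio method: {method!r}")
```

With per-peak ratios of 2, 1 and 5, the median is 2 while the mean is about 2.67. The median is less sensitive to a single interfering peak with an extreme ratio, which may be why it is the default.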
The MGF library used to develop CsoDIAq lacked protein references.
We therefore created a function that adds protein references from a given FASTA file.
This function was not part of the original publication and is therefore not in the GUI, but we kept it and made it accessible from the command line:
csodiaq mgf -m <path to mgf file> -f <path to fasta file>
Additionally, the "-i" flag limits peaks to those identified as fragments of the peptide sequence. This setting was not used in the publication.
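The two MGF preparation steps can be illustrated with a Python sketch: mapping a peptide to the FASTA proteins that contain it, and keeping only peaks that match a singly charged b or y ion of the peptide. Both functions are assumptions about what "csodiaq mgf" and its "-i" flag do, not their actual code. The function names and the 30 ppm fragment tolerance are hypothetical, modifications and charge states above 1+ are ignored, and standard monoisotopic residue masses are used.

```python
from typing import Optional

Peak = tuple[float, float]  # (m/z, intensity)

# Standard monoisotopic residue masses (Da); cysteine is unmodified here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {accession: sequence}, using the first word of each header."""
    proteins: dict[str, str] = {}
    name: Optional[str] = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                proteins[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        proteins[name] = "".join(chunks)
    return proteins


def proteins_for_peptide(peptide: str, proteins: dict[str, str]) -> list[str]:
    """Return the sorted accessions of every protein whose sequence contains the peptide."""
    return sorted(acc for acc, seq in proteins.items() if peptide in seq)


def fragment_mzs(peptide: str) -> list[float]:
    """Singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)  # b_i ion: first i residues
        ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i ion: last i residues
    return sorted(ions)


def keep_fragment_peaks(peaks: list[Peak], peptide: str, tol_ppm: float = 30.0) -> list[Peak]:
    """Keep only peaks whose m/z lies within tol_ppm of some b or y ion."""
    ions = fragment_mzs(peptide)
    return [p for p in peaks if any(abs(p[0] - ion) / ion * 1e6 <= tol_ppm for ion in ions)]
```

Plain substring matching treats leucine and isoleucine as distinct residues, so a real implementation might need to handle that ambiguity.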