CSV result
A typical reports folder contains the following CSV files:

All file names share the prefix database_name_search_date. For example, uniprot-ecoli-20171023_2017.12.22 means the data was searched against the uniprot-ecoli-20171023 database and the search date is 2017.12.22.
There are four main types of result files:
- uniprot-ecoli-20171023_2017.12.22.csv, the file with the shortest name, contains all unfiltered PSMs, one PSM per line. A PSM may be cross-linked, loop-linked, mono-linked, or regular.
- uniprot-ecoli-20171023_2017.12.22.filtered_X_Y.csv contains filtered results for each peptide type (X) at each level (Y). X can be cross-linked, loop-linked, mono-linked, or regular; Y can be spectra, peptides, or sites.
- uniprot-ecoli-20171023_2017.12.22.precursor_error_distribution.csv and uniprot-ecoli-20171023_2017.12.22.filtered_precursor_error_distribution.csv contain precursor errors from unfiltered and filtered PSMs respectively. They are visualized on the web page result, so they can be skipped when reading this page.
- uniprot-ecoli-20171023_2017.12.22.summary.txt contains summary information about the search, such as the number of identified PSMs and the search time.
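The naming scheme above can be sketched as a small classifier. This is only an illustration, not part of pLink2: the function name describe_result_file and the returned keys are hypothetical, and the regular expressions assume that every file name follows the database_name_search_date prefix with a YYYY.MM.DD date, as in the examples on this page.

```python
import re

# Assumed layout: <database>_<YYYY.MM.DD>.<suffix>
NAME_RE = re.compile(
    r"^(?P<database>.+)_(?P<date>\d{4}\.\d{2}\.\d{2})\.(?P<suffix>.+)$"
)
# Suffix of the filtered_X_Y.csv files described above.
FILTERED_RE = re.compile(
    r"^filtered_(?P<peptide_type>cross-linked|loop-linked|mono-linked|regular)"
    r"_(?P<level>spectra|peptides|sites)\.csv$"
)
PRECURSOR_SUFFIXES = (
    "precursor_error_distribution.csv",
    "filtered_precursor_error_distribution.csv",
)


def describe_result_file(name):
    """Return a dict describing a result file name, or None if it does not fit."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    info = {"database": m["database"], "date": m["date"]}
    suffix = m["suffix"]
    filtered = FILTERED_RE.match(suffix)
    if suffix == "csv":
        info["kind"] = "unfiltered"
    elif filtered:
        info["kind"] = "filtered"
        info["peptide_type"] = filtered["peptide_type"]
        info["level"] = filtered["level"]
    elif suffix in PRECURSOR_SUFFIXES:
        info["kind"] = "precursor_error"
    elif suffix == "summary.txt":
        info["kind"] = "summary"
    else:
        info["kind"] = "unknown"
    return info


print(describe_result_file(
    "uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_sites.csv"
))
```

Matching the date explicitly keeps underscores and hyphens inside the database name from confusing the split between prefix and suffix.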
uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv contains all cross-linked PSMs that pass the TDA-FDR filter, with decoy results removed, one PSM per line. uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_peptides.csv and uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_sites.csv are inferred directly from uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv.
There are 21 columns in uniprot-ecoli-20171023_2017.12.22.filtered_cross-linked_spectra.csv:
- Order: the index of the PSM, starting from 1.
- Title: the title of the spectrum. If a RAW file is used, the title follows the scheme RAWName.Scan.Scan.Charge.pParseID.dta. For example, RD_pH_8point3_step2.7566.7566.3.0.dta means MS2 scan 7566 from the RAW file RD_pH_8point3_step2, with charge 3 and pParseID 0. pParseID is the rank of the precursor extracted from MS1 by pParse; the lower the value, the higher the credibility, and 0 is the best. For more details about pParse, please see pParse.
- Charge: the charge of the spectrum.
- Precursor_Mass: the experimental [MH+] of the precursor.
- Peptide: the identified peptide sequence. AKLESLVEDLVNR(2)-HMNIKVTR(5) means peptide AKLESLVEDLVNR is cross-linked with HMNIKVTR at sites 2 and 5 respectively. Mono-linked and loop-linked peptides have one or two linked sites, respectively, on a single peptide.
- Peptide_Type: the peptide type of the identification. It can be Cross-Linked, Loop-Linked, Mono-Linked, or Regular/Common.
- Linker: the name of the identified cross-linker. For regular results, it is null.
- Peptide_Mass: the theoretical [MH+] of the peptide.
- Modifications: the modifications identified on the peptide. For example, Carbamidomethyl[C](6) means Carbamidomethyl occurs at the 6th site, which is a cysteine. Multiple modifications are separated by semicolons. null means no modifications.
- Evalue: the E-value for the entire peptide (or peptide pair); the smaller, the more confident.
- Score: the SVM score of the peptide; the smaller, the more confident. It is the primary measure for FDR estimation.
- Precursor_Mass_Error(Da): the precursor mass error in Da.
- Precursor_Mass_Error(ppm): the precursor mass error in ppm.
- Proteins: the proteins inferred from the peptide. For example, sp|P0A6Y8|DNAK_ECOLI (304)-sp|P0A6Y8|DNAK_ECOLI (299)/ means sp|P0A6Y8|DNAK_ECOLI is cross-linked with sp|P0A6Y8|DNAK_ECOLI at sites 304 and 299 respectively. If more than one protein pair is inferred, the pairs are separated by slashes.
- Protein_Type: the protein type of the identification, that is, whether it is an Intra-protein or Inter-protein cross-link. For mono-linked, loop-linked, and regular results, it is None.
- FileID: the RAW file from which the PSM was identified, starting from 1. The ID of a RAW file is determined by the order in which the files were added. The mapping between RAW files and FileIDs is shown in the parameter file.
- LabelID: the ID of the labeling used in quantitation, starting from 1. The mapping between labelings and LabelIDs is shown in the parameter file.
- Alpha_Matched: the number of matched fragment ions for the alpha peptide.
- Beta_Matched: the number of matched fragment ions for the beta peptide. Let Alpha_Num and Beta_Num be the number of peaks matched to the alpha peptide and the beta peptide respectively. Some peaks may match both peptides; let Share_Num be the number of such shared peaks. Then Alpha_Matched = Alpha_Num - 0.5 * Share_Num and Beta_Matched = Beta_Num - 0.5 * Share_Num. As a result, fractional values such as 1.5 or 0.5 may appear.
- Alpha_Evalue: the E-value for the alpha peptide only; the smaller, the more confident.
- Beta_Evalue: the E-value for the beta peptide only; the smaller, the more confident.
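To make the Title and Peptide formats and the Alpha_Matched/Beta_Matched arithmetic concrete, here is a minimal sketch. It is not code from pLink2: the function names are hypothetical, the Title parser assumes the RAW-file scheme RAWName.Scan.Scan.Charge.pParseID.dta, and the Peptide parser assumes a cross-linked pair written exactly like the example above.

```python
import re

# Assumed shape of a cross-linked Peptide value: SEQ(site)-SEQ(site)
CROSS_LINK_RE = re.compile(r"([A-Z]+)\((\d+)\)-([A-Z]+)\((\d+)\)")


def parse_title(title):
    """Split RAWName.Scan.Scan.Charge.pParseID.dta into its fields."""
    # Split from the right so that dots inside the RAW name are preserved.
    raw_name, scan, _scan_repeat, charge, pparse_id, ext = title.rsplit(".", 5)
    if ext != "dta":
        raise ValueError("unexpected title extension: " + ext)
    return {
        "raw_name": raw_name,
        "scan": int(scan),
        "charge": int(charge),
        "pparse_id": int(pparse_id),
    }


def parse_cross_linked_peptide(peptide):
    """Return ((alpha_seq, alpha_site), (beta_seq, beta_site))."""
    m = CROSS_LINK_RE.fullmatch(peptide)
    if m is None:
        raise ValueError("not a cross-linked peptide pair: " + peptide)
    alpha, alpha_site, beta, beta_site = m.groups()
    return (alpha, int(alpha_site)), (beta, int(beta_site))


def matched_counts(alpha_num, beta_num, share_num):
    """Each shared peak counts as half a match for each peptide."""
    return alpha_num - 0.5 * share_num, beta_num - 0.5 * share_num


print(parse_title("RD_pH_8point3_step2.7566.7566.3.0.dta"))
print(parse_cross_linked_peptide("AKLESLVEDLVNR(2)-HMNIKVTR(5)"))
print(matched_counts(2, 1, 1))  # (1.5, 0.5)
```

With two peaks matched to the alpha peptide, one to the beta peptide, and one of them shared, the counts come out as 1.5 and 0.5, which is how fractional values arise.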
By default, pLink2 does not calculate the three E-values (Evalue, Alpha_Evalue, and Beta_Evalue); in this case, all E-values are 1. If the Compute E-value checkbox in the Identification panel is selected, pLink2 calculates the three E-values only for PSMs that pass the FDR threshold. For an E-value, the smaller, the more confident; it is similar to the score in pLink1.
The columns at the other two levels (peptides, sites) have the same meaning as at the spectra level described above.
From experience with pLink1, a PSM with an E-value below 1E-2 or 1E-3 is good. pLink2 uses SVM scores to estimate FDR; because SVM scores vary across datasets, there is no comparable threshold for SVM scores. A Spectrum_Number >= 2 or 3 may be a good indicator of a confident cross-linked site. The Spectrum_Number of a cross-linked site is the number of PSMs that support it. It can be found in the *.filtered_cross-linked_sites.csv file.
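The rules of thumb above can be sketched as two small filters. This is an illustration under assumptions rather than a description of pLink2 itself: the function names are hypothetical, the rows are assumed to be dictionaries keyed by column name (for example from csv.DictReader), and the sample data, including the Site_Label column, is invented purely for the demo. The real layout of the sites file may differ, so rows without a usable Spectrum_Number are skipped.

```python
import csv
import io


def confident_sites(rows, min_spectra=2):
    """Keep rows whose Spectrum_Number is at least min_spectra."""
    kept = []
    for row in rows:
        try:
            count = int(row["Spectrum_Number"])
        except (KeyError, TypeError, ValueError):
            continue  # row has no usable Spectrum_Number; skip it
        if count >= min_spectra:
            kept.append(row)
    return kept


def evalue_passes(evalue, threshold=1e-2):
    """True if the E-value is below the threshold.

    When E-values are not computed they are all 1, so nothing passes.
    """
    return float(evalue) < threshold


# Invented sample data; Site_Label is a hypothetical column name.
sample = io.StringIO(
    "Site_Label,Spectrum_Number\n"
    "site_a,1\n"
    "site_b,2\n"
    "site_c,5\n"
)
rows = list(csv.DictReader(sample))
print([row["Site_Label"] for row in confident_sites(rows)])
print([row["Site_Label"] for row in confident_sites(rows, min_spectra=3)])
```

Both thresholds are left as parameters because the text suggests ranges (2 or 3 spectra, 1E-2 or 1E-3) rather than fixed cutoffs.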
Because the unfiltered CSV contains unfiltered PSMs, it has some additional columns:
- Peptide_Type: the same as the Peptide_Type at the spectra level described above, but encoded as 0 for Regular/Common, 1 for Mono-Linked, 2 for Loop-Linked, and 3 for Cross-Linked.
- Refined_Score: the refined score calculated by the KSDP algorithm.
- SVM_Score: the same as the Score at the spectra level described above.
- Target_Decoy: whether the identification is target or decoy. 0 for Decoy-Decoy, 1 for Target-Decoy (or Decoy-Target), and 2 for Target-Target.
- Q-value: the smoothed FDR value.
- Protein_Type: the same as the Protein_Type at the spectra level described above, but encoded as 0 for Regular/Common, 1 for Intra-protein, and 2 for Inter-protein.
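The numeric codes used in the unfiltered CSV can be translated back into labels with simple lookup tables. The sketch below is a hypothetical helper, not part of pLink2; it assumes each row is a dictionary keyed by column name and leaves any value it cannot decode unchanged.

```python
# Code tables taken from the column descriptions above.
CODES = {
    "Peptide_Type": {
        0: "Regular/Common",
        1: "Mono-Linked",
        2: "Loop-Linked",
        3: "Cross-Linked",
    },
    "Target_Decoy": {
        0: "Decoy-Decoy",
        1: "Target-Decoy",  # also covers Decoy-Target
        2: "Target-Target",
    },
    "Protein_Type": {
        0: "Regular/Common",
        1: "Intra-protein",
        2: "Inter-protein",
    },
}


def decode_row(row):
    """Return a copy of row with the coded columns replaced by labels."""
    decoded = dict(row)
    for column, table in CODES.items():
        if column not in decoded:
            continue
        try:
            code = int(decoded[column])
        except (TypeError, ValueError):
            continue  # not a numeric code; keep the original value
        decoded[column] = table.get(code, decoded[column])
    return decoded


print(decode_row({"Peptide_Type": "3", "Target_Decoy": "2", "Protein_Type": "1"}))
```

Returning a copy keeps the original row intact, which is convenient when the numeric codes are still needed for filtering.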