-
Notifications
You must be signed in to change notification settings - Fork 0
/
MQ_data_prep_steps.txt
90 lines (77 loc) · 4.15 KB
/
MQ_data_prep_steps.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
proteinGroups.txt prep for mouse lens data
x Open Excel with blank sheet
x Use menues or other mechanism in Excel to open proteinGroups.txt file
x We want the Text Import Wizard to activate
x Click "next" on first two steps (delimited and tab as separator)
x In third step, make sure any columns with gene names are text format (7th column)
x After import, save file as XLSX format (and name something more distinct)
x Add a spacer column at the far right of the sheet (Col. FN) and fill with "blank"
x Add a "flag" column (FO) to denote rejected proteins (decoys, contams, missing data)
x Populate the "flag" column with "z" (later we will sort descending)
x Add 4 extra rows at top of sheet (so headers are in Row 5)
x Make header row "bold"
x Allow header cells to wrap text (cell formatting)
x Save sheet (save often)
x Select Row 5 and turn on column filters
x Show "+" for Col FC ("Reverse") to show decoys
x Replace "z" in Flag column (FO) with "decoy" and fill down
x Reset column filter on FC
x Show "+" for Col FD ("Potential contaminant") to show contaminants
x Replace "z" in Flag column (FO) with "contam" and fill down
x Reset column filter on FD
x Show any rows with "keratin" in "Protein names" (Col F)
x Flag any additional keratins as "contams" in Flag column
x Look for "albumin" and also flag as contam
x Clear filter on Col. F
x Add a "Counter" column (FP)
x fill column with the number "1"
x make sure the values stop at the end of the table (row 6364)
x enter formula: "=SUBTOTAL(109,FP6:FP6501)" in cell FP4
x Sort descending on Flag column (FO)
x Insert blank row at bottom of table between contams and decoys and other proteins
["mouse_lens_3TMTs_proteinGroups_int1.xlsx" is save at this point]
x Select Row 5 and turn off column filters
x Locate the reporter ion columns of interest (CL:DO) and add sample key in Row 4 above
x Do not use any "corrected" reporter ion columns
x There are experiment-wide columns (Cols AD-AW)
x There are columns for each experimental sample (cols BH-DO)
x We want columns CL through DO
x Add sample key above reporter channels (they are in increasing mass order)
x The first 6 channels were used for the time point (in increasing order)
x Note that N forms are lighter than C forms
x An alphabetical ordering is not the same as a mass-based ordering
x We need to get columns for the same condition together for counting of missing data
x We will need 4 columns for counters (one for each TMT experiment and a combined col.)
x Copy the Exp. 1 data (CL:CQ) to columns FU:FZ
x Copy the Exp. 2 data (CV:DA) to columns GA:GF
x Copy the Exp. 3 data (DF:DK) to columns GG:GL
x Replace the column names (row 5) with sample names
x Label column FQ as "keeper"
x Label column FR "missing_1"
x Label column FS "missing_2"
x Label column FT "missing_3"
x Fill FQ with this formula: "=COUNTIF(FR6:FT6,"=ok")"
x Fill FR with this formula: "=IF(COUNTIF(FU6:FZ6, ">1")>=5, "ok", "missing")"
x Fill FS with this formula: "=IF(COUNTIF(GA6:GF6, ">1")>=5, "ok", "missing")"
x Fill FT with this formula: "=IF(COUNTIF(GG6:GL6, ">1")>=5, "ok", "missing")"
x In these discovery experiments, we want to cast a wide net
x We want to allow a maximum of one missed reporter ion in each experiment
["mouse_lens_3TMTs_proteinGroups_int2.xlsx" is save at this point]
x Select Row 5 and turn column filters back on
x Sort column FQ descending (we will keep rows with a value of "3")
x Insert a blank row when count goes from "3" to "2" (rows 3873 and 3874)
x Replace "z" in Flag column (FO) with "missing data" for rows 3875:6222
x Copy cells B5:B3873 (Majority protein IDs) to cells GM5:GM3873
x Rename column header to "Accession" (cell GM5)
x [Optional:
x Convert text-to-columns as delimited with semicolon checked
x Select a bunch of columns (eg. GN to HT) and clear contents]
x Add a new tab
x Rename tab to "MQ_prepped_data"
x Copy cells FU5:GM3873 to new tab (paste in at cell A1)
x Save workbook [a "mouse_lens_3TMTs_protenGroups_int3.xlsx" file was created here]
x Go to "MQ_prepped_data" tab
x Save As "MQ_prepped_data.csv" in CSV format (save just the selected tab)
x Excel may give you several warnings
x Quit Excel
Zero replacements and data normalizations will be done in R