Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aesthetics and clarifications for phosphoproteomics notebooks #8

Merged
merged 1 commit into from
Jul 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions Markdowns/TMT_phospho.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ Alongside the published total data shown above, we also performed a yeast/human


|Tag | Total | Phopsho | Ratio |
|------|-------|---------|-------|
|:----:|:-----:|:-------:|:-----:|
| 126 |1x |1x |1 |
| 127N |1x |1x |1 |
| 127C |1x |1x |1 |
Expand Down Expand Up @@ -214,7 +214,7 @@ psm_res <- list('Total'=psm.total, 'Phospho'=psm.phospho)


Below, we assess the quantification distributions. There are no clear outlier samples or any other concerns.
```{r}
```{r, warning=FALSE, fig.height=5, fig.width=7, fig.fullwidth=TRUE}
for(x in names(psm_res)){

p <- psm_res[[x]] %>% log(base=2) %>% plot_quant() +
Expand All @@ -237,7 +237,7 @@ We want to remove low Signal:Noise (S:N) PSMs, since the quantification values w

We use Note that where the signal:noise > 10, there are far fewer missing values.

```{r, fig.height=5, fig.width=7, fig.fullwidth=TRUE, fig.cap="Missing values per PSM, in relation to the signal:noise ratio"}
```{r, fig.height=5, fig.width=7, fig.fullwidth=TRUE}
for(x in names(psm_res)){

p <- psm_res[[x]] %>% plot_missing_SN(bins=40) +
Expand Down Expand Up @@ -269,7 +269,7 @@ psm_filt_sn_int <- psm_res %>% lapply(function(x){
```


Below, we will also filter our PSMs using the Delta.Score, e.g the difference between the score for the top two hits. Where this is the difference between the top score in the MS2 spectrum matching and the second top score. For a mixed species sample, it's likely some peptides will be difficult to assign to the protein from the correct species, so we are being extra cautious here. For a typical phospho TMT proteomics experiment, it would likely be reasonable to use a more relaxed filter above for the interference (e.g 50%) and to not filter by Delta Score at all.
Below, we will also filter our PSMs using the Delta.Score which is the difference between the top score in the MS2 spectrum matching and the second top score. For a mixed species sample, it's likely some peptides will be difficult to assign to the protein from the correct species, so we are being extra cautious here. For a typical TMT phosphoproteomics experiment, it would likely be reasonable to use a more relaxed filter for the interference in the code chunk above (e.g 50%)and to not filter by Delta Score at all.

```{r}

Expand Down
137 changes: 57 additions & 80 deletions Markdowns/TMT_phospho.html

Large diffs are not rendered by default.

14 changes: 11 additions & 3 deletions Markdowns/TMT_phospho_stats.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ phospho_sites <- readRDS(here('results/phospho_sites.rds'))


Below, we identify the total peptides that overlap each phosphosite. Note that each phosphosite may intersect multiple total peptides (due to missed cleavages) and each total peptide may intersect multiple phosphosites (due to multiple phosphosites per peptide).
```{r}
```{r, warning=FALSE}
phospho_f <- phospho_sites %>% fData() %>% tibble::rownames_to_column('ID')
total_f <- total %>% fData() %>% tibble::rownames_to_column('ID')

Expand Down Expand Up @@ -105,7 +105,7 @@ Inspect the combined quantification matrix.
```{r}
head(all_combined_exprs) #
```

We also need to create feature data and phenotype data (experimental information) for our combined quantification matrix. From these, we can then create an MSnSet to hold all the combined quantification data.
```{r}
all_combined_fdata <- fData(phospho_sites[rownames(all_combined_exprs),])[
,c('Master.Protein.Accessions', 'ptm_position')]
Expand Down Expand Up @@ -136,7 +136,9 @@ pData(pairwise_comparison_combined_res)
Below, we perform the limma analysis. First though, some clarification on the model we're using.

We have two experimental variables:

- type: phospho or total

- condition: 1x or 2x spike in

```{r}
Expand All @@ -145,11 +147,17 @@ condition <- factor(pData(pairwise_comparison_combined_res)$condition, levels=1:
print(paste(type, condition))
```

When we use an model with the terms type, condition and type:condition, the interaction term captures the difference between the observed data and the simple additive model when the abundance is dependent upon just the separate effect of the type and condition variables and there is no combinatorial effect.
A simple additive model would estimate independent effects for the `type` and `condition` variables


```{r}
# model without interaction term
print(model.matrix(~type+condition))
```

If we want to explore a possible combinatorial effect, we need to also include a `type:condition` interaction term. In our case, this captures the difference in abundance for phosphopeptides between the conditions, which is specific for phosphopeptides and not seen in the unmodified peptides.

```{r}
# model with an interaction term
print(model.matrix(~type+condition+type:condition))

Expand Down
25 changes: 15 additions & 10 deletions Markdowns/TMT_phospho_stats.html
Original file line number Diff line number Diff line change
Expand Up @@ -496,9 +496,6 @@ <h3>Input data</h3>

phospho_peptide_to_total_peptides[[row$ID]] &lt;- total_peptide_ids
}</code></pre>
<pre><code>## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion</code></pre>
<p>Now, how many overlapping features. Note that for the phospho and
total to be considered intersecting, there must be at least one total
peptide which overlaps the phosphosite. Out of the 666 phosphosites,
Expand Down Expand Up @@ -572,6 +569,9 @@ <h3>Input data</h3>
## O60841_164 7.4 12.7
## O60841_214 63.7 56.5
## O75554_258|262 81.6 102.5</code></pre>
<p>We also need to create feature data and phenotype data (experimental
information) for our combined quantification matrix. From these, we can
then create an MSnSet to hold all the combined quantification data.</p>
<pre class="r"><code>all_combined_fdata &lt;- fData(phospho_sites[rownames(all_combined_exprs),])[
,c(&#39;Master.Protein.Accessions&#39;, &#39;ptm_position&#39;)]
all_combined_fdata$n_total &lt;- sapply(rownames(all_combined_exprs),
Expand Down Expand Up @@ -644,19 +644,19 @@ <h3>Input data</h3>
<h3>Run limma</h3>
<p>Below, we perform the limma analysis. First though, some
clarification on the model we’re using.</p>
<p>We have two experimental variables: - type: phospho or total -
condition: 1x or 2x spike in</p>
<p>We have two experimental variables:</p>
<ul>
<li><p>type: phospho or total</p></li>
<li><p>condition: 1x or 2x spike in</p></li>
</ul>
<pre class="r"><code>type &lt;- factor(pData(pairwise_comparison_combined_res)$type, level=c(&#39;total&#39;, &#39;phospho&#39;))
condition &lt;- factor(pData(pairwise_comparison_combined_res)$condition, levels=1:2)
print(paste(type, condition))</code></pre>
<pre><code>## [1] &quot;phospho 1&quot; &quot;phospho 1&quot; &quot;phospho 1&quot; &quot;phospho 1&quot; &quot;phospho 2&quot; &quot;phospho 2&quot;
## [7] &quot;phospho 2&quot; &quot;total 1&quot; &quot;total 1&quot; &quot;total 1&quot; &quot;total 1&quot; &quot;total 2&quot;
## [13] &quot;total 2&quot; &quot;total 2&quot;</code></pre>
<p>When we use an model with the terms type, condition and
type:condition, the interaction term captures the difference between the
observed data and the simple additive model when the abundance is
dependent upon just the separate effect of the type and condition
variables and there is no combinatorial effect.</p>
<p>A simple additive model would estimate independent effects for the
<code>type</code> and <code>condition</code> variables</p>
<pre class="r"><code># model without interaction term
print(model.matrix(~type+condition))</code></pre>
<pre><code>## (Intercept) typephospho condition2
Expand All @@ -682,6 +682,11 @@ <h3>Run limma</h3>
##
## attr(,&quot;contrasts&quot;)$condition
## [1] &quot;contr.treatment&quot;</code></pre>
<p>If we want to explore a possible combinatorial effect, we need to
also include a <code>type:condition</code> interaction term. In our
case, this captures the difference in abundance for phosphopeptides
between the conditions, which is specific for phosphopeptides and not
seen in the unmodified peptides.</p>
<pre class="r"><code># model with an interaction term
print(model.matrix(~type+condition+type:condition))</code></pre>
<pre><code>## (Intercept) typephospho condition2 typephospho:condition2
Expand Down
Loading