-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathFigure3.Rmd
159 lines (120 loc) · 5.29 KB
/
Figure3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
---
title: "Ash manuscript Figure 3A & B"
output: html_notebook
---
```{r}
library(ggplot2)
library(ggbeeswarm)
library(ggpubr)
```
```{r}
phenotypes <- read_excel("~/Documents/Ash/Data_S1_phenotyping.xlsx")
```
```{r}
filtered_phenotype <- phenotypes[which(phenotypes$Type %in% c("Adult", "Juvenile") & !(is.na(phenotypes$GEBV))),]
replicates <- c("S31R", "S54R","S67R", "S76R", "S82R", "S223R", "S332Q", "S410Q", "S415R", "S571Q",
"S655R", "S868R", "S885R", "S895R", "S901Q", "S925Q", "S928Q", "S939Q", "S954R", "S987Q",
"SA", "SB", "SE", "SH", "SJ", "S394C")
filtered_phenotype <- filtered_phenotype[which(!(filtered_phenotype$Sample %in% replicates)),]
```
Violin Plot
```{r}
violin <- ggplot(filtered_phenotype, aes(x = Type, group = Type, y = as.numeric(GEBV), fill = Type)) +
geom_violin(trim = FALSE) +
scale_fill_manual(values = c( "Juvenile" = "skyblue1", "Adult" = "darkorange2")) +
geom_boxplot(outlier.shape = NA, width=0.2, fill = "white") +
scale_x_discrete(labels = c( "Juvenile" = "Juvenile (n=452)", "Adult" = "Adult (n=128)")) +
theme_minimal(base_size=15) +
theme(legend.position="none") +
labs(x = "",
y = "GEBV scores")
violin
```
```{r director, echo = FALSE}
# set working directory to the path to the files of allele frequencies and effect sizes
# if it is not the current directory, replace "." with the appropriate path
setwd(".")
set.seed(4242)
```
# Analysis of allele frequency change between adults and juveniles
To assess whether the change in GEBV can be attributed to selection rather than genetic drift we adopt two approaches that exploit the genotypes at unlinked sites (sites not used in the GEBV calculations).
Firstly we use these loci to calculate a matrix of relatedness among the pairs of plants. This matrix is then used to predict the GEBV of juveniles, from that of related adults. The regression between observed and predicted GEBV allows for genetic drift; i.e. the differential success the adults. The action of selection would be apparent in in intercept: a positive intercept would be expected with selection for higher GEBV. This effect would occur if the siblings of the surviving juveniles, which had lower GEBV had succumbed to the fungus.
The second comparison is of the change in allele frequency between adults and juveniles as measured by an F statistic. Any additional changes in allele frequency due to selection would be expected to produce a larger difference between the generations at the GEBV loci, particularly those loci with a larger effect size.
```{r read_data, echo = FALSE}
rdat <- read.csv("unlinked_sites.csv")
# get matrix of unlinked loci (+1 to get -1/0/1 format)
ngmat <- 1 + as.matrix(rdat[,-(1:3)])
# Which observations are Adult and Juvenile
adults <- rdat$Age == "Adult"
juveniles <- rdat$Age == "Juv"
# Calculate allele frequencies for Adults and Juveniles
afreq <- colMeans(ngmat[adults,] + 1)/2
jfreq <- colMeans(ngmat[juveniles,] + 1)/2
```
# Calculating the predicted GEBV of the juveniles
The additive relationship matrix A is calculated from the genotypes at the unlinked loci (ngmat), which is then used to calculate the expected GEBV of the offspring.
The regression of observed and expected GEBV is highly significant, and has a highly significant positive intercept, consistent with selection in favour of high GEBV scores.
```{r predict, echo = TRUE}
library(rrBLUP)
library(MASS)
# set number of iterations for F null distribution calculations (e.g. 1e4 for higher accuracy)
nits <- 3e3
# Calculate the relationship matrix using the rrBLUP function ngmat
A <- A.mat(ngmat)
# Standardize breeding values to have mean of zero in adults
bvs <- rdat$GEBV - mean(rdat$GEBV[adults])
sigma_adult <- A[adults,adults]
bv_adult <- bvs[adults]
bv_juv <- bvs[juveniles]
# Calculate predictions using the MASS function solve
predn <- t(A[adults,juveniles]) %*% solve(sigma_adult) %*% bv_adult
```
ggplot version of plot
```{r}
library(ggplot2)
data <- data.frame(predn, bv_juv)
plot <- ggplot(data, aes(x = predn, y = bv_juv)) +
geom_point(alpha=0.5) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
labs(x = "Juvenile GEBV scores predicted from ancestry", y = "GEBV scores") +
geom_smooth(method = "lm", col = "blue", se = FALSE) +
theme_minimal(base_size=15)
plot
```
Plot with error bars
```{r}
library(ggplot2)
data <- data.frame(predn, bv_juv)
plot <- ggplot(data, aes(x = predn, y = bv_juv)) +
geom_point(alpha=0.5) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
labs(x = "Juvenile GEBV scores predicted from ancestry", y = "GEBV scores") +
geom_smooth(method = "lm", col = "blue", se = TRUE, fill = "blue") +
theme_minimal(base_size=15)
plot
```
Combine plots
```{r}
library(ggpubr)
fig3 <- ggarrange(violin, plot,
labels = c("A", "B"),
ncol = 1, nrow = 2)
fig3
```
```{r}
#Save output to file
tiff("~/Documents/Ash/Figures/Fig3.tiff", units="mm", width=180, height=200, res=300)
fig3
dev.off()
png("~/Documents/Ash/Figures/Fig3.png", units="mm", width=180, height=200, res=300)
fig3
dev.off()
jpeg("~/Documents/Ash/Figures/Fig3.jpg", units="mm", width=180, height=200, res=300)
fig3
dev.off()
ggsave("~/Documents/Ash/Figures/Fig3.svg", plot = fig3, width = 180 / 25.4, height = 200 / 25.4)
```
```{r}
mod <- lm(data$predn ~ 1, offset = data$bv_juv)
summary(mod)
```