``` {r 10-load-packages-and-setup, echo = FALSE, include = FALSE, warning=FALSE}
library(broom)
library(cowplot)
library(dismo)
library(dplyr)
library(ggplot2)
library(ggpubr)
library(ggsn)
library(ggspatial)
library(gridExtra)
library(gstat)
library(knitr)
library(leaflet)
library(mapproj)
library(maptools)
library(plotly)
library(raster)
library(RColorBrewer)
library(rgdal)  # provides readOGR(), used to load the shapefiles below
library(rgl)
library(scales)
library(sf)
library(sp)
library(spatialreg)
library(spatstat)
library(spdep)
library(tmap)
knitr::knit_hooks$set(webgl = hook_webgl)
```
```{r echo=FALSE}
yml_content <- yaml::read_yaml("chapterauthors.yml")
author <- yml_content[["spatialEstimation"]][["author"]]
coauthor <- yml_content[["spatialEstimation"]][["coauthor"]]
```
# Spatial Estimation {#spatial-estimation}
Written by
```{r results='asis', echo=FALSE}
cat(author, "and", coauthor)
```
**Geostatistics** uses statistical tools to characterize the distribution of a random variable across a geographical region of interest [@getis_spatial_2004]. Like any other data, spatial data may be incomplete, which can limit our analytical capacity in the unsampled regions. In the context of spatial data, we wish to understand spatial phenomena, such as events or processes that occur over large areas, but we are usually limited in our capacity to sample everywhere. Most of the time, only discrete and unrepresentative information on the spatial phenomenon is collected, which does not allow us to create a precise and continuous map of the phenomenon over the entire geographic area of interest. How can we best utilize the available samples to represent the phenomenon across the entire geographic area? This chapter introduces some basic ideas about different types of sampling strategies in a spatial context, as well as various types of spatial statistics that can be used to predict the occurrence of spatial phenomena at unsampled locations.
:::: {.box-content .learning-objectives-content}
::: {.box-title .learning-objectives-top}
## Learning Objectives {-}
:::
1. Differentiate the advantages of spatial sampling strategies
2. Describe the relationship between observations at different spatial scales using autocorrelation and semivariogram
3. Apply methods of spatial interpolation to predict observations at unknown locations
4. Apply methods of spatial prediction using regression models
::::
## Key Terms {-}
Spatial autocorrelation,
## Spatial Autocorrelation
A key characteristic that distinguishes spatial data from other types of data is the fact that spatial phenomena are frequently spatially autocorrelated. **Spatial autocorrelation** is the relationship between a variable of interest and itself when measured at different locations [@cliff_ad_and_ord_spatial_1973]. Any two samples are more likely to be correlated, or similar to each other, when they are closer in space than two samples taken farther apart. For example, if we measured air temperature at our current location, then walked 1 m away and measured it again, then walked 10 km and measured it again, we would expect that the temperatures sampled 1 m apart would be more similar than the temperatures measured 10 km apart. The precise degree and sign (positive or negative) of spatial autocorrelation will vary from phenomenon to phenomenon because some phenomena change more quickly over space than others. As far as we can tell, all spatial phenomena exhibit spatial autocorrelation at some, but not all, distances. In other words, we have never observed a natural spatial phenomenon that is truly random. Spatial autocorrelation provides a tremendous advantage for estimating statistics and characteristics of populations in space, but it also introduces several important caveats.
Consider the following example: **Figure \@ref(fig:10-spatial-autocorrelation)** shows a clustered pattern of square cells (representing some variable of interest), which indicates positive spatial autocorrelation (left), and a complete checkerboard distribution of cells, which indicates negative spatial autocorrelation (right).
:::: {.box-content .call-out-content}
::: {.box-title .call-out-top}
## Recall This {-}
:::
<p id="box-text">
## Domain
The study area from which a spatial sample is taken.
## Attributes
The information attached to the study objects that are spatially distributed in a domain. Often termed the variable of interest.
</p>
::::
```{r 10-spatial-autocorrelation, echo=FALSE, warning=FALSE, message=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Example of a positive (left) and negative spatial autocorrelation (right) for a give domain. Nepal, CC-BY-SA-4.0.")
n<- matrix(c(1:400), nrow=20, ncol=20)
# Create row indicator
df <- expand.grid(x=1:ncol(n),y=1:nrow(n))
df$val <- n[as.matrix(df[c('y','x')])]
df$val<-rep(c(1:5), each = 80)
# Subset odd rows
cols <- c("1" = "red", "2" = "blue", "3" = "darkgreen", "4" = "orange", "5"="black")
p <- ggplot(df)+ geom_tile(aes(x, y, fill =val),
colour = "black", width=1, height=1, size=1)+
scale_fill_gradient2(low=muted("blue"), high=muted("red"))+
scale_y_reverse() +
theme_classic() +
theme(legend.position = "none")+
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
############# negative#########
m <- matrix(c(1:625), nrow=25, ncol=25)
df3 <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df3$val <- m[as.matrix(df3[c('y','x')])]
odd_indexes<-seq(1,nrow(df3),2)
even_indexes<-seq(2,nrow(df3),2)
df3[odd_indexes, "val"] <- "0"
df3[even_indexes, "val"] <- "1"
q <- ggplot(df3)+ geom_tile(aes(x, y, fill = ifelse(val > 0,val, NA)),
colour = "black", width=1, height=1, size=1)+
#n <- df3[sample(nrow(df3), 40, replace = FALSE, prob = NULL),]
scale_y_reverse() +
theme_classic() +
theme(legend.position = "none")+
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
grids_bs <- plot_grid(p,q,ncol = 2, labels = c('Positive autocorrelation', 'Negative autocorrelation'), align = "h")
grids_bs
```
## Moran's I
**Moran's I** [@moran_notes_1950] is a correlation coefficient that measures the degree of spatial autocorrelation in an attribute of the data. It is based on the spatial covariance standardized by the variance of the data [@moran_notes_1950], and it is computed from a neighborhood list derived from a spatial weight matrix [@suryowati_comparison_2018]. The value of Moran's I ranges from -1 to 1, where 1 indicates perfect positive spatial autocorrelation, 0 indicates a random pattern, and -1 indicates perfect negative autocorrelation [@moran_notes_1950]. Moran's I is calculated using the following formula [@moran_notes_1950]:
$$
I = \frac{1}{s^2} \frac{\sum_{i}\sum_{j}w_{ij}(y_i-\bar{y})(y_j-\bar{y})}{\sum_{i}\sum_{j}w_{ij}}
$$
Where, $$I=\text{the Moran's I statistic}$$
$$y_i=\text{the value of the variable at location } i$$
$$y_j=\text{the value of the variable at location } j$$
$$s^2=\text{the variance of the variable}$$
$$w_{ij}=\text{the spatial weight between locations } i \text{ and } j$$
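To make the formula concrete, here is a minimal sketch that computes Moran's I directly from the definition for a toy one-dimensional transect with a hypothetical binary weight matrix (adjacent observations are neighbors); the values and weights are illustrative, not part of the case study data.
```{r 10-morans-i-sketch, warning=FALSE, message=FALSE}
# Toy transect of five values; adjacent observations are neighbors
y <- c(2, 3, 5, 7, 11)
n <- length(y)

# Binary spatial weight matrix w_ij: 1 for adjacent pairs, 0 otherwise
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

ybar <- mean(y)
s2 <- sum((y - ybar)^2) / n                # variance (1/n form)
num <- sum(W * outer(y - ybar, y - ybar))  # sum_ij w_ij (y_i - ybar)(y_j - ybar)
I <- (1 / s2) * num / sum(W)               # Moran's I
I
```
The same value can be cross-checked with `moran()` from the **spdep** package, which we use in the case study below.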
## Case Study 1
For this case study, we will use ground plot data from the Change Monitoring Inventory (CMI) program [for details: @province_of_bc_provincial_2018] for the Williams Lake and 100 Mile House timber supply areas (TSA) in the province of British Columbia, Canada. The Williams Lake TSA and 100 Mile House TSA are divided into 18 and 8 blocks, respectively (**Figure \@ref(fig:10-morans-I)**). There are a total of 456 CMI plots used in this study (**Figure \@ref(fig:10-morans-I)**). The total basal area ($m^2/ha$) is our variable of interest. For each polygon in the Williams Lake and 100 Mile House TSAs, total basal area was calculated by summing the basal area of the CMI plots in that polygon. For this part of the exercise, we want to understand whether there is any spatial relationship (autocorrelation) between the total basal areas measured in the TSA polygons. We will quantitatively measure the presence or absence of **spatial autocorrelation** using **Moran's I** and **Geary's C**.
```{r 10-morans-I-load-data-and-setup, echo=F, warning=F, message=F}
options(stringsAsFactors = FALSE)
filename <- readOGR(dsn="data/10",layer="Block_basa_area")
plots<- read.csv("data/10/CMI.csv",header=T)
```
```{r 10-morans-I, echo=FALSE, warning=FALSE, message=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("CMI plots location for williams lake TSA and 100 miles house TSA. Basal area has been calculated using the sum of the basal area for each plot (dot) over the given polygon. Nepal, CC-BY-SA-4.0.")
###Morans'I using spdep'#########################################
####################### Plot the data #######################
# note that you don't need to call maptools to run the code below but it needs to be installed.
# to add a north arrow and a scale bar to the map libraries: ggsn and mapproj
# set factors to false
### Convert the spatial data into a data frame ##########
filename_df <- tidy(filename)
# make sure the shapefile attribute table has an id column
filename$id <- rownames(filename@data)
# join the attribute table from the spatial object to the new data frame
filename_df <- left_join(filename_df,
filename@data,
by = "id")
will<- readOGR(dsn="data/10",layer="100_and_will")
will_df <- tidy(will)
# make sure the shapefile attribute table has an id column
will$id <- rownames(will@data)
# join the attribute table from the spatial object to the new data frame
will_df <- left_join(will_df,
will@data,
by = "id")
###################### Bringing the CMI plots over williams lake block and 100 miles block
##################### Creating the map using ggplot###################
t<-ggplot() +
geom_polygon(data = filename_df,
aes(x = long, y = lat, group = group,
fill =Basal ),color="black")+
geom_polygon(data = will_df,
aes(x = long, y = lat, group = group),alpha=0.005,color="red")+
geom_label(aes(x=400000,
y=5750000,label="Williams lake TSA"),
label.padding = unit(0.55, "lines"), # Rectangle size around label
label.size = 0.35,
color = "black",
fill="#69b3a2")+
geom_label(aes(x=620000,
y=5700000,label="100 miles house TSA"),
label.padding = unit(0.55, "lines"), # Rectangle size around label
label.size = 0.35,
color = "black",
fill="#69b3a2")+
geom_point(data=plots, aes(utm_eastin,utm_northi), inherit.aes = FALSE,
alpha = 0.5, size =1.5) + coord_equal()+
scale_fill_continuous(type="viridis",option="E")+
ggsn::north(filename_df, scale =0.15, location = "bottomleft")+
annotation_scale(line_width = 1.5,
height = unit(0.3, "cm"),text_cex =1.5)+
theme_bw()+
labs(x = "", y = "")+
theme(axis.text = element_blank(), axis.ticks = element_blank())+
guides(alpha=F)+
labs(fill="Basal_area")+
theme(legend.position = c(0.93,0.10), legend.direction = "vertical")+
theme(legend.key = element_rect(fill = "white", colour = "white"))+
theme(legend.background = element_rect(fill="grey",
size=0.6, linetype="solid",
colour ="white"))+
theme(legend.key.size = unit(0.3, "cm"))+
theme(legend.text=element_text(size=10),
legend.title=element_text(size=10))
t
```
## Calculating Moran's I {-}
We will calculate Moran's I for the basal area variable associated with each polygon.
## Using Contiguity {-}
**Define Neighborhood**
The Moran's I statistic is the correlation coefficient for the relationship between a variable and its surrounding values. But before we go about computing this correlation, we need a way to define a neighborhood. There are two common approaches: contiguity, which applies to spatial **polygon data**, and distance-based approaches, which apply to both spatial **point data** and polygon data. For polygon data, contiguity-based neighborhood selection is typically done using one of two widely used methods, known as the ***Rook's case*** and the ***Queen's case*** (**Figure \@ref(fig:10-using-contiguity)**).
```{r 10-using-contiguity, echo=FALSE, message=FALSE, warning=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Rook's (left) and Queen's (right) case for searching the neighborhood (grey unit) for the darker unit in the center. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:9), nrow=3, ncol=3)
df3 <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df3$val <- m[as.matrix(df3[c('y','x')])]
p<-ggplot(df3,aes(x=x, y=y, label=val)) +
geom_tile(data=df3, fill='transparent', colour = 'black') +
geom_rect(aes(xmin =1.5,xmax = 2.5,ymin = 1.5, ymax = 2.5),
fill = 'black',alpha=0.5,color = "black",size = 2)+
geom_rect(aes(xmin =0.5,xmax = 1.5,ymin = 2.5, ymax = 3.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =2.5,xmax = 3.5,ymin = 2.5, ymax = 3.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =0.5,xmax = 1.5,ymin = 0.5, ymax = 1.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =2.5,xmax = 3.5,ymin = 0.5, ymax = 1.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
q<-ggplot(df3,aes(x=x, y=y, label=val)) +
geom_tile(data=df3, fill='transparent', colour = 'black') +
geom_rect(aes(xmin =1.5,xmax = 2.5,ymin = 1.5, ymax = 2.5),
fill = 'black',alpha=0.5,color = "black",size = 2)+
geom_rect(aes(xmin =0.5,xmax = 1.5,ymin = 2.5, ymax = 3.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =2.5,xmax = 3.5,ymin = 2.5, ymax = 3.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =0.5,xmax = 1.5,ymin = 0.5, ymax = 1.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =2.5,xmax = 3.5,ymin = 0.5, ymax = 1.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =0.5,xmax = 1.5,ymin = 1.5, ymax = 2.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =2.5,xmax = 3.5,ymin = 1.5, ymax = 2.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =1.5,xmax = 2.5,ymin = 2.5, ymax = 3.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =1.5,xmax = 2.5,ymin = 0.5, ymax = 1.5),
fill = 'black',alpha=0.02,color = "black",size = 2)+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
grids_bs <- plot_grid(p,q,ncol = 2, labels = c('Rooks contiguity', 'Queens contiguity'), align = "h")
grids_bs
```
**Step 1: Build Neighborhood**
Since we are working with polygons, we will use the Queen's contiguity to build the neighborhood list using the **poly2nb** function in the **spdep** package in R, and then plot the linkages.
```{r 10-using-contiguity-2, warning=FALSE, message=FALSE}
####################### Plot the data #######################
### Convert the spatial data into data frame ##########
filename_df <- tidy(filename)
# make sure the shapefile attribute table has an id column
filename$id <- rownames(filename@data)
# join the attribute table from the spatial object to the new data frame
filename_df <- left_join(filename_df,
filename@data,
by = "id")
#Searching neighborhood
w1 <- poly2nb(filename,row.names=filename$id,queen=T) ###### queens case
coords <- coordinates(filename)
plot(w1, coords, col="grey")
```
**Step 2: Getting a Spatial Weight Matrix for Neighborhood List**
```{r 10-using-contiguity-3, warning=FALSE, message=FALSE}
ww <- nb2listw(w1,style='B')
```
**Step 3: Calculate Moran's Correlation Coefficient Using the Spatial Weight Matrix for Neighbors**
```{r 10-using-contiguity-4, warning=FALSE, message=FALSE}
## calculating Moran's I
moran(filename$Basal, ww, n=length(ww$neighbours), S0=Szero(ww))
```
**Step 4: Conduct the Significance Test for the Calculated Moran's I Value**
```{r 10-using-contiguity-5, warning=FALSE, message=FALSE}
moran.test(filename$Basal, ww)
```
**Using K-Nearest Neighbors**
**Step 1: Select the 3 Nearest Neighbors Using a Distance-Based Approach**
We use the **knearneigh** function from the **spdep** package in R to find the neighbors, then build a spatial weights list.
```{r 10-using-contiguity-6, warning=FALSE, message=FALSE}
# Searching neighborhood
col.knn <- knearneigh(coords, k=3)
w<-knn2nb(col.knn,row.names = filename$id)
coords <- coordinates(filename)
plot(w, coords, col="grey")
```
**Step 2: Build a Spatial Weight Matrix for the Neighborhood List**
```{r 10-using-contiguity-7, warning=FALSE, message=FALSE}
#spatial weight
ww1 <- nb2listw(w,style='B')
#ww1
```
**Step 3: Calculate the Moran's I Coefficient**
```{r 10-using-contiguity-8, warning=FALSE, message=FALSE}
## Calculating Moran's I
moran(filename$Basal, ww1, n=length(ww1$neighbours), S0=Szero(ww1))
```
**Step 4: Significance Test for Calculated Moran's I**
```{r 10-using-contiguity-9, warning=FALSE, message=FALSE}
moran.test(filename$Basal, ww1)
```
Note that the value of Moran's I changed based on how we calculated the neighborhood list using the two different approaches, and the interpretation changes accordingly. With the contiguity-based neighbors, we found a negative value for I (-0.11), indicating weak negative spatial autocorrelation. When we run the significance test, we can see that the p-value > 0.05, indicating the autocorrelation is not significant. Using the nearest-neighbor approach, we found that I (0.017) indicated weak positive spatial autocorrelation. One reason for the difference is that the k-nearest neighbor approach can reach polygons at greater distances and so can include more polygons than contiguous neighbors defined by either Queen's or Rook's contiguity [@suryowati_comparison_2018].
## Geary's C
Another, more local measure of spatial autocorrelation is **Geary's C** [@geary_contiguity_1954]. While Moran's I standardizes the spatial autocovariance by the variance of the data, Geary's C uses the sum of squared differences between pairs of data values as its measure of covariance [@geary_contiguity_1954]. Both statistics depend on the spatial structure of the data, specified by a neighborhood and its spatial weights matrix. The value of Geary's C ranges from 0 to some unbounded positive value, where 1 indicates spatial randomness, values less than 1 indicate positive spatial autocorrelation, and values greater than 1 indicate negative spatial autocorrelation [@geary_contiguity_1954]. It is calculated using the following formula:
$$ C=\frac{(n-1) \sum_{i}^n\sum_{j}^nw_{ij}(y_i-y_j)^2}{2\sum_{i}^n\sum_{j}^nw_{ij}\sum_{i}(y_{i}-\bar{y})^2}$$
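As with Moran's I, a minimal sketch with a toy transect and a hypothetical binary weight matrix (illustrative values only, not the case study data) shows how the formula works:
```{r 10-gearys-c-sketch, warning=FALSE, message=FALSE}
# Same style of toy transect and adjacency weights as the Moran's I sketch
y <- c(2, 3, 5, 7, 11)
n <- length(y)
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

num <- (n - 1) * sum(W * outer(y, y, function(a, b) (a - b)^2))
den <- 2 * sum(W) * sum((y - mean(y))^2)
num / den   # Geary's C: < 1 positive, 1 random, > 1 negative autocorrelation
```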
**Steps 1 and 2: Build the neighborhood list and spatial weight matrix exactly as we did for Moran's I**
**Step 3: Use the geary function from the "spdep" package to calculate Geary's C**
***For the Queen's Case***
```{r 10-gearys-c, warning=FALSE, message=FALSE}
## Geary C
geary(filename$Basal, ww, n=length(ww$neighbours),n1=length(ww$neighbours)-1, S0=Szero(ww))
## Significance test for Geary C
geary.test(filename$Basal, ww)
```
***For the Nearest Neighbor Method***
```{r 10-gearys-c-2, warning=FALSE, message=FALSE}
## Geary C
geary(filename$Basal, ww1, n=length(ww1$neighbours),n1=length(ww1$neighbours)-1, S0=Szero(ww1))
## Significance test for Geary C
geary.test(filename$Basal, ww1)
```
Note that the value of Geary's C indicated positive spatial autocorrelation using both the Queen's case and the k-nearest neighbor approach. However, the spatial autocorrelation was not significant, as indicated by a p-value > 0.05. In terms of significance, Moran's I and Geary's C are in agreement.
## Populations, Samples and Statistics
Suppose you are interested in measuring the heights of trees in a forest. There are a lot of trees in this forest and you cannot measure all of them. So you cleverly decide to sample some locations and infer the mean tree height of the forest from this sample. In this example, the trees that you measured are the __sample__ $n$, the forest is the total __population__ of trees $N$, and the __statistic__ that you are interested in estimating is the mean tree height of the forest:
$$
x̄ = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
Lowercase $x$ represents any sampled tree and $x̄$ represents the mean of the __sample__, which is an __estimate__ of the __population__ mean. It is important to recognize that statistics of samples and populations are discussed and represented differently. The true mean tree height of the forest is denoted by the Greek lowercase mu $μ$, and we can never know this value without measuring every tree. So instead, we __infer__ the population statistic using the sampled data and hope that the sample statistic will be close to the true population statistic, but we must accept that they are rarely equivalent $x̄≠μ$. The same equation above can be re-written for the population mean:
$$
μ = \frac{1}{N} \sum_{i=1}^{N} X_i
$$
Notice here we are using $μ$ to signify the population mean tree height, uppercase $N$ signifies all the trees in the population, and uppercase $X_i$ is one tree in the population. At this point, you might take a moment to appreciate that as our sample $n$ approaches $N$, our sample mean $x̄$ should also approach the population mean $μ$. In other words, the more trees we sample, the more likely we are going to have an accurate estimate of $μ$. It might surprise you that this conclusion is likely true for large samples $n$, but not small samples $n$. Why? Because any randomly-selected tree from our forest $n=1$ could be much taller or shorter than the mean just due to random chance alone. In fact, if our forest has two distinct cohorts of trees, one very young emerging below the canopy and one established in the overstory (i.e., bi-modal distribution), then the population mean might never be approximated by any particular tree. This is problematic for us, because we want to accurately estimate the population statistic whilst exhausting as few of our resources as possible to do so (i.e., time, people, equipment, budget).
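A small simulation (synthetic data, for illustration only) shows both points: with a bi-modal forest, the sample mean gets closer to the population mean as $n$ grows, while a very small sample can land far from it:
```{r 10-sample-mean-sketch, warning=FALSE, message=FALSE}
set.seed(42)
# Synthetic bi-modal forest: a young cohort (~5 m) and an overstory (~30 m)
heights <- c(rnorm(5000, mean = 5, sd = 1),
             rnorm(5000, mean = 30, sd = 4))
mu <- mean(heights)   # the "true" population mean

# Sample means for increasing sample sizes
for (n in c(5, 50, 500)) {
  xbar <- mean(sample(heights, n))
  cat("n =", n, "  sample mean =", round(xbar, 2),
      "  population mean =", round(mu, 2), "\n")
}
```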
When we made our measures of tree heights, it would be reasonable for us to expect that the tree heights that we sampled are going to be different (i.e., most trees will not have the same height). The magnitude of those differences in the sample are captured by the **variance**, which is a measure of the dispersion of our tree heights $x$ relative to the sample mean $x̄$:
$$
s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i-x̄)^2
$$
We square the differences so that the variance is always positively signed and consequently variance always has squared units, hence $s^2$. If we take the square root of the variance, we have the **standard deviation**:
$$
s = \sqrt{s^2}
$$
The population standard deviation is represented by Greek lowercase sigma $σ$ and population variance is $σ^2$.
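A quick check of these formulas in R (toy values, for illustration); note that base R's `var()` and `sd()` divide by $n-1$ rather than $n$, so they differ slightly from the $1/n$ form shown above:
```{r 10-variance-sketch, warning=FALSE, message=FALSE}
x <- c(12.1, 14.3, 9.8, 15.0, 11.6)   # sampled tree heights (m)
xbar <- mean(x)
s2 <- sum((x - xbar)^2) / length(x)   # sample variance, 1/n form
s <- sqrt(s2)                         # sample standard deviation
c(variance = s2, sd = s)
```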
## Significance Testing
The significance level of a test statistic from the sample is compared with the critical value of the test statistic obtained by repeatedly reordering the data with random proportions [@jacquez_spatial_1999]. Interpretation of this significance level in classical statistics is done using a **p-value** which, if less than or equal to a defined threshold, also known as **alpha α** (usually 5% or 0.05), leads us to reject the **null hypothesis** of "no difference from random" [@jacquez_spatial_1999]. In other words, when the p-value is less than or equal to 0.05, we conclude that the statistic we observe for the sample is unlikely to have occurred due to random chance alone. This interpretation is important because, recall, if our sample is small, there is a random chance that any statistic we calculate from it reflects that particular sample rather than a true characteristic of the larger population. For example, imagine the case where we randomly sample some trees, but by chance we happen to only sample young trees, and so our mean tree height from the sample is not representative of the population. The p-value represents the probability that our statistic from the sample is observed due to random chance, so we want this value to be as low as possible.
While $α=0.05$ is widely used, there are cases where we may set a stricter standard; for example, where the risk of incorrectly rejecting the null hypothesis, known as a "false positive" or **Type I error**, could have significant consequences for human health or safety. Consider the case where a new drug is being tested for efficacy and the p-value in the clinical trial is 0.05, so the null hypothesis that the placebo is better than the drug is rejected. But there is still a 5% probability that there is no difference between the new drug and a placebo. The implication is that some of the people who take the new drug may not experience any benefits. If the population eligible for this drug is large, this can translate to many people being impacted by this decision. You can see how this can be problematic because there might be alternative therapies or drugs that are more effective than this new drug that the population is not utilizing. The opposite case is where we believe that the null hypothesis is true, but it is in fact false, known as a "false negative" or **Type II error**. When we want to be sure that our result is not spurious, the alpha threshold may be tightened to 0.01 or even 0.001, so that the chance of a false positive falls to one in a hundred or one in a thousand.
## Classical Statistics vs. Geostatistics
In classical statistics, the variance is assumed to be totally random when samples are drawn from a defined population [@jacquez_spatial_1999] and are assumed to come from one distribution [@steel_principles_1980]. Inferences about the population can be made by comparing a statistic calculated for the sample to the distribution of that statistic under the null hypothesis for the assumed distribution [@jacquez_spatial_1999].
In geostatistics, the variance of a spatial phenomenon is assumed to be partly random, and each point in the field represents a sample from some distribution [@jacquez_spatial_1999]. However, the distribution at any one point may differ completely from all other points in its shape, mean, and variance. The distribution of differences in sample values separated by a specified distance is assumed to be the same over the entire field. Sample values that exhibit spatial autocorrelation due to their proximity to each other will have a relatively small random variance of the distribution of differences. If the sample values do not exhibit spatial autocorrelation, then the variance is larger. The **semivariance**, or half of the variance, is used to measure the similarity between points at a given distance apart. The distance at which we compare the semivariance of sample values is known as the **lag distance**. If we graph semivariance against lag distance, we have created a **semivariogram**. Since we are usually dealing with samples and not the full population, we will almost never know the exact semivariogram; instead, we must rely on fitting models to __estimate__ this relationship.
Thus, geostatistics is distinguished from classical statistics by the fact that we need to estimate the semivariogram function and then incorporate the semivariogram function to estimate the values at unsampled locations using various predictive spatial methods.
## Semivariogram Modeling
The semivariogram is a basic geostatistical tool for measuring the spatial autocorrelation of a variable measured at different spatial locations. The semivariance is a measure of the variability of a variable between two locations separated by a certain lag distance [@isaaks_introduction_1989]. For example, we can measure how a variable $y$ changes in value between sites $i$ and $j$ by calculating the difference between the site values $y_i-y_j$. If the surface represented by the two points is continuous and the lag is a small distance, we would expect this difference to be small [@isaaks_introduction_1989]. With increasing lag distance, we would expect the variability to increase. Let us translate this intuitive statement into an equation known as the empirical variogram:
$$\gamma{(h)}=\frac{1}{2N(h)}\sum_{i,j=1}^{N(h)}{({y_i}-y_j)}^2$$
Where, $$\gamma{(h)}=\text{the semivariance at a spatial lag } h$$
$$i, j=\text{indices of two sampled locations separated by the lag } h$$
$$y_{i}=\text{measured value of the variable of interest at location } i$$
$$y_{j}=\text{measured value of the variable of interest at location } j$$
$$N(h)=\text{the number of sampled pairs separated by the lag } h$$
Like the familiar variance, it is a sum of squared differences divided by the number $N(h)$ of sampled pairs. Unlike the simple variance about a mean that we discussed earlier, the semivariogram measures the difference between two sample values. The __semi__ in semivariogram comes from the fact that the variance is divided in half.
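A minimal sketch (a toy one-dimensional transect with observations one unit apart, illustrative values only) computes $\gamma(h)$ directly from the formula:
```{r 10-empirical-variogram-sketch, warning=FALSE, message=FALSE}
y <- c(4, 6, 5, 8, 9, 7, 10, 12)   # values along a transect, 1 unit apart

# Semivariance at lag h: half the mean squared difference of pairs h apart
semivariance <- function(y, h) {
  d <- y[(1 + h):length(y)] - y[1:(length(y) - h)]
  sum(d^2) / (2 * length(d))
}

sapply(1:4, function(h) semivariance(y, h))   # gamma(h) for lags 1 to 4
```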
A semivariogram is a graph with semivariance on the y-axis and lag distance on the x-axis (Figure \@ref(fig:10-semivariogram-modelling)). There are some important features of a semivariogram that can be used to interpret the nature and structure of spatial autocorrelation.
```{r 10-semivariogram-modelling, echo=FALSE, warning=FALSE, message=FALSE, fig.cap=fig_cap}
## An example semivariogram with all the components using the "Fulmar" [see details @pebesma_mapping_2005] data from "gstat" package in R.
fig_cap <- paste0("An example semivariogram with all the components using the 'Fulmar'data from 'gstat' package in R. Nepal, CC-BY-SA-4.0.")
data(fulmar)
fulmar.spdf <- SpatialPointsDataFrame(cbind(fulmar$x,fulmar$y),
fulmar)
fulmar.spdf <- fulmar.spdf[fulmar.spdf$year==1999,]
proj4string(fulmar.spdf) <- CRS("+init=epsg:32631")
evgm <- variogram(fulmar~1,fulmar.spdf,
boundaries=seq(0,250000,l=51))
fvgm <- fit.variogram(evgm,vgm(3,"Sph",100000,1))
preds = variogramLine(fvgm, maxdist = max(evgm$dist))
g<-ggplot(evgm,aes(x=dist,y=gamma))+geom_point()+
geom_line(data = preds)+
geom_segment(aes(x=110000,y=0,xend=110000,yend=13.1)) +
geom_segment(aes(x = 0, y = 1.8, xend = 110000, yend = 1.8))+
geom_segment(aes(x = 0, y = 0, xend = 0, yend = 1.8))+
geom_label(aes(x=50000,
y=2,label="Range"),
label.padding = unit(0.55, "lines"), # Rectangle size around label
label.size = 0.35,
color = "black",
fill="#69b3a2")+
geom_label(aes(x=110000,
y=7.5,label="Sill"),
label.padding = unit(0.55, "lines"), # Rectangle size around label
label.size = 0.35,
color = "black",
fill="#69b3a2")+
geom_label(aes(x=0,
y=0.8,label="Nugget"),
label.padding = unit(0.20, "lines"), # Rectangle size around label
label.size = 0.15,
color = "black",
fill="#69b3a2")+
labs(x = "lag distance (h)", y = "Semi-variance")+
theme_bw()
g
```
**Range $a$:** The distance at which a variogram model first flattens out. This is the distance up to which the variable is considered to be spatially autocorrelated (Figure \@ref(fig:10-semivariogram-modelling)). For variogram models that are bounded and approach their sills only asymptotically, the range is generally taken as the distance at which the model reaches 95% of the sill.
**Nugget $c_0$:** Nugget refers to unaccounted autocorrelation due to a smaller lag distance than sampling distance or due to errors and imprecision arising from sampling (Figure \@ref(fig:10-semivariogram-modelling)). The nugget is also where the variance function $\gamma{(h)}$ intercepts the y-axis. Without the nugget, we would expect all variogram models to evaluate to zero variance at zero lag $\gamma{(0)}=0$, and conceptually this makes sense because how can you have variance between any observation and itself?
**Sill $s$:** The value of variance that a variogram model attains at a given range (Figure \@ref(fig:10-semivariogram-modelling)).
$$
s=c_0+c_1
$$
**Partial sill $c_1$:** The sill minus the nugget.
$$
c_1=s-c_0
$$
**Partial sill to total sill ratio:** The structural variance explained by the fitted variogram model [@rossi_geostatistical_1992]. This is the amount of variance that is spatially autocorrelated [@rossi_geostatistical_1992].
$$ Ratio =\frac {c_1}{c_0+c_1}$$
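For example, with an illustrative nugget of $c_0=5$ and partial sill of $c_1=20$, the sill and the partial-sill-to-sill ratio follow directly from the definitions above:
```{r 10-variogram-components-sketch, warning=FALSE, message=FALSE}
c0 <- 5          # nugget (illustrative value)
c1 <- 20         # partial sill (illustrative value)
s <- c0 + c1     # sill
ratio <- c1 / s  # proportion of the variance that is spatially structured
c(sill = s, ratio = ratio)
```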
Usually, we are interested in modeling the semivariance (semivariogram) or variance (variogram) of a process so that we can make predictions from a model rather than an incomplete set of sampled observations. In this way, we can fit a model to our paired observations and use a continuous variogram function to estimate the variance at any lag. These models are known as **theoretical variograms** and in the following sections we will compare several commonly-used models, summarized in Figure \@ref(fig:10-theoretical-variogram-models).
```{r 10-theoretical-variogram-models, fig.cap = fig_cap, out.width= "75%", echo = FALSE}
vv <- rev(seq(0,10,0.01))
## Linear variogram
vv_linear <- rev(vv/max(vv))
#plot(vv_linear,type="l")
## Quadratic variogram
qua <- function(x) {
xx <- (max(x)*(x^2))/(1+(x^2))
xx <- xx/max(xx)
xx
}
vv_quadratic <- rev(qua(vv))
#plot(vv_quadratic),type="l")
## Exponential variogram
vv_exponential <- (max(exp(vv))-exp(vv))/max(exp(vv))
#plot(vv_exponential,type="l")
## Gaussian variogram
gau <- function(x) {
xx <- 1-exp(-(x/max(x))^2)
xx <- xx/max(xx)
xx
}
vv_gaussian <- rev(gau(vv))
#plot(vv_gaussian,type="l")
## Spherical variogram
sph <- function(x) {
xx <- ((3/2)*(x/max(x)))-((1/2)*((x/max(x))^3))
xx <- xx/max(xx)
xx
}
vv_spherical <- rev(sph(vv))
#plot(vv_spherical,type="l")
## Power variogram
pow <- function(x,a) {
xx <- x^a
xx <- xx/max(xx)
xx
}
vv_power_05 <- rev(pow(vv,0.5))
vv_power_15 <- rev(pow(vv,1.5))
#plot(vv_power_15,type="l")
#png("./images/10-theoretical-variogram-models.png",width=1800,height=1800,res=300)
#par(mar=c(4,4,1,1))
#colors <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
#plot(vv,type="n",xlim=c(1,1000),ylim=c(0,1),xlab="Lag (h)",ylab="Variance λ(h)",xaxt="n",yaxt="n")
#axis(1,at=c(0,1000),label=c("0","Range (a)"))
#axis(2,at=c(0,1),label=c("0","Sill (s)"))
#vv_nugget <- vv_linear*0+1
#vv_nugget[1] <- 0
#lines(vv_nugget,lwd=2,col=colors[1])
#lines(vv_linear,lwd=2,col=colors[2])
#lines(vv_quadratic,lwd=2,col=colors[3])
#lines(vv_exponential,lwd=2,col=colors[4])
#lines(vv_gaussian,lwd=2,col=colors[5])
#lines(vv_spherical,lwd=2,col=colors[6])
#lines(vv_power_05,lwd=2,col=colors[7])
#lines(vv_power_15,lwd=2,col=colors[8])
#legend(650, 0.4, legend=c("Nugget" ,"Linear", "Quadratic", "Exponential", "Gaussian", "Spherical", "Power (a=0.5)", "Power (a=1.5)"), col=colors, lty=1, lwd=2, cex=1, title="Variogram Models")
#dev.off()
fig_cap <- paste0("Comparison of theoretical variogram models. The upper bound of the variance in the plot is the sill for variogram models that have sills and the upper limit of the lag in the plot is the range for variogram models that have ranges. Note that the linear and power models are not bounded and extend infinitely beyond the plot space. The exponential, quadratic, and Gaussian models approach the sill asymptotically as lag distances approach infinity. Pickell, CC-BY-SA-4.0.")
knitr::include_graphics("images/10-theoretical-variogram-models.png")
```
### Nugget variogram model
A **nugget model** is the simplest model that assumes there is no relationship between variance and lag and consequently there is no range for this model. In other words, the variance is assumed to be constant. This is also the least likely case for most natural spatial phenomena. If you used this model to make spatial predictions, then your predictions would be made as if your data had no spatial component at all, so this is also the least useful model for spatial prediction. The nugget model is expressed as:
$$
\begin{array}{ccc}
\gamma{(h)} = 0 & \text{for }h=0, \\
\gamma{(h)} = c_0 & \text{ for }h>0
\end{array}
$$
### Linear variogram model
A **linear model** assumes a linear and constant slope $b$ between variance and lag. Like the nugget model, the linear model has no defined range or sill. Unlike the nugget model, the variance is assumed to grow without bound, and therefore it is not possible to distinguish between correlated and uncorrelated lags. Note that the special case of $b=0$ is equivalent to the nugget model. A linear effect in spatial autocorrelation suggests a trend in your spatial data. The linear model is expressed as:
$$
\gamma{(h)} = bh+c_0
$$
Where $b$ is the slope to be estimated.
### Quadratic variogram model
A **quadratic or logistic model** assumes variance increases near-exponentially, but the shape is more "S"-shaped or sigmoid. The quadratic model is expressed as:
$$
\gamma{(h)} = \frac{ah^2}{1+bh^2}+c_0
$$
Where $a$ and $b$ are parameters to be estimated.
### Exponential variogram model
An **exponential model** assumes spatial autocorrelation decreases exponentially with increasing lag distance. The variance in the exponential variogram approaches the sill asymptotically as $h → ∞$, so unlike the spherical and nugget models, an exponential model never reaches a constant variance beyond the range. Since the variance continues to increase marginally with infinite lag distance, the practical range is defined as the lag distance at which the model reaches 95% of the sill, which occurs at $3a$; lag distances $h$ beyond this are considered not spatially autocorrelated for the given variable. The exponential model is expressed as:
$$
\gamma{(h)} = c_1(1-e^{-\frac{h}{a}})+c_0
$$
### Gaussian variogram model
A **Gaussian model** assumes that spatial autocorrelation is extremely high at short lag distances (i.e., variance is low) and then falls quickly towards zero (i.e., variance is high) at farther lag distances. This is a good model to use if you expect high local autocorrelation or phenomena that change rapidly with distance. The practical range is defined as the lag distance at which the model reaches 95% of the sill, which occurs at $\sqrt{3}a$; lag distances $h$ beyond this are considered not spatially autocorrelated for the given variable. The Gaussian model is expressed as:
$$
\gamma{(h)} = c_1(1-e^{-(\frac{h}{a})^2})+c_0
$$
### Spherical variogram model
A **spherical model** assumes a progressive, but not constant, decrease of spatial autocorrelation (i.e., increasing variance) until the range, beyond which autocorrelation is zero and variance remains constant. Note that the slope of a spherical model is generally steeper than a linear model where $h<a$. The spherical model is one of the most commonly used models because of its assumption of zero autocorrelation beyond the range $h>a$. The spherical model is expressed as:
$$
\begin{array}{ccc}
\gamma{(h)} = c_1(\frac{3}{2}\frac{h}{a}-\frac{1}{2}(\frac{h}{a})^3)+c_0 & \text{for }0<h≤a, \\
\gamma{(h)} = s = c_0 + c_1 & \text{ for }h>a
\end{array}
$$
### Power variogram model
A **power model** modulates the variance by raising the lag to a power $λ$:
$$
\begin{array}{ccc}
\gamma{(h)} = bh^λ+c_0 & \text{for }0<λ<2, \\
\gamma{(h)} = bh+c_0 & \text{for }λ=1
\end{array}
$$
The power model has no range or sill that can be estimated, and the variance process is considered to be infinite, the same as the linear model. Notably, $λ=1$ is equivalent to the linear model.
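In the **gstat** package, these theoretical models can be specified with `vgm(psill, model, range, nugget)`, where `psill` is the partial sill $c_1$ and `nugget` is $c_0$; for the power model, the `range` slot holds the exponent. The parameter values below are illustrative placeholders, not fitted values:
```{r 10-vgm-models-sketch, warning=FALSE, message=FALSE}
library(gstat)

vgm(model = "Nug")                                        # nugget
vgm(psill = 1, model = "Exp", range = 300, nugget = 0.1)  # exponential
vgm(psill = 1, model = "Gau", range = 300, nugget = 0.1)  # Gaussian
vgm(psill = 1, model = "Sph", range = 300, nugget = 0.1)  # spherical
vgm(psill = 1, model = "Pow", range = 0.5)                # power with exponent 0.5

head(vgm())   # table of model codes that gstat understands
```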
### Case Study: Selecting the appropriate variogram model {-}
We will look at some different models for estimating the semivariogram. We will use ground plot data from the young stand monitoring (YSM) program [@province_of_bc_provincial_2018] for the Fort Saint Johns Timber Supply Area (TSA) in the province of British Columbia, Canada. Fort Saint Johns is divided into six blocks (Figure \@ref(fig:10-FSJ-plot)). There are a total of 108 YSM plots used in this study (Figure \@ref(fig:10-FSJ-plot)). The total basal area ($m^2/ha$) is the variable that we want to predict spatially. For each of the YSM plots, we will calculate the total basal area by adding the basal area of all trees within the plot. We will explore different variogram models with these data and check which model provides the best fit.
```{r 10-FSJ-plot, echo=FALSE, fig.cap=fig_cap, message=FALSE, warning=FALSE}
fig_cap <- paste0("Location of Fort Saint Johns Timber Supply Area (TSA) in British Columbia (BC), Canada and the young stand monitoring (YSM) plots. Nepal, CC-BY-SA-4.0.")
knitr::include_graphics("images/10-FSJ-plot.png")
```
```{r 10-gaussian, echo=FALSE, warning=FALSE, message=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Semivariogram comparing different variogram models for basal area (m2/ha) of the young stand monitoring plots. Nepal, CC-BY-SA-4.0.")
data<-read.csv("data/10/FSJ.csv",header=T)
## summarize the basal area at plot level
data1<- data %>%
group_by(utm_easting,utm_northing) %>%
summarise(Basal=sum(baha_L))
coordinates(data1)= ~ utm_easting+utm_northing
## Model formula
TheVariogram <- variogram(Basal~1, data=data1)
## Initiating the parameters for the variogram, starting search window
nugmodel <- vgm(model="Nug")
linmodel <- vgm(model="Lin")
expmodel <- vgm(model="Exp")
gaumodel <- vgm(model="Gau")
sphmodel <- vgm(model="Sph")
pow05model <- vgm(model="Pow", range=0.5)
pow15model <- vgm(model="Pow", range=1.5)
## Fitting the variogram model
nugfit <- fit.variogram(TheVariogram, model=nugmodel)
linfit <- fit.variogram(TheVariogram, model=linmodel)
expfit <- fit.variogram(TheVariogram, model=expmodel)
gaufit <- fit.variogram(TheVariogram, model=gaumodel)
sphfit <- fit.variogram(TheVariogram, model=sphmodel)
pow05fit <- fit.variogram(TheVariogram, model=pow05model)
pow15fit <- fit.variogram(TheVariogram, model=pow15model)
## Predicting the variogram
nugpreds <- variogramLine(nugfit, maxdist=max(TheVariogram$dist))
linpreds <- variogramLine(linfit, maxdist=max(TheVariogram$dist))
exppreds <- variogramLine(expfit, maxdist=max(TheVariogram$dist))
gaupreds <- variogramLine(gaufit, maxdist=max(TheVariogram$dist))
sphpreds <- variogramLine(sphfit, maxdist=max(TheVariogram$dist))
pow05preds <- variogramLine(pow05fit, maxdist=max(TheVariogram$dist))
pow15preds <- variogramLine(pow15fit, maxdist=max(TheVariogram$dist))
## Plot
colors <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
plot(TheVariogram$dist/1000,TheVariogram$gamma,xlim=c(0,100),ylim=c(-1000,3500),xlab="Lag h (km)",ylab="Semivariance γ(h)",type="n")
lines(nugpreds$dist/1000,nugpreds$gamma,lwd=2,col=colors[1])
lines(linpreds$dist/1000,linpreds$gamma,lwd=2,col=colors[2])
#lines(vv_quadratic,lwd=2,col=colors[3])
lines(exppreds$dist/1000,exppreds$gamma,lwd=2,col=colors[4])
lines(gaupreds$dist/1000,gaupreds$gamma,lwd=2,col=colors[5])
lines(sphpreds$dist/1000,sphpreds$gamma,lwd=2,col=colors[6])
lines(pow05preds$dist/1000,pow05preds$gamma,lwd=2,col=colors[7])
lines(pow15preds$dist/1000,pow15preds$gamma,lwd=2,col=colors[8])
points(TheVariogram$dist/1000,TheVariogram$gamma,pch=20)
```
Just looking at the variograms, it appears that all four models fit our data well, indicating that there is a strong spatial correlation in basal area per hectare of live trees between the plots. However, we will use all the components of the semivariogram models to pick the best-fitting variogram.
```{r 10-circular-2, echo=FALSE, warning=FALSE, message=FALSE, tab.cap = FALSE}
Model<-c("Circular", "Gaussian", "Spherical", "Exponential")
Range<-c("7433.68", "4446.62","9729.37", "3871.76")
Nugget<-c("552.71","0.00","14.13","0.00")
Sill<-c("2338.27","2995.05","2980.05","2994.05")
Partial_sill<-c("1785.56","4446.62","2966.37","2994.05")
Sill_to_Sill<-c("0.76","1.00","0.99","1.00")
summary<-data.frame(Model,Range,Nugget,Sill,Partial_sill,Sill_to_Sill)
knitr::kable(summary, caption="Summary of the variogram components for the four fitted models.")
```
We can see that the partial-sill-to-sill ratio is highest for the Gaussian and exponential semivariograms. This indicates that these models attribute the largest share of the total semivariance to spatial autocorrelation. Similarly, both models indicate spatial autocorrelation in basal area over a very short range. Since, with an exponential semivariogram, autocorrelation only disappears at an infinite distance, it is usually better to pick a Gaussian model in cases like this example.
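One defensible way to choose among candidate variogram models is cross-validation. The hedged sketch below is self-contained: it uses the classic `meuse` data shipped with the **sp** package rather than the YSM data, and compares cross-validated kriging errors across three fitted models; the model with the lowest root-mean-square error would be preferred.
```{r 10-variogram-cv-sketch, warning=FALSE, message=FALSE}
library(sp)
library(gstat)
data(meuse)
coordinates(meuse) <- ~x + y

set.seed(1)
ev <- variogram(log(zinc) ~ 1, meuse)        # empirical semivariogram
fits <- list(Exp = fit.variogram(ev, vgm(model = "Exp")),
             Gau = fit.variogram(ev, vgm(model = "Gau")),
             Sph = fit.variogram(ev, vgm(model = "Sph")))

# 5-fold cross-validated RMSE of ordinary kriging under each model
sapply(fits, function(m) {
  cv <- krige.cv(log(zinc) ~ 1, meuse, model = m, nfold = 5, verbose = FALSE)
  sqrt(mean(cv$residual^2))
})
```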
## Sampling
**Sampling** can be defined as the process of selecting some part of a population in order to make an inference and estimate some parameters about the whole population [@thompson_sampling_2012]. For example, to estimate the amount of biomass of trees in the University of British Columbia Malcolm Knapp Research Forest, scientists collect data on tree size and height from 100 randomly distributed small plots across the forest. Based on allometric equations that relate these tree size measures to the mass of different species, the biomass of the entire Malcolm Knapp Research Forest can be estimated. Similarly, to estimate the amount of recoverable oil in a region, a few (highly expensive) sample holes can be drilled (example adapted from @thompson_sampling_2012). The situation is similar in a national opinion survey, in which only a sample of the people in the population are contacted, and the opinions in the sample are used to estimate the proportions of the various opinions in the whole population (example adapted from @thompson_sampling_2012).
Sampling should not be confused with an observational study. In an observational study, one has little-to-no control over the inclusion of units in the study, whereas sampling usually follows a well-defined protocol for defining both the population under study and the inclusion criteria of units to sample [@thompson_sampling_2012]. Broadly, sampling can be categorized into two groups [@teddlie_mixed_2007]:
1. Probability sampling
2. Non-probability sampling
Before getting into the details of the different types of sampling, we will familiarize ourselves with some key sampling terms and their definitions.
:::: {.box-content .call-out-content}
::: {.box-title .call-out-top}
## Recall This {-}
:::
<p id="box-text">
## Population
Any large, spatially defined entity of plots, people, trees, animals, etc., from which samples are drawn and measurements of certain characteristics are taken.
## Sampling Design
The procedure by which the sample of units is selected from the population is called the sampling design.
## Sampling Unit
The smallest entity within a population from which information about the population is drawn is known as the sampling unit. For example, in a survey of potential internet users across all of BC, the sampling unit could be a certain number of households in each city across BC.
</p>
::::
## Probability Sampling
Probability sampling techniques are mostly used in studies that involve extensive quantitative analysis [@tashakkori_sage_2010]. Probability sampling involves selecting a large number of units from a population such that the probability of inclusion for every member of the population is determinable [@tashakkori_sage_2010].
## Simple Random Sampling
In simple random sampling, each sampling unit within a given population has an equal probability of being selected in a sample [@thompson_sampling_2012]. For example, suppose we would like to measure the tree heights of all the trees in a simple random sample of 60 plots, with spatial locations given by the plot center coordinates, from a forest divided into 625 spatially defined plots, as in **Figure \@ref(fig:10-simple-random-sampling)**. Notice that there is no distinctive pattern in how plots are selected for the measurement of tree heights; this is the **random** part of simple random sampling.
As investigators, when we make a sequence of selections from a population, at each step a new and distinct set of sampling units is selected into the sample, each unit having an equal probability of being selected at each step.
For example, when we take another sample of 60 plots (**Figure \@ref(fig:10-simple-random-sampling-2)**), we can see that different **sampling units (plots)** are selected from the ones we obtained before; this reflects the **equal probability** of each sampling unit being selected. A minimal sketch of this idea in R follows.
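This sketch draws two independent simple random samples of 60 plot IDs from a population of 625 (illustrative IDs only) and shows that they differ:
```{r 10-srs-sketch, warning=FALSE, message=FALSE}
set.seed(1)
plot_ids <- 1:625                     # one ID per plot in the population

srs_a <- sample(plot_ids, size = 60)  # without replacement by default
srs_b <- sample(plot_ids, size = 60)  # an independent second draw

length(intersect(srs_a, srs_b))       # the two samples overlap only by chance
```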
```{r 10-simple-random-sampling, echo=FALSE, warning=FALSE, message=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Simple random sample of 60 units from a population of 625 units. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:625), nrow=25, ncol=25)
df <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df$val <- m[as.matrix(df[c('y','x')])]
n <- df[sample(nrow(df), 60, replace = FALSE, prob = NULL),]
p<-ggplot(n,aes(x=x, y=y, label=val)) +
geom_tile(data=df, fill='transparent', colour = 'black') +
geom_tile(data=n,fill='black')+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p
```
```{r 10-simple-random-sampling-2, echo=FALSE,warning=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Another simple random sample of 60 units. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:625), nrow=25, ncol=25)
df2 <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df2$val <- m[as.matrix(df[c('y','x')])]
n <- df2[sample(nrow(df2), 60, replace = FALSE, prob = NULL),]
p<-ggplot(n,aes(x=x, y=y, label=val)) +
geom_tile(data=df2, fill='transparent', colour = 'black') +
geom_tile(data=n,fill='black')+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p
```
## Stratified Random Sampling
When the population under study is not **homogeneous** (similar in biological characteristics) across the entire study area and instead follows some sort of gradient, a stratified random sampling method is used [@thompson_sampling_2012]. The principle of stratification is to partition the population in such a way that the units within a stratum are as similar as possible [@teddlie_mixed_2007]. Random samples are drawn from each stratum to ensure adequate sampling of all groups [@teddlie_mixed_2007]. Even though one stratum may differ markedly from another, a stratified sample with the desired number of units from each stratum in the population will tend to be "representative" of the population as a whole [@howell_area_2020].
For example, a forest under study is divided **(stratified)** into similar regions (Figures \@ref(fig:10-stratified-random-sampling) and \@ref(fig:10-stratified-random-sampling-2)) defined by elevation, soil moisture, and soil nutrient gradients, and random samples are taken within each stratum. Stratification of a study region, regardless of its size, can help to spread the sample over the entire study area. A minimal sketch of stratified sampling in R follows.
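This sketch stratifies a synthetic population of 400 plots into three illustrative strata and draws a fixed number of plots at random within each, using **dplyr** (loaded above; `slice_sample()` requires dplyr ≥ 1.0):
```{r 10-stratified-sketch, warning=FALSE, message=FALSE}
set.seed(1)
strat_plots <- data.frame(id = 1:400,
                          stratum = rep(c("low", "mid", "high"),
                                        length.out = 400))

stratified <- strat_plots %>%
  group_by(stratum) %>%
  slice_sample(n = 10) %>%   # 10 random plots from every stratum
  ungroup()

table(stratified$stratum)
```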
```{r 10-stratified-random-sampling, echo=FALSE, warning=FALSE, message=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Stratified random sample within unequal strata within a study area. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:400), nrow=20, ncol=20)
df3 <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df3$val <- m[as.matrix(df3[c('y','x')])]
n <- df3[sample(nrow(df3), 40, replace = FALSE, prob = NULL),]
p<-ggplot(n,aes(x=x, y=y, label=val)) +
geom_tile(data=df3, fill='transparent', colour = 'black') +
geom_tile(data=n,fill='black')+
geom_rect(aes(xmin =0.5,xmax = 10.5,ymin = 0.5, ymax = 20.5),
fill = 'pink',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =10.5,xmax = 20.5,ymin = 0.5, ymax = 5.5),
fill = 'skyblue',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =10.5,xmax = 20.5,ymin = 5.5, ymax = 20.5),
fill = 'lightgrey',alpha=0.02,color = "black",size = 2)+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p
```
```{r 10-stratified-random-sampling-2, echo=FALSE,warning=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Stratified random sample from equal strata within a study area. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:400), nrow=20, ncol=20)
df <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df$val <- m[as.matrix(df[c('y','x')])]
n <- df[sample(nrow(df), 40, replace = FALSE, prob = NULL),]
p<-ggplot(n,aes(x=x, y=y, label=val)) +
geom_tile(data=df, fill='transparent', colour = 'black') +
geom_tile(data=n,fill='black')+
geom_rect(aes(xmin =0.5,xmax = 10.5,ymin = 0.5, ymax = 10.5),
fill = 'pink',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =0.5,xmax = 10.5,ymin = 10.5, ymax = 20.5),
fill = 'skyblue',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =10.5,xmax = 20.5,ymin = 10.5, ymax = 20.5),
fill = 'lightgrey',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin =10.5,xmax = 20.5,ymin = 0.5, ymax = 10.5),
fill = 'brown',alpha=0.02,color = "black",size = 2)+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p
```
## Systematic Sampling
A systematic sample uses a fixed grid or array to assign plots in a regular pattern (**Figure \@ref(fig:10-systematic-sampling)**) [@mcroberts_sampling_2014]. The advantage of systematic sampling is that it maximizes the average distance between the plots and therefore minimizes spatial correlation among observations and increases statistical efficiency [@mcroberts_sampling_2014]. In addition, a systematic sample, which is clearly seen to be representative in some sense, can be very convincing to decision-makers who lack experience with sampling [@mcroberts_sampling_2014]. Raster grids such as digital elevation models (DEM) are examples of systematic samples. A minimal sketch of systematic sampling in R follows.
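This sketch takes every $k$-th plot from a numbered population of 400, starting from a random unit within the first interval (illustrative IDs only):
```{r 10-systematic-sketch, warning=FALSE, message=FALSE}
set.seed(1)
N <- 400   # plots in the population
k <- 10    # sampling interval

start <- sample(1:k, 1)   # random start within the first interval
seq(start, N, by = k)     # IDs of the systematic sample
```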
```{r 10-systematic-sampling, echo=FALSE,warning=FALSE,message=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Sample every second observation in the row. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:400), nrow=20, ncol=20)
# Create row indicator
df <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df$val <- m[as.matrix(df[c('y','x')])]
row_odd <- seq_len(nrow(df)) %% 2
col_odd<- seq_len(ncol(df)) %% 2
data_row_odd <- df[row_odd == 0,]
data_col_odd <- df[col_odd == 0,]
# Subset odd rows
p<-ggplot(data_col_odd,aes(x=x, y=y, label=val)) +
geom_tile(data=df, fill='transparent', colour = 'black') +
geom_rect(data=data_col_odd,aes(xmin = x, ymin = y, xmax = x + 0.3, ymax = y + 0.6),fill='black')+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p
```
```{r 10-systematic-sampling-2, echo=FALSE,warning=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Sample all the observation in every second column. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:400), nrow=20, ncol=20)
# Create row indicator
df <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df$val <- m[as.matrix(df[c('y','x')])]
row_odd <- seq_len(nrow(df)) %% 2
data_row_odd <- df[row_odd == 0,]
# Subset odd rows
p<-ggplot(data_row_odd,aes(x=x, y=y, label=val)) +
geom_tile(data=df, fill='transparent', colour = 'black') +
geom_rect(data=data_row_odd,aes(xmin = x, ymin = y, xmax = x + 0.3, ymax = y + 0.6),fill='black')+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p
```
## Cluster Sampling
In cluster sampling, rather than sampling individual units, which might be geographically spread over great distances, we sample groups (clusters) of plots that occur naturally in the study area [@teddlie_mixed_2007]. Cluster sampling is employed when we want to use time and money more efficiently while still generating a valid probability sample [@teddlie_mixed_2007]. A minimal sketch of cluster sampling in R follows.
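This sketch groups a synthetic population of 400 plots into 20 clusters, selects four whole clusters at random, and keeps every plot inside the chosen clusters:
```{r 10-cluster-sketch, warning=FALSE, message=FALSE}
set.seed(1)
cluster_id <- rep(1:20, each = 20)        # 400 plots in 20 clusters of 20

chosen <- sample(unique(cluster_id), 4)   # randomly select 4 whole clusters
sampled_plots <- which(cluster_id %in% chosen)

length(sampled_plots)                     # all 80 plots in the chosen clusters
```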
```{r 10-cluster-sampling, echo=FALSE, warning=FALSE, message=FALSE, fig.cap=fig_cap}
fig_cap <- paste0("Cluster of plots selected from entire study area. Nepal, CC-BY-SA-4.0.")
m <- matrix(c(1:400), nrow=20, ncol=20)
df <- expand.grid(x=1:ncol(m),y=1:nrow(m))
df$val <- m[as.matrix(df[c('y','x')])]
p<-ggplot(df,aes(x=x, y=y,label=val)) +
geom_tile(data=df, fill='transparent', colour = 'black') +
geom_rect(aes(xmin = 0.5,xmax = 3.5,ymin = 14.5, ymax = 20.5),
fill = 'lightgrey',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin = 5.5,xmax = 7.5,ymin = 5.5, ymax = 7.5),
fill = 'lightgrey',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin = 12.5,xmax = 14.5,ymin = 12.5, ymax = 14.5),
fill = 'lightgrey',alpha=0.02,color = "black",size = 2)+
geom_rect(aes(xmin = 15.5,xmax = 20.5,ymin = 0.5, ymax = 5.5),
fill = 'lightgrey',alpha=0.02,color = "black",size = 2)+
scale_y_reverse() +
theme_classic() +
theme(axis.text = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p
```
## Non-probability Sampling
Non-probability sampling is generally used in qualitative studies. It is also known as **purposive or adaptive sampling**, and is defined as selecting units (e.g., individuals, groups of individuals, institutions) based on specific purposes associated with answering a research question. **Purposive or adaptive** sampling can be classified into three broad categories [@teddlie_mixed_2007]:
## Representative Sampling
This type of sampling is used when we want to select samples that will represent broader groups as closely as possible [@teddlie_mixed_2007]. One example of representative sampling is selecting 100 Douglas-fir and 50 spruce trees for the measurement of tree height from a study area within the Malcolm Knapp Research Forest, BC, consisting of 500 Douglas-fir and 300 spruce trees.
## Unique Case Sampling