-
Notifications
You must be signed in to change notification settings - Fork 12
/
Copy pathREADME.Rmd
258 lines (174 loc) · 10.7 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-figs/"
)
```
# maxcovr
<!-- badges: start -->
[![R-CMD-check](https://github.com/njtierney/maxcovr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/njtierney/maxcovr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/njtierney/maxcovr/graph/badge.svg)](https://app.codecov.io/gh/njtierney/maxcovr)
<!-- badges: end -->
maxcovr was created to make it easy for a non expert to correctly solve the maximum covering location problem described by [Church](http://www.geog.ucsb.edu/~forest/G294download/MAX_COVER_RLC_CSR.pdf). Implementations of this problem (such as [optimimum AED placement](https://www.resuscitationjournal.com/article/S0300-9572(18)30065-0/fulltext)) may use commercial software such as AMPL, Gurobi, or CPLEX, which require an expensive license, or use open source licenses but not provide source code to the analysis performed (e.g., [Bonnet 2014 ](http://www.sciencedirect.com/science/article/pii/S0360835215003927)) This builds a substantial barrier to implement and reproduce these analyses.
maxcovr was created to make results easy to implement, reproduce, and extend by using:
- R, a free and open source language
- Open source solvers, glpk and lpSolve, that can be used on Linux, Windows, and OSX.
- Real-world, open source example data.
- Tidyverse principles to make it easy to use and reason with.
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.
# How to Install
Install the development version of `maxcovr` from [r-universe](http://njtierney.r-universe.dev/):
```r
install.packages("maxcovr", repos = c("https://njtierney.r-universe.dev", "https://cloud.r-project.org"))
```
(Note - installing from r-universe is just like installing from CRAN, and should be faster and more convenient than installing from GitHub)
Or install using `remotes`:
```{r install, eval = FALSE}
# install.packages("remotes")
remotes::install_github("njtierney/maxcovr")
```
# Using maxcovr
_Disclaimer: The following is a fictitious example using real world data._
Consider the toy example where we are playing a tower defense game and we need to place crime surveillance towers to detect crime.
We have two datasets, `york`, and `york_crime`:
- `york` contains listed building GPS locations in the city of York, provided by the city of york
- `york_crime` contains a set of crime data from the [`ukpolice` package](https://www.github.com/njtierney/ukpolice), containing crime data from September 2016.
In this game we already have a few towers built, which are placed on top of the listed buildings with a grade of I. We will call this dataset `york_selected`, and the remaining building locations `york_unselected`
```{r york-towers}
library(maxcovr)
library(dplyr)
# subset to be the places with towers built on them.
york_selected <- york %>% filter(grade == "I")
york_unselected <- york %>% filter(grade != "I")
```
The purpose of the game is to build towers in places so that they are within 100m of crime. We are going to use the crime data that we have to help us choose ideal locations to place towers.
This can be illustrated with the following graphic, where the red circles indicate the current coverage of the building locations, so those blue crimes within the circles are within the coverage.
```{r leaflet}
library(leaflet)
leaflet() %>%
addCircleMarkers(data = york,
radius = 1,
color = "steelblue") %>%
addCircles(data = york_selected,
radius = 100,
stroke = TRUE,
fill = NULL,
opacity = 0.8,
weight = 2,
color = "coral") %>%
addProviderTiles("CartoDB.Positron") %>%
setView(lng = median(york$long),
lat = median(york$lat),
zoom = 15)
```
Currently the coverage looks alright, but let's verify the coverage using the `nearest` function. `nearest` takes two dataframes and returns the nearest lat/long coords from the first dataframe to the second dataframe, along with the distances between them and the appropriate columns from the building dataframe.
```{r print-york-selected-nearest-crime}
dat_dist <- york_selected %>% nearest(york_crime)
head(dat_dist)
```
You can instead return a dataframe which has every building in the rows, and the nearest crime to the building, by simply changing the order.
```{r print-york-crime-nearest-selected}
dat_dist_bldg <- york_crime %>% nearest(york_selected)
head(dat_dist_bldg)
```
To evaluate the coverage we can use `coverage`. This reads as find the coverage of the york buildings (below) to the crimes. Coverage of the first thing on the second thing. Or, how many of the second thing are covered by the first thing.
```{r coverage-fun}
coverage(york_selected, york_crime)
```
This tells us that out of all the crime, 18.68% of it is within 100m, 339 crimes are covered, but the mean distance to the surveillance camera is 1400m.
## Maximising coverage
Say then we want to add another 20 surveillance towers, but we want to use the best 20, we use `max_coverage`.
```{r show-time-taken-maxcovr}
system.time(
# mc_20 <- max_coverage(A = dat_dist_indic,
mc_20 <- max_coverage(existing_facility = york_selected,
proposed_facility = york_unselected,
user = york_crime,
n_added = 20,
distance_cutoff = 100)
)
```
`max_coverage` actually returns a dataframe of lists.
```{r print-mc-20}
mc_20
```
This is handy because it means that later when you want to explore multiple `n_added`, say you want to explore how coverage changes for 20, 40, 60, 80, 100 `n_added`, then these are added as rows in the dataframe, which makes it easier to do summaries and manipulate.
Important features here of this dataframe are:
- `facility_selected`: A dataframe from `proposed_facilities`, containing the facilities selected by the optimisation.
- `user_affected`: A dataframe from `user`, that contains the users that were affected by the new optimised placement
- `model_coverage`: A dataframe containing summary info on the number of users covered, the percentage of coverage, and the average distance.
- `existing_coverage`: returns a similar summary dataframe the original coverage, from `existing_facilities`.
- `summary`: returns the binded `model_coverage` and `existing_coverage`.
- `n_added`: The number of things added
- `distance_cutoff`: the distance cutoff selected
One can also use `map` from `purrr` to fit many different configurations of `n_added`.
(Future work will look into allowing `n_added` to take a vector of arguments).
```{r run-many-maxcovr}
library(purrr)
n_add_vec <- c(20, 40, 60, 80, 100)
system.time(
map_mc_model <- map_df(.x = n_add_vec,
.f = ~max_coverage(existing_facility = york_selected,
proposed_facility = york_unselected,
user = york_crime,
distance_cutoff = 100,
n_added = .))
)
```
This returns a list of dataframes, which we can bind together like so:
```{r bind-results-rows}
map_cov_results <- bind_rows(map_mc_model$model_coverage)
```
We can then visualise the effect on coverage:
```{r vis-coverage}
library(ggplot2)
bind_rows(map_mc_model$existing_coverage[[1]],
map_cov_results) %>%
ggplot(aes(x = factor(n_added),
y = pct_cov)) +
geom_point() +
geom_line(group = 1) +
theme_minimal()
```
You can read more about the use of `max_coverage`, covering topics like cross validation in the vignette.
# Known Issues
- `max_coverage()` may take a bit of time to run, depending on your data size. From initial testing, if the product of the number of rows of the `proposed_facilities` and `users` exceeds 100 million, it might take more than 1 minute. Of course, this may depend on the structure / complexity of your data and problem.
- The distances calculated in `maxcovr` use [haversines formula](https://en.wikipedia.org/wiki/Haversine_formula), which makes the assumption that the earth is a sphere and calculates the greater circle distance. Whilst not wholly correct, haversine is a useful approximation that is reasonable for small scale distances, where the accuracy can be within metres which is what `maxcovr` was initially built for. In the future `maxcovr` will use more accurate distance functions provided in [`geodist`](https://github.com/hypertidy/geodist), and give the user control over the distance calculation used (haversines, vincenty, cheap ruler, geodesic, etc).
# Future Work
I will be focussing on making `maxcovr` work well within the `tidyverse`. This includes providing sensible standard summaries using key function verbs from `broom`, adding exploratory plots, improving speed using Rcpp, and allowing users to select the solver they want to use.
```{r testing, echo = FALSE}
#
# n_add_vec <- c(20,40)
#
# system.time(
# mc_cv_fit_many <-
# map2_df(mc_cv$train, # training set goes here
# n_add_vec,
# .f = ~max_coverage,
# existing_facility = york_selected,
# proposed_facility = york_unselected,
# distance_cutoff = 100)
# )
#
#
# system.time(
# mc_cv_fit_many <-
# pmap(.l = list(user = mc_cv$train, # training set goes here
# n_added = n_add_vec),
# .f = ~max_coverage,
# existing_facility = york_selected,
# proposed_facility = york_unselected,
# distance_cutoff = 100)
# )
```
If you have any suggestions, please [file an issue](http://www.github.com/njtierney/maxcovr/issues/new) and I will get to it as soon as I can.
# Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.
# Acknowledgements
Oliver Czibula, for this initial help in helping me understand the Maximum Covering Location Problem. Angelo Auricchio and Antonietta Mira for their supervision. Alessio Quaglino and Jost Reinhold for their help in implementing the first implementation of this problem in lpSolve. Martin Weiser for his suggestion for the relocation process. Matt Sutton for his very thoughtful explanations on how to interact with high level solvers, which led to implementing maxcovr in lpsolve, glpk, and gurobi. And Alex Simmons, for teaching me how to write better C++ code.