Skip to content

Commit 76def00

Browse files
author
Karandeep Singh
committed
Updated the README with additional examples and an updated decision curve.
1 parent 25ab06b commit 76def00

13 files changed

+1392
-112
lines changed

NAMESPACE

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,9 @@
11
# Generated by roxygen2: do not edit by hand
22

3+
export("%>%")
4+
export(apply_all)
5+
export(apply_constraint)
6+
export(apply_threshold)
7+
export(calculate_net_benefit)
8+
import(dplyr)
9+
importFrom(magrittr,"%>%")

R/modelrecon.R

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ NULL
1818
#' @export
1919
#'
2020
#' @examples
21-
calculate_nb = function(data, verbose = FALSE) {
21+
calculate_net_benefit = function(data, verbose = FALSE) {
2222
data %>%
2323
mutate(n = n()) %>%
2424
filter(!is.na(met_threshold)) %>%
@@ -137,8 +137,10 @@ apply_all = function(data, threshold) {
137137
#' applies to the most recently set threshold. This allows [apply_threshold()]
138138
#' and `apply_constraint()` to be chained together multiple times using pipes.
139139
#'
140-
#' @param data
141-
#' @param capacity
140+
#' @param data A data frame resulting from [apply_threshold()]
141+
#' @param capacity A number specifying the capacity, where `0` means no capacity
142+
#' (the model cannot be acted upon), and `Inf` means infinite capacity (there
143+
#' is no constraint).
142144
#'
143145
#' @return A data frame with an altered set of predictions based on the
144146
#' specified capacity. For patients whose predictions were converted from

README.Rmd

Lines changed: 123 additions & 109 deletions
Original file line numberDiff line numberDiff line change
@@ -19,10 +19,10 @@ The goal of modelrecon is to apply thresholds to predicted probabilities and cal
1919

2020
## Installation
2121

22-
You can install the released version of modelrecon from [CRAN](https://CRAN.R-project.org) with:
22+
You can install the released version of modelrecon from GitHub with:
2323

2424
``` r
25-
install.packages("modelrecon")
25+
remotes::install_github('ML4LHS/modelrecon')
2626
```
2727

2828
## Get started
@@ -32,115 +32,165 @@ Let's load the package and generate and example dataset containing the probabili
3232
```{r example}
3333
library(modelrecon)
3434
library(dplyr)
35+
library(tidyr)
36+
library(ggplot2)
3537
3638
example_data = data.frame(probability = c(0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.01),
3739
outcome = c(T, T, F, T, T, F, F, F, T, F))
3840
```
3941

40-
What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
4142

42-
```{r cars}
43+
## What does our example dataset look like?
44+
45+
```{r}
46+
example_data
47+
```
48+
49+
50+
## Let's apply a threshold of 0.2
51+
52+
This means we will call all predictions with a probability >= 0.2 as `TRUE`.
53+
54+
```{r}
55+
example_data %>%
56+
apply_threshold(0.2)
57+
```
58+
59+
60+
## Let's calculate a net benefit with a threshold of 0.2
61+
62+
63+
```{r}
4364
example_data %>%
44-
apply_all(0.2) %>%
45-
calculate_nb(verbose = TRUE)
65+
apply_threshold(0.2) %>%
66+
calculate_net_benefit()
67+
```
68+
69+
## What is going on behind the scenes?
70+
71+
Behind the scenes, the `calculate_net_benefit()` function is calculating the number of true and false positives, and then using that along with the previously applied threshold to calculate the net benefit.
72+
73+
### How did `calculate_net_benefit()` know about the threshold?
74+
75+
This information is captured in the `thresholds` attribute.
4676

77+
```{r}
4778
example_data %>%
4879
apply_threshold(0.2) %>%
49-
calculate_nb()
80+
attributes() %>%
81+
.$thresholds
82+
```
83+
### Want more information about the number of true and false positives?
84+
85+
Set the `verbose` argument of `calculate_net_benefit()` to `TRUE`. This will print, *not* return, a data frame with the information it used to calculate the net benefit. The value returned is still the net benefit.
5086

87+
```{r}
88+
example_data %>%
89+
apply_threshold(0.2) %>%
90+
calculate_net_benefit(verbose = TRUE)
91+
```
92+
93+
## What happens when you apply an absolute constraint?
94+
95+
Two of the five predicted `TRUE` values are converted to `FALSE` because only the first 3 `TRUE` values (those with the highest predicted probability) are able to be acted upon.
96+
97+
```{r}
98+
example_data %>%
99+
apply_threshold(0.2) %>%
100+
apply_constraint(3)
101+
```
102+
103+
104+
## Calculate a realized net benefit with a threshold of 0.2 and an capacity of 3
105+
106+
This is an example of an *absolute* constraint.
107+
108+
```{r}
51109
example_data %>%
52110
apply_threshold(0.2) %>%
53111
apply_constraint(3) %>%
54-
calculate_nb()
112+
calculate_net_benefit(verbose = TRUE)
55113
114+
```
56115

116+
117+
## Calculate a realized net benefit with an absolute threshold of 0.2 and capacity of 3, and *then* a relative constraint of 0.5:
118+
119+
```{r}
57120
example_data %>%
58121
apply_threshold(0.2) %>%
59122
apply_constraint(3) %>%
60123
apply_threshold(0.5) %>%
61-
calculate_nb()
124+
calculate_net_benefit(verbose = TRUE)
125+
```
62126

127+
## The default assumption when we set a threshold without a subsequent constraint is that the capacity is infinite.
63128

64-
bind_rows(
65-
example_data %>%
129+
You can also explicitly note the infinite capacity, which will be applied only to the immediate prior threshold.
130+
131+
```{r}
132+
example_data %>%
133+
apply_threshold(0.2) %>%
134+
apply_constraint(3) %>%
135+
apply_threshold(0.5) %>%
136+
apply_constraint(Inf) %>%
137+
calculate_net_benefit()
138+
```
139+
140+
141+
Using this mechanism, you can construct multiple layers of absolute and relative constraints as the piped functions retain metadata about prior constraints and thus know that each constraint applies to only the prior threshold.
142+
143+
You *cannot* apply a threshold that is *lower* than a prior threshold because it would make no sense to apply a permissive criterion *before* a more restrictive one.
144+
145+
## Setting a new threshold that is lower than the prior one will generate an error.
146+
147+
```{r error = TRUE}
148+
example_data %>%
66149
apply_threshold(0.5) %>%
67150
apply_constraint(3) %>%
68-
calculate_nb(),
69-
example_data %>%
70-
apply_threshold(0.6) %>%
71-
apply_constraint(3) %>%
72-
calculate_nb(),
73-
example_data %>%
74-
apply_threshold(0.7) %>%
75-
apply_constraint(3) %>%
76-
calculate_nb(),
77-
example_data %>%
78-
apply_threshold(0.8) %>%
79-
apply_constraint(3) %>%
80-
calculate_nb()
81-
)
151+
apply_threshold(0.2) %>%
152+
calculate_net_benefit()
153+
```
154+
155+
156+
## Setting a new threshold that is the same as a prior one will generate a warning.
82157

158+
In a future version, this may be upgraded to an error.
159+
160+
```{r error = TRUE}
83161
example_data %>%
84-
apply_threshold(0.6) %>%
162+
apply_threshold(0.2) %>%
85163
apply_constraint(3) %>%
86-
apply_threshold(0.5) %>%
87-
apply_constraint(Inf) %>%
88-
calculate_nb()
164+
apply_threshold(0.2) %>%
165+
calculate_net_benefit()
166+
```
89167

90-
# Vary absolute constraint
168+
# Let's plot a decision curve for an absolute constraint, and an absolute + relative constraint
91169

170+
```{r, warning = FALSE}
92171
plot_data =
93172
expand_grid(constraint = c(0, 1, 3, 5, 7, Inf),
94173
threshold = seq(from = 0, to = 1, by = 0.05)) %>%
95-
# example_data %>%
96-
# distinct(probability) %>%
97-
# add_row(probability = 1) %>%
98-
# pull(probability)) %>%
99174
group_by(constraint, threshold) %>%
100175
mutate(net_benefit = example_data %>%
101176
apply_threshold(threshold) %>%
102177
apply_constraint(constraint) %>%
103-
# apply_threshold(0.5) %>%
104-
calculate_nb()) %>%
178+
calculate_net_benefit()) %>%
105179
ungroup()
106180
107-
plot_data %>%
108-
mutate(constraint = as.factor(paste('Capacity =',constraint))) %>%
109-
ggplot(aes(x = threshold, y = net_benefit)) +
110-
geom_line() +
111-
facet_wrap(~constraint) +
112-
coord_cartesian(ylim = c(0, 0.5)) +
113-
theme_bw() +
114-
labs(x = 'Threshold probability',
115-
y = 'Realized net benefit')
116-
117-
# Vary absolute constraint and add relative constraint
181+
# Vary absolute constraint and add relative constraint (up to threshold of 0.5)
118182
119183
plot_data_2 =
120184
expand_grid(constraint = c(0, 1, 3, 5, 7, Inf),
121-
threshold = seq(from = 0, to = 1, by = 0.05)) %>%
122-
# example_data %>%
123-
# distinct(probability) %>%
124-
# add_row(probability = 1) %>%
125-
# pull(probability)) %>%
185+
threshold = seq(from = 0, to = 0.5, by = 0.05)) %>%
126186
group_by(constraint, threshold) %>%
127187
mutate(net_benefit = example_data %>%
128188
apply_threshold(threshold) %>%
129189
apply_constraint(constraint) %>%
130190
apply_threshold(pmax(0.5, threshold)) %>%
131-
calculate_nb()) %>%
191+
calculate_net_benefit()) %>%
132192
ungroup()
133193
134-
plot_data_2 %>%
135-
mutate(constraint = as.factor(paste('Capacity =',constraint))) %>%
136-
ggplot(aes(x = threshold, y = net_benefit)) +
137-
geom_line() +
138-
facet_wrap(~constraint) +
139-
coord_cartesian(ylim = c(0, 0.5)) +
140-
theme_bw() +
141-
labs(x = 'Threshold probability',
142-
y = 'Realized net benefit')
143-
144194
bind_rows(
145195
plot_data %>% mutate(constraint_type = 'Absolute constraint'),
146196
plot_data_2 %>% mutate(constraint_type = paste0('Absolute constraint\n',
@@ -150,8 +200,11 @@ bind_rows(
150200
) %>%
151201
mutate(constraint = if_else(constraint == Inf, 'Infinity', as.character(constraint))) %>%
152202
mutate(constraint = as.factor(paste('Capacity =',constraint))) %>%
153-
filter(constraint == 'Capacity = 3') %>%
154-
filter(threshold == 0.2) ->
203+
filter((constraint == 'Capacity = 3' & threshold == 0.2) |
204+
(constraint == 'Capacity = Infinity' & threshold == 0.2)) %>%
205+
slice(1:3) %>%
206+
mutate(text = c('Case study 2', 'Case study 1', 'Case study 3')) %>%
207+
mutate(x = c(0.3, 0.4, 0.3), y = c(0.05, 0.4, 0.33)) ->
155208
point_data
156209
157210
bind_rows(
@@ -167,56 +220,17 @@ bind_rows(
167220
linetype = constraint_type)) +
168221
geom_line() +
169222
geom_point(data = point_data) +
223+
geom_text(data = point_data,
224+
aes(label = text, x = x, y = y), size = 3) +
170225
facet_wrap(~constraint) +
171226
coord_cartesian(ylim = c(0, 0.5)) +
172227
theme_bw() +
228+
theme(axis.text = element_text(size = 6)) +
173229
labs(x = 'Threshold probability',
174230
y = 'Realized net benefit',
175231
linetype = 'Constraint')
176232
177-
ggsave('decision_curves_with_constraints_20221025.pdf',
178-
width = 6.5, height = 4, units = 'in')
179-
180-
set.seed(1)
181-
all_nbs = numeric(1000)
182-
for (i in 1:1000) {
183-
all_nbs[i] =
184-
example_data %>%
185-
apply_all() %>%
186-
dplyr::sample_frac(1) %>%
187-
apply_constraint(3) %>%
188-
calculate_nb(0.2)
189-
}
190-
mean(all_nbs)
191-
192-
193-
example_data2 = data.frame(prediction = c(T, T, T, T, T, F, T, F, T, T),
194-
outcome = c(T, T, F, T, T, F, F, F, T, F))
195-
196-
example_data2 %>%
197-
calculate_nb(0.2)
198-
199-
200-
201-
set.seed(2)
202-
all_nbs = numeric(1000)
203-
for (i in 1:1000) {
204-
all_nbs[i] =
205-
example_data2 %>%
206-
dplyr::sample_frac(1) %>%
207-
apply_constraint(3) %>%
208-
calculate_nb(0.2)
209-
}
210-
mean(all_nbs)
233+
# ggsave('Figure 2.pdf',
234+
# width = 6.5, height = 4, units = 'in')
211235
212236
```
213-
214-
You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date.
215-
216-
You can also embed plots, for example:
217-
218-
```{r pressure, echo = FALSE}
219-
plot(pressure)
220-
```
221-
222-
In that case, don't forget to commit and push the resulting figure files, so they display on GitHub!

0 commit comments

Comments
 (0)