-
Notifications
You must be signed in to change notification settings - Fork 0
/
ocs-opioid-medicare.Rmd
1267 lines (1108 loc) · 53.6 KB
/
ocs-opioid-medicare.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: OpenCaseStudies - Opioid (Medicare)
output:
html_document:
theme: cosmo
code_download: yes
toc: true
toc_float: true
highlight: tango
number_sections: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(include = TRUE, comment = NA,
message = FALSE, warning = FALSE, echo = TRUE)
```
```{r echo=FALSE, out.width='100%'}
knitr::include_graphics('./img/opioidmap.png')
```
`![]("gganim.gif")` needs to be fixed.
# Motivation
Opioid prescription has become a popular topic because of
the increasing prescription and concerns about overdose,
opioid-related deaths, and other opioid-associated side
effects. (Ref: [Paulozzi et al. 2011](https://www.ncbi.nlm.nih.gov/pubmed/22048730),
[Rudd et al. 2016](https://www.ncbi.nlm.nih.gov/pubmed/28033313),
[Rudd et al. 2016](https://www.ncbi.nlm.nih.gov/pubmed/26720857),
[Solomon et al. 2010](https://www.ncbi.nlm.nih.gov/pubmed/21149754),
[Dunn, et al. 2010](https://www.ncbi.nlm.nih.gov/pubmed/20083827). )
Our motivation is to understand the opioid prescription in the States
using publicly available data. Thus, the main questions we are going to ask
include:
(1) Among the 50 States and District of Columbia,
which five States had the highest and lowest number in
(a) **Total opioid claims** ,
(b) **Opioid prescription rate**
( = total opioid claims / total claims ) , and
(c) **Opioid prescription per 100 enrollments** ?
(2) Did the total number of opioid prescriptions change over time?
(3) Can you visualize the opioid prescription rate on the U.S map?
In this case study, we'll walk you through collecting data,
importing data, wrangling data, and visualizing the data,
using the well-established and commonly used packages, including
`kableExtra`, `readr` , `tidyverse` , `datasets` , `ggplot2` ,
`scales` , `ggrepel` , `choroplethr` , and `choroplethrMaps` .
Here we provide an overview aobut these packages
before we starts:
|R package|What can the package provide?|
|---|--------------------------------------------------------------------------------------------------------------|
|`kableExtra`|Helps with building complex tables and manipulating table styles; creates awesome HTML tables|
|`readr`|Allows you to read data from various formats faster|
|`tidyverse`|Provides user-friendly data manipulation|
|`datasets`|Gives us useful information, including the States' information|
|`ggplot2`|Improves data visualisations|
|`scales`|Adds the flexibility of scaling|
|`ggrepel`|Avoids superimposing the legends you don't expect|
|`choroplethr`|Helps preparing for plotting the U.S. Map|
|`choroplethrMaps`|Helps us plot the U.S. Map|
# What is the data?
Recently, the Centers for Medicare & Medicaid Services (CMS) has
prepared a public dataset, including information in the part D covering
calendar years 2013 through 2016, to make our health care system more transparent,
and to allow citizens to understand the medical expenditure and
relevant health issues. You can find the description of the data here:
[Medicare part D dataset](https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Prescriber_Methods.pdf)
```{r out.width = "95%", echo = FALSE, out.width='90%'}
#knitr::include_graphics("https://aspe.hhs.gov/system/files/images-reports-basic/70441/fig1.jpg")
```
## Medicare data (Part D)
Medicare Part D is prescription drug coverage plan
run by Center for Medicare & Medicaid Services (CMS).
[Here is a website describing the information about Part D](https://www.medicare.gov/drug-coverage-part-d)
It was originally proposed by President Bill Clinton in 2000
based on earlier proposals developed by Congresswoman
Nancy Pelosi and Senator Tom Daschle, and started in
January 2006. There are two ways for participants to
enroll in Medicare part D:
1. Medicare Advantage prescription drug plans (MA-PDs), which
comes along with other Medicare benefits (ex: Part A or Part B)
2. Stand-alone prescription drug plan (PDP)
That means that for those who have already enrolled in
Medicare (either >= 65 years old or having certain diseases
such as end-stage-renal-disease or other disabilities), they
are eligible to join the Part D.
([For more information on eligibility over 65 years old, click here](https://www.hhs.gov/answers/medicare-and-medicaid/who-is-elibible-for-medicare/index.html)
and [for Medicare eligibility for those under 65, click here](https://www.medicareinteractive.org/get-answers/medicare-basics/medicare-eligibility-overview/medicare-eligibility-for-those-under-65)). For people enrolled
in other insurance plan, they may also be eligible to apply
for the plan. ([For more information on how Medicare works
with other types of insurance, click here](https://www.medicare.gov/drug-coverage-part-d/how-part-d-works-with-other-insurance)).
### Opioid information
First, let's download the opioid prescriber data from
the years 2013-2016, which is available from the CMS
website as an application programming interface (API).
To do this, you can type "opioid" in
the [CMS website.](https://data.cms.gov/browse?q=opioid).
The following is what you will see:
```{r echo=FALSE, out.width='100%'}
knitr::include_graphics('./img/information01.PNG')
```
Now you can click on the "Medicare Part D Opioid Prescriber Summary File 2013", and
you will be directed to the next page:
```{r echo=FALSE, out.width='100%'}
knitr::include_graphics('./img/information02.PNG')
```
Further, you can click on the "Export", and choose the "csv",
you will be guided to the download link. This is how we found
the api link for downloading the data. The screen shot is shown
as below:
```{r echo=FALSE, out.width='100%'}
knitr::include_graphics('./img/information03.PNG')
```
```{r}
if(!file.exists("./data/2016prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/6wg9-kwip/rows.csv"
download.file(file_link,
destfile = "./data/2016prescriber.csv",mode = "wb")
}
if(!file.exists("./data/2015prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/6i2k-7h8p/rows.csv"
download.file(file_link,
destfile = "./data/2015prescriber.csv",mode = "wb")
}
if(!file.exists("./data/2014prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/e4ka-3ncx/rows.csv"
download.file(file_link,
destfile = "./data/2014prescriber.csv",mode = "wb")
}
if(!file.exists("./data/2013prescriber.csv")){
file_link <- "https://data.cms.gov/api/views/yb2j-f3fp/rows.csv"
download.file(file_link,
destfile = "./data/2013prescriber.csv",mode = "wb")
}
```
There is a description about the variables.
It can be found in the same page
[mentioned above.](https://data.cms.gov/Medicare-Part-D/Medicare-Part-D-Opioid-Prescriber-Summary-File-201/yb2j-f3fp)
Here is a screenshot showing how you can find the information.
```{r echo=FALSE, out.width='100%'}
knitr::include_graphics('./img/information04.PNG')
```
For your convenience, we have saved it as a csv
file (`prescriberopioiddescription.csv`) so that you can
understand the meanings of the variables easily.
This file describes what data is contained in each column.
We use the commands in the `kableExtra` package to show the file to
give us a nice display of the table in the windows.
The files we just downloaded contain the following information:
```{r}
library(kableExtra)
library(tidyverse)
partdinfodescription = read.csv("./doc2/prescriberopioiddescription.csv")
names(partdinfodescription)[1] <- "Column Name"
partdinfodescription %>%
select(`Column Name`,Description) %>%
kable() %>%
kable_styling() %>%
column_spec(1, bold = T, border_right = T, background = "white") %>%
column_spec(2, width = "30em", background = "lightyellow")
```
# Data Import
At this stage, you've downloaded your data from
CMS API successfully. Now, what you need is to
`load` or `read` these data in R.
There are several base R functions that allow you
read your data into R, which you may be familiar
with such as `read.table()`, `read.csv()`,
and `read.delim()`. Instead of using these,
we will use the functions in the
[readr](https://readr.tidyverse.org/articles/readr.html)
R package. The main reasons for this are
1. Compared to equivalent base R functions, the
functions in `readr` are around 10x faster.
2. You can specify the column types (e.g
character, integer, double, logical, date,
time, etc)
3. All parsing problems are recorded in
a data frame.
The main functions in `readr` include:
`readr` functions | Description |
--- | ---------------------------------------------------------------------------------------- |
`read_delim()` | reads in a flat file data with a given character to separate fields |
`read_csv()` | reads in a CSV file |
`read_tsv()` | reads in a file with values separated by tabs |
`read_lines()` | reads only a certain number of lines from the file |
`read_file()` | reads a complete file into a string |
`write_csv()` | writes data frame to CSV |
A useful cheatsheet for the functions in the
`readr` package can be found on RStudio's website:
![](https://www.rstudio.com/wp-content/uploads/2018/08/data-import.png)
```{r}
library(readr)
prescription_2016 <- read_csv("./data/2016prescriber.csv")
prescription_2015 <- read_csv("./data/2015prescriber.csv")
prescription_2014 <- read_csv("./data/2014prescriber.csv")
prescription_2013 <- read_csv("./data/2013prescriber.csv")
```
# Data Wrangling
Combining the functions in the packages `dplyr` and
`kableExtra` can help us take a glance at the data.
Here, we use the `mutate` to generate a new character
variable named `NPI` and use the `kable()` and
`kable_styling` to make the display clearer.
```{r}
prescription_2016 %>%
mutate(NPI = as.character(NPI)) %>%
head(.) %>%
kable() %>%
kable_styling()
```
We can see that the data is structured as provider-based level,
meaning that the information in each role is for the specific
health care provider. In this project, we're going to do the
analysis at State level, so you can probably imagine that we
need to have some process to get State-level information from
this data. Before moving there, let's take a look at
how big the data is for the 2016 information:
```{r}
dim(prescription_2016)
```
Wow! There are `r nrow(prescription_2016)` health providers
listed in this file. Of note, not every health provider had
prescribed opioid in the year 2016.
## 1. Add State information
In many cases, we want to add basic information about States,
such as State name, Region, and State abbreviation
in our dataset. We can get this information easily using
the dataset `state` in the library `datasets`. However,
before taking advantage of the `state`, we need to take a look
at our data, and have an idea about whether we can adopt
the information in `state` directly.
#### How many States are there in the 2016 prescription dataset?
```{r}
unique(prescription_2016$`NPPES Provider State`)
```
There are 61 values for `NPPES Provider State`.
From the [manual](https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Prescriber_Methods.pdf),
we know that other than the fifty States and DC, there are
the other 10 values as following:
Values for `NPPES Provider State` | Description |
--- | --- |
`XX` | Unknown |
`AA` | Armed Forces Central/South America |
`AE` | Armed Forces Europe |
`AP` | Armed Forces Pacific |
`AS` | American Samoa |
`GU` | Guam |
`MP` | Northern Mariana Islands |
`PR` | Puerto Rico |
`VI` | Virgin Islands |
`ZZ` | Foreign Country |
#### Add the information (State name, region, and abbreviation) to the `state` in the `datasets`
```{r}
library(datasets)
data(state)
unique(state.name) # There are only 50 States
state.abb <- c(state.abb, "DC",
"XX",
"AA",
"AE",
"AP",
"AS",
"GU",
"MP",
"PR",
"VI",
"ZZ")
state.region <- as.factor(c(as.character(state.region), "South",
rep("Others",10)))
state.name <- c(state.name, "District of Columbia",
"Unknown",
"Armed Forces Central/South America",
"Armed Forces Europe",
"Armed Forces Pacific",
"American Samoa",
"Guam",
"Northern Mariana Islands",
"Puerto Rico",
"Virgin Islands",
"Foreign Country")
states <- unique(prescription_2016$`NPPES Provider State`) %>% data.frame(.)
names(states) <- "NPPES Provider State"
states$state_name <- state.name[match(tolower(states$`NPPES Provider State`), tolower(state.abb))]
states$region <- state.region[match(tolower(states$`NPPES Provider State`), tolower(state.abb))]
states %>%
kable() %>%
kable_styling()
```
## 2. Add population information for each State
Since it' reasonable to have more opioid claims simply
due to more participants enrolled in the Part D,
we may want to compare the opioid prescription per 100
enrollments instead of total number of opioid prescriptions
across States. Thus, one of the questions we are interested in
is what's the distribution of opioid prescription per
100 enrollments by States. To answer our question,
other than the opioid claim information we've already got,
we also need the information about how many participants
enrolled in the Part D plan for each State. To the best of
our knowledge, we can get the 2015 enrollment information from
the publicly available
[CMS website](https://www.cms.gov/newsroom/press-releases/its-50th-anniversary-more-55-million-americans-covered-medicare).
(Note: The information is based on 2015.
Please let us know if there are year-by-year
State-level Part D enrollment information,
so that we can update this analysis.)
```{r}
population <- read_csv("./doc2/enrollnumber.csv") %>%
rename( state_name = State,
enrollment_num2015 = Total)
```
Now, we can add the number of enrollments in
Part D by State from `population` to the
States information dataset `states`.
Note that there is some difference in the
names of States between `population` and
`states`. They are "Pending State Designation"
and "Foreign and Other Outlying Areas". Thus,
in the further analysis, we can consider if we
want to add them in or exclude them.
```{r}
population$state_name
```
Now, we can combine the population information with
the `state`, and save it as a new data frame called
`populationstate`
```{r}
populationstate <- states %>%
left_join(.,population, by = c("state_name"))
```
Let's see what the `populationstate` looks like.
```{r}
populationstate %>%
head() %>%
kable() %>%
kable_styling()
```
#### Can you tell how many States do we have the information about number of enrollments in Part D ?
```{r}
sum(!is.na(populationstate$enrollment_num2015))
```
## 3. Glimpse at opioid information
In the dataset `prescription_2016`, we have information about
total claim count, opioid claim count, and extended-release opioid
claims. For this project, we only need to look at opioid claim count and
total claim count.
```{r}
sum(is.na(prescription_2016$`Total Claim Count`))
sum(is.na(prescription_2016$`Opioid Claim Count`))
```
We noticed that there are some missing values due in `Opioid Claim Count`.
This tells us that not every provider included in this dataset
is eligible to prescribe opioid. Thus, in the next step, when we
want to calculate the opioid prescription rate by State, we need to limit our
data to those providers with both total claim information and
opioid claim information.
## 4. Calculate the opioid prescription rate
Great! Now we have better idea about our dataset, and
we need to create a dataset containing the information
we need to answer our questions. In other words, we
want to create a dataset for each year with following
information:
Variable Name | Description |
-------- | ------------------------------------------------------ |
NPPES Provider State | State Abbreviation |
total_opioid_claim | Total opioid claims (Part D) in the State |
total_claim_count | Total claims (Part D) in the State |
state_opioid_prescribing_rate | The proportion of total opioid claims versus total claims (Part D) in the State |
state_name | State Name |
region | The region of the State |
enrollment_num2015 | Number of Part D enrollments in 2015 |
total_claim_bypop | Total claims per 100 enrollment (*) |
opioid_claim_bypop | Opioid claims per 100 enrollment (*) |
year | year of the data came from |
(*) These numbers are valid only for year 2015.
Since we're going to use the same chuck of codes many
times, we can write a function named `anadata` to save our times!
```{r}
anadata <- function(data = data, year = year){
c <- year
data %>%
left_join(.,populationstate,by = c("NPPES Provider State")) %>%
select(`NPPES Provider State`,
state_name,region,
`Opioid Claim Count`,
`Total Claim Count`) %>%
filter(is.na(`Total Claim Count`) +
is.na(`Opioid Claim Count`) == 0 ) %>%
group_by(`NPPES Provider State`) %>%
summarise(total_opioid_claim = sum(`Opioid Claim Count`),
total_claim_count = sum(`Total Claim Count`)) %>%
mutate(state_opioid_prescribing_rate =
(total_opioid_claim/total_claim_count)*100) %>%
arrange(desc(state_opioid_prescribing_rate)) %>%
left_join(.,populationstate,by = c("NPPES Provider State")) %>%
mutate(total_claim_bypop =
(total_claim_count/enrollment_num2015)*100,
opioid_claim_bypop =
(total_opioid_claim/enrollment_num2015)*100) %>%
filter(!is.na(state.name)) %>%
mutate(year = c) -> x;
names(x) = tolower(names(x)); # We change the variable names to lowercase.
return(x)
}
prescriber2016byState <- anadata(prescription_2016,2016)
prescriber2015byState <- anadata(prescription_2015,2015)
prescriber2014byState <- anadata(prescription_2014,2014)
prescriber2013byState <- anadata(prescription_2013,2013)
```
Here, we write a loop to execute the repetitive procedures.
The alternative to do it is to implement some commands in
the package `purrr`. [ Here is the cheatsheet for
using [purrr](https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf)]
Now, we're ready for our next step - data visualization!
# Data Visualization
## 1. Comparing total opioid prescriptions in 2016
The first task is to understand the opioid prescriptions by States in 2016.
We are going to use the functions in the library `ggplot2` to help us
plot the bar chart.
```{r fig.width=8,fig.height=10}
library(ggplot2)
p <- prescriber2016byState %>%
ggplot(aes(x=reorder(state_name,
-total_opioid_claim),
y=total_opioid_claim)) +
geom_bar(stat="identity")
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims")
```
It's nice to visualize all the information we have from this dataset. However,
based on our original questions, we may want to focus on the 50 States and
DC. Since for all the regions other than the 50 States and DC, they were
labelled as `Others` in their `region` variable, we can limit the analysis
based on them using the **filter** .
```{r fig.width=8,fig.height=10}
p <- prescriber2016byState %>%
filter(region!="Others") %>%
ggplot(aes(x=reorder(state_name,
-total_opioid_claim),
y=total_opioid_claim)) +
geom_bar(stat="identity")
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims")
```
You may wonder why the labels on the y-axis are **6e+06** , **4e+06** ,
..., which are relatively hard to read. Here comes one potential solution -
using the **scale_y_continuous(labels = comma)** in the library `scales`.
This function will force the display of labels in y-axis be an easy-to-read
number.
```{r fig.width=8,fig.height=10}
library(scales)
p + scale_y_continuous(labels = comma) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 90, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims")
```
You may feel it's hard to read since you need to either
tilt your head or have a special ability to read the labels of States in the
x-axis. Here, we provide one way to rotate this bar chart
using **coord_flip()** .
```{r fig.width=10,fig.height=8}
p + scale_y_continuous(labels = comma) +
coord_flip() +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Number of opioid claims")
```
### (1) What are the top five States with the highest and lowest opioid prescription claims in Part D in 2016?
Based on the plot above, California, Florida, Texas, Pennsylvania,
and Michigan are the top five States with the highest opioid prescription claims
in Part D in 2016. And the top five States with the lowest
opioid prescription claims in Part D in 2016 are District of Columbia,
Alaska, Wyoming, Vermont, and Hawaii.
### (2) Can you visualize the top five States with the highest and lowest opioid prescription in Part D in 2016 among the fifty States and DC (with only these States shown)?
```{r fig.width=8,fig.height=5}
p <- prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(1:5,
n()-4,
n()-3,
n()-2,
n()-1,
n() # n() means the number of rows
# Thus, n() is the one with highet total_opioid_claim
)) %>%
mutate(rankgroup = ifelse(row_number()<=5,"Lowest five",
"Highest five")) %>%
ggplot(aes(x=reorder(state_name,
-total_opioid_claim),
y=total_opioid_claim)) +
geom_bar(stat="identity")
p + scale_y_continuous(labels = comma) +
coord_flip() +
facet_wrap(~rankgroup, scales="free") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Total opioid claims (in Medicare Part D) by State in 2016",
subtitle = "Top five States with the highest and lowest opioid prescription") +
xlab("States") +
ylab("Number of opioid claims")
```
### (3) Are you satisfied with the plots?
The plots did show the numbers of total opioid claims right.
However, the audience need to read the x-axis carefully.
Otherwise, they may misinterpret the results since the scales
are so different. In this case, it may be a good idea to show
a table with the numbers.
```{r fig.width=8,fig.height=5}
prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(1:5,
n()-4,
n()-3,
n()-2,
n()-1,
n())) %>%
select(state_name,total_opioid_claim) %>%
kable() %>%
kable_styling()
```
Alternatively, you can show the top 5 by
```{r fig.width=8,fig.height=5}
prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(1:5)) %>%
select(state_name,total_opioid_claim) %>%
kable() %>%
kable_styling()
```
And you can show the lower 5 by
```{r fig.width=8,fig.height=5}
prescriber2016byState %>%
filter(region != "Others") %>%
arrange(.,-total_opioid_claim) %>%
filter(row_number() %in% c(n()-4,
n()-3,
n()-2,
n()-1,
n())) %>%
select(state_name,total_opioid_claim) %>%
kable() %>%
kable_styling()
```
### (4) How do you feel about the results? Do the results surprise you?
Probably you're not surprised by the results, since
the population in California, Florida, and Texas is large, and
the population in District of Columbia, Alaska, and Wyoming is small.
To make a fair comparison across
different States, we may want to adjust for "size of States." There are many
good options to achieve this goal. In this case study, we provide two potential
methods:
(1) Create a variable called **opioid prescription rate** (defined by CMS). **Opioid prescription rate** is the proportion of opioid claims among the total claims in
a certain year. Since total claims can be a proxy of the number of enrollments in
Part D, **opioid prescription rate** can be considered as a fair comparison across States.
(2) Create a variable called **opioid prescription claims per 100 enrollments** .
Since the enrollment number by States was available in 2015, we can create this
variable to directly adjust for the population size. (Please let us know if
there's any publicly available information about Part D enrollment by States in the other years.)
To visualize our hypothesis, we will have two plots:
(1) The plot showing the number of enrollments versus number of opioid claims
```{r fig.width=10,fig.height=8}
library(ggrepel)
p <- prescriber2015byState %>%
filter(region != "Others") %>%
ggplot(aes(x = enrollment_num2015,
y = total_opioid_claim,
color = region)) +
geom_point() +
geom_smooth(method = "lm", col = "blue")
p + scale_y_continuous(labels = comma) +
scale_x_continuous(labels = comma) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Number of enrollments vs. Opioid prescription claims in Medicare Part D\ by State in 2015",
subtitle = "Number of enrollments is associated with number of opioid prescriptions") +
xlab(" Number of enrollments ") +
ylab(" Number of opioid claims ") +
geom_text_repel(aes(label=`nppes provider state`))
```
(2) The plot showing the number of total claims versus number of opioid claims
```{r fig.width=10,fig.height=8}
p <- prescriber2015byState %>%
filter(region != "Others") %>%
ggplot(aes(x = total_claim_count,
y = total_opioid_claim,
color = region)) +
geom_point() +
geom_smooth(method = "lm", col = "blue")
p + scale_y_continuous(labels = comma) +
scale_x_continuous(labels = comma) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Total claims vs. Opioid prescription claims in Medicare Part D\ by State in 2015",
subtitle = "Number of total claims is associated with number of opioid prescriptions") +
xlab(" Number of total claims ") +
ylab(" Number of opioid claims ") +
geom_text_repel(aes(label=`nppes provider state`))
```
## 2. Comparing opioid prescription rate by State in 2016
```{r fig.width=8,fig.height=8}
p <- prescriber2016byState %>%
filter(region != "Others") %>%
ggplot(aes(x=reorder(state_name,
-state_opioid_prescribing_rate),
y=state_opioid_prescribing_rate)) +
geom_bar(stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Opioid Prescription Rate (in Medicare Part D) by State in 2016") +
xlab("States") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)")
```
From this plot, we can see that in Part D in 2016,
the opioid prescription rates in Nevada, Utah, Alabama, Oklahoma, Idaho are the highest,
and the opioid prescription rates in New York, Rhode Island, Hawaii,
Massachusetts, and North Dakota are the lowest.
## 3. Trend of opioid prescription rate in 2013 - 2016
We may want to see the trend of opioid prescription rate during 2013 and 2016 by State.
### Visualize the change by States using bar chart
Here, we append the data from different years in **long format** first,
and then plot the trend of opioid prescription rate.
```{r fig.width=8,fig.height=8}
p <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
ggplot(aes(x=reorder(state_name,
-state_opioid_prescribing_rate),
y=state_opioid_prescribing_rate,
fill = as.factor(year))) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Trend of opioid Prescription Rate (in Medicare Part D) by State during 2013 - 2016") +
xlab("States") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)") +
labs(fill = "Year")
```
### Visualize/Animate the change by States using bar chart
The alternative approach is to animate ther results using `gganimate`.
```{r eval=TRUE}
# <Ref 1> https://gganimate.com/articles/gganimate.html
# <Ref 2> https://towardsdatascience.com/create-animated-bar-charts-using-r-31d09e5841da
library(gganimate)
library(gifski)
data4animation <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
group_by(year) %>%
mutate(rank = rank(-state_opioid_prescribing_rate),
value_rel = state_opioid_prescribing_rate/state_opioid_prescribing_rate[rank==1],
value_std = round(state_opioid_prescribing_rate,2)) %>%
group_by(state_name) %>%
filter(rank <= 20) %>% # choose the top 20
ungroup()
staticplot = ggplot(data4animation,
aes(rank,
group = state_name,
fill = as.factor(state_name),
color = as.factor(state_name))) +
geom_tile(aes(y = round(state_opioid_prescribing_rate/2,2),
height = round(state_opioid_prescribing_rate,2),
width = 0.9), alpha = 0.8, color = NA) +
geom_text(aes(y = 0, label = paste(state_name, " ")), vjust = 0.2, hjust = 1) +
geom_text(aes(y= value_std,
label = value_std, hjust=0)) +
coord_flip(clip = "off", expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_x_reverse() +
guides(color = FALSE, fill = FALSE) +
theme(axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
legend.position="none",
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.grid.major.x = element_line( size=.1, color="grey" ),
panel.grid.minor.x = element_line( size=.1, color="grey" ),
plot.title=element_text(size=10, hjust=0.5, face="bold", colour="grey", vjust=-1),
plot.subtitle=element_text(size=6, hjust=0.5, face="italic", color="grey"),
plot.caption =element_text(size=6, hjust=0.5, face="italic", color="grey"),
plot.background=element_blank() ,
plot.margin = margin(2,2, 1, 3, "cm"))
staticplot
animation::ani.options(ani.width= 1000, ani.height=1000, ani.res = 1000)
anim = staticplot + transition_states(year,
transition_length = 3,
state_length = 2) +
view_follow(fixed_x = TRUE) +
labs(title = "Trend of opioid Prescription Rate (in Medicare Part D) by State {closest_state}",
subtitle = "Opioid Prescription Rate (Number of opioid claims per 100 claims)",
caption = "Data Source: CMS Data"
)
anim
# This is the line to save the animation into gif format
# animate(anim, 200, fps = 20, width = 1200, height = 1000,
# renderer = gifski_renderer("./img/gganim.gif"))
```
In general, the trend the opioid prescription rate was decreasing during 2013 - 2016 across most of the States.
### Let us visualize the variation during the period of 2013 - 2016 across different States
* Here, we define the "variation during the period of 2013 - 2016" as
the ratio between "the difference between maximum opioid prescription and
minimum opioid prescription during 2013 - 2016" and "the maximum opioid
prescription during 2013 - 2016" .
```{r fig.width=12,fig.height=8}
fouryeardata <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
group_by(state_name) %>%
summarise(max_opioid_claim = max(total_opioid_claim),
min_opioid_claim = min(total_opioid_claim)) %>%
mutate(variation = (max_opioid_claim - min_opioid_claim) / max_opioid_claim )
p <- fouryeardata %>%
ggplot(aes(x=reorder(state_name,-variation),
y=variation)) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Variation of total opioid claims (in Medicare Part D) by State during 2013 - 2016",
subtitle = "Variation = (maximun opioid claims - minimum opioid claims) / maximum opioid claims") +
xlab("States") +
ylab("Variation of total opioid claims")
```
### Alternative way to display the longitudinal trends
There are many ways to display the longitudinal trends of opioid prescription
rate. In the plot shown above, the longitudinal change was shown clearly by States.
However, it has some disadvantages:
(1) It is difficult for us to compare the trend from State to State directly.
(2) It is hard for us to explore the potential reasons to explain the difference
between different States.
Now, suppose we want to visually explore whether the change in opioid prescription rate
over time was associate with region, we may want to have a plot showing
the changes over time by States, which also incorporated the region information.
Based on what we have established, we need to organize our data in `long format`
and add the `group` information in our `ggplot`.
```{r warning=TRUE, message=TRUE, fig.width=10,fig.height=8}
datamodel <- prescriber2016byState %>%
union(.,prescriber2015byState) %>%
union(.,prescriber2014byState) %>%
union(.,prescriber2013byState) %>%
filter(region != "Others") %>%
mutate(interval = year - 2013)
p <- datamodel %>%
ggplot(aes(x = year,
y = state_opioid_prescribing_rate,
group = state_name,
color = region)) +
geom_line()
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 0.5,vjust = 0.5)) +
ggtitle("Trend of opioid Prescription Rate (in Medicare Part D) by State during 2013 - 2016") +
xlab("Calender Year") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)") +
labs(color = "Region")
```
Also, you may want to add a "summarized" line for each region.
In other words, the question is whether we can visually display
the summarized longitudinal trend in each region. Here, we provide
a way that can achieve this goal in a few seconds. The idea is
to take the average of opioid prescription rate in each region at
year 2013, 2014, 2015, and 2016, and further then plot the trend
of the region-specific average opioid prescription rate over time.
```{r fig.width=10,fig.height=8}
p <- datamodel %>%
ggplot(aes(x = year,
y = state_opioid_prescribing_rate,
group = state_name,
color = region)) +
stat_summary(aes(group = region),
geom = "line",
fun.y = mean, size = 5) +
geom_line()
p + theme_bw() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 0.5,vjust = 0.5)) +
ggtitle("Trend of opioid Prescription Rate (in Medicare Part D) by State during 2013 - 2016") +
xlab("Calender Year") +
ylab("Opioid Prescription Rate (Number of opioid claims per 100 claims)") +
labs(color = "Region")
```
## 4. Comparing opioid prescription by States in 2015
Since there are many policies made at State level, and
it's relatively difficult to get information about the
policy in militaries, foreign countries, and unknown areas,
we may want to focus on the areas we are able to get the
information about the number of Part D Part enrollment.
### Method I: using "opioid prescription rate"
```{r fig.width=12,fig.height=8}
p <- prescriber2015byState %>%
filter(region != "Others") %>% # This line removes the region/States without population information.
ggplot(aes(x=reorder(state_name,
-state_opioid_prescribing_rate),
y=state_opioid_prescribing_rate)) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
plot.subtitle = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Opioid prescrion rate (in Medicare Part D) by State in 2015",
subtitle = "Opioid prescription rate = ( # of opioid claims / # of total claims ) * 100 %") +
xlab("States") +
ylab("Opioid prescription rate (%)")
```
### Method II: using "opioid prescription per 100 enrollments"
We may want to see opioid prescription per 100 enrollment in 2015.
```{r fig.width=12,fig.height=8}
p <- prescriber2015byState %>%
filter(region != "Others") %>%
ggplot(aes(x=reorder(state_name,
-opioid_claim_bypop),
y=opioid_claim_bypop)) +
geom_bar(position = "dodge", stat="identity")
p + theme_bw() +
coord_flip() +
theme(plot.title = element_text(hjust = 0.5 ),
axis.text.x = element_text(angle = 0, hjust = 1,vjust = 0.5)) +
ggtitle("Opioid prescription per 100 enrollments in Medicare Part D\n by State in 2015") +
xlab("States") +
ylab("Number of opioid prescription claims per 100 enrollments")
```
### Do two methods give us same information?
The opioid prescription rate was high in American Samoa, but
the opioid prescription per 100 enrollments is low in American Samoa.
This may be due to there are relatively few total claims in American Samoa,
but the proportion of opioid prescription is high. The opioid prescription rates in
Utah, Nevada, Oklahoma, Alabama, Idaho, Colorado, Tennessee, and Oregon
are also relatively high. The opioid prescription per 100 enrollments are
relatively high in Tennessee, Alabama, and Oklahoma.
### Can you visualize the information from these two methods?
To visualize the information from these two methods, we
propose to compare the "rankings" from two methods.
First, we will create one variable called `rank_method1`, which is
the rank based on the method I (state opioid prescription rate),
and the other variable called `rank_method2` , which is the rank
based on the method II (opioid claim by number of enrollment).
This can be done by using the `dense_rank()`.
Then, we can have an x-y plot showing the ranking between these
two methods.
```{r fig.width=12,fig.height=10}
## create a new dataframe called "method_comparison"