RDDtools is a new R package under development, designed to offer a set of tools to run all the steps required for a Regression Discontinuity Design (RDD) Analysis, from primary data visualisation to discontinuity estimation, sensitivity and placebo testing.
This GitHub repository hosts the source code. One of the easiest ways to install the package from GitHub is with the R package devtools:
library(devtools)
install_github("MatthieuStigler/RDDtools", subdir = "RDDtools")
Note, however, that the latest version of RDDtools only works with R 3.0, and that you might need to install Rtools if you are on Windows.
The (preliminary) documentation is available directly in the help files, as well as in the vignette. The vignette can be accessed from R with vignette("RDDtools"), or by opening the pdf stored in this GitHub repository.
- Simple visualisation of the data using a binned plot: plot()
- Bandwidth selection:
  - MSE-RDD bandwidth procedure of Imbens and Kalyanaraman 2012: RDDbw_IK()
  - MSE global bandwidth procedure of Ruppert et al 1995: RDDbw_RSW()
- Estimation:
  - RDD parametric estimation: RDDreg_lm(). This includes specifying the polynomial order and including covariates with various specifications, as advocated in Imbens and Lemieux 2008.
  - RDD local non-parametric estimation: RDDreg_np(). Can also include covariates, and allows different types of inference (fully non-parametric, or parametric approximation).
  - RDD generalised estimation: allows custom estimating functions to be used to obtain the RDD coefficient. This could allow, for example, a probit RDD or a quantile regression.
- Post-estimation tools:
  - Various tools to obtain predictions at given covariate values ( RDDpred() ), or to convert to other classes: to lm ( as.lm() ), or to the package np ( as.npreg() ).
  - Function to do inference with clustered data: clusterInf(), either using a cluster covariance matrix ( vcovCluster() ) or by a degrees of freedom correction (as in Cameron et al. 2008).
- Regression sensitivity analysis:
  - Plot the sensitivity of the coefficient with respect to the bandwidth: plotSensi()
  - Placebo plot using different cutpoints: plotPlacebo()
- Design sensitivity analysis:
  - McCrary test of manipulation of the forcing variable: dens_test(), a wrapper for the function DCdensity() from package rdd.
  - Test of equal means of covariates: covarTest_mean()
  - Test of equal distribution of covariates: covarTest_dis()
- Datasets:
  - Contains the seminal dataset of Lee 2008: Lee2008
  - Contains functions to replicate the Monte-Carlo simulations of Imbens and Kalyanaraman 2012: gen_MC_IK()
RDDtools works in an object-oriented way: the user defines the characteristics of the data once, creating an RDDdata object, to which the different analysis tools can then be applied.
Load the package, and load the built-in dataset from Lee 2008:
library(RDDtools)
data(Lee2008)
Declare the data to be a RDDdata object:
Lee2008_rdd <- RDDdata(y = Lee2008$y, x = Lee2008$x, cutpoint = 0)
You can now directly summarise and visualise this data:
summary(Lee2008_rdd)
## ### RDDdata object ###
##
## Cutpoint: 0
## Sample size:
## -Full : 6558
## -Left : 2740
## -Right: 3818
## Covariates: no
plot(Lee2008_rdd)
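The binned plot averages the outcome within narrow bins of the forcing variable, keeping the bins on either side of the cutpoint separate. A minimal base-R sketch of that idea, on simulated data rather than Lee 2008, might look like the following. The bin width h and all variable names are illustrative assumptions, and plot() in RDDtools may choose its bins differently.

```r
# Simulated data (hypothetical, not the Lee 2008 dataset)
set.seed(1)
n <- 2000
cutpoint <- 0
x <- runif(n, -1, 1)
y <- 0.5 + 0.3 * x + 0.1 * (x >= cutpoint) + rnorm(n, sd = 0.1)

# Assign each observation to a bin of width h; bin edges are multiples of h
# measured from the cutpoint, so no bin straddles it
h <- 0.05
bin_id <- floor((x - cutpoint) / h)

# Average the outcome within each bin and compute the bin midpoints
bin_mean <- tapply(y, bin_id, mean)
bin_mid <- cutpoint + (as.numeric(names(bin_mean)) + 0.5) * h

plot(bin_mid, bin_mean, xlab = "x (forcing variable)", ylab = "binned mean of y")
abline(v = cutpoint, lty = 2)
```

Because the bins are anchored at the cutpoint, a jump in the binned means at the dashed line is not smoothed away by averaging across both sides.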
Estimate parametrically, by fitting a 4th order polynomial:
reg_para <- RDDreg_lm(RDDobject = Lee2008_rdd, order = 4)
reg_para
## ### RDD regression: parametric ###
## Polynomial order: 4
## Slopes: separate
## Number of obs: 6558 (left: 2740, right: 3818)
##
## Coefficient:
## Estimate Std. Error t value Pr(>|t|)
## D 0.0766 0.0132 5.79 7.6e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_para)
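To make the parametric specification concrete, here is a hedged base-R sketch of a 4th order polynomial with separate slopes on each side of the cutpoint, fitted with lm() on simulated data. The treatment dummy D, the simulated jump of 0.2 and the use of raw polynomials are assumptions for illustration; RDDreg_lm() may build its design matrix differently.

```r
# Simulated data (hypothetical): true jump of 0.2 at the cutpoint x = 0
set.seed(2)
n <- 4000
x <- runif(n, -1, 1)
D <- as.numeric(x >= 0)              # treatment dummy: 1 to the right of the cutpoint
y <- 0.5 + 0.3 * x - 0.2 * x^2 + 0.2 * D + rnorm(n, sd = 0.1)

# D * poly(...) gives every polynomial term its own coefficient on each side
# ("separate slopes"); x is already centred at the cutpoint, so the
# coefficient on D is the estimated discontinuity at x = 0
fit <- lm(y ~ D * poly(x, degree = 4, raw = TRUE))
coef(summary(fit))["D", ]
```

If the cutpoint were not 0, x would need to be centred at the cutpoint first for the coefficient on D to keep this interpretation.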
You can also run a simple local regression, using the Imbens and Kalyanaraman 2012 bandwidth:
bw_ik <- RDDbw_IK(Lee2008_rdd)
reg_nonpara <- RDDreg_np(RDDobject = Lee2008_rdd, bw = bw_ik)
print(reg_nonpara)
## ### RDD regression: nonparametric local linear###
## Bandwidth: 0.2939
## Number of obs: 3200 (left: 1594, right: 1606)
##
## Coefficient:
## Estimate Std. Error z value Pr(>|z|)
## D 0.07992 0.00946 8.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(x = reg_nonpara)
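A local linear RDD estimate can be approximated by a weighted least squares fit restricted to observations within the bandwidth. The sketch below uses base R on simulated data, with a triangular kernel and a hand-picked bandwidth; both choices are assumptions for illustration and need not match the defaults of RDDreg_np() or the value returned by RDDbw_IK().

```r
# Simulated data (hypothetical): true jump of 0.1 at the cutpoint x = 0
set.seed(3)
n <- 5000
x <- runif(n, -1, 1)
D <- as.numeric(x >= 0)
y <- 0.5 + 0.4 * x + 0.1 * D + rnorm(n, sd = 0.1)

bw <- 0.3                            # hand-picked bandwidth (assumption)

# Triangular kernel: weight decreases linearly with distance from the
# cutpoint and is zero outside the bandwidth
w <- pmax(0, 1 - abs(x) / bw)
keep <- w > 0

# Weighted linear fit with separate slopes on each side; the coefficient
# on D is the estimated discontinuity at the cutpoint
fit <- lm(y ~ D * x, weights = w, subset = keep)
coef(fit)["D"]
```

Only observations close to the cutpoint contribute, which is why the number of observations reported by a local regression is smaller than the full sample size.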
One can easily check the sensitivity of the estimate to different bandwidths:
plotSensi(reg_nonpara, from = 0.05, to = 1, by = 0.1)
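The idea behind a bandwidth sensitivity plot is simply to re-estimate the discontinuity over a grid of bandwidths and inspect how much the estimate moves. A rough base-R sketch on simulated data follows; the helper rdd_local_linear() is hypothetical and written only for this example, and it is not the estimator plotSensi() calls internally.

```r
# Simulated data (hypothetical): true jump of 0.1 at the cutpoint x = 0
set.seed(4)
n <- 3000
x <- runif(n, -1, 1)
y <- 0.5 + 0.4 * x + 0.1 * (x >= 0) + rnorm(n, sd = 0.1)

# Hypothetical helper: local linear estimate with a triangular kernel
rdd_local_linear <- function(x, y, cutpoint, bw) {
  xc <- x - cutpoint
  D <- as.numeric(xc >= 0)
  w <- pmax(0, 1 - abs(xc) / bw)
  fit <- lm(y ~ D * xc, weights = w, subset = w > 0)
  unname(coef(fit)["D"])
}

# Re-estimate over the same grid of bandwidths as in the call above
bws <- seq(0.05, 1, by = 0.1)
est <- sapply(bws, function(b) rdd_local_linear(x, y, cutpoint = 0, bw = b))

plot(bws, est, type = "b", xlab = "bandwidth", ylab = "estimated discontinuity")
abline(h = 0.1, lty = 2)             # true jump in the simulation
```

An estimate that stays roughly flat across bandwidths is reassuring; one that changes sign or size sharply suggests the result depends on the bandwidth choice.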
Or run the placebo test, estimating the RDD effect at fake cutpoints:
plotPlacebo(reg_nonpara)
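A placebo test repeats the estimation at cutpoints where no effect should exist; sizeable "effects" there cast doubt on the main estimate. The following base-R sketch on simulated data illustrates the logic. The helper jump_at(), the grid of fake cutpoints and the rule of using only data on the same side of the true cutpoint are assumptions for this example, and plotPlacebo() may proceed differently.

```r
# Simulated data (hypothetical): true jump of 0.1 at the cutpoint x = 0
set.seed(5)
n <- 5000
x <- runif(n, -1, 1)
y <- 0.5 + 0.4 * x + 0.1 * (x >= 0) + rnorm(n, sd = 0.05)

# Hypothetical helper: local linear jump estimate at cutpoint c0.
# For a fake cutpoint, only data on its own side of the true cutpoint are
# used, so the true discontinuity cannot contaminate the placebo estimate.
jump_at <- function(c0, bw = 0.2, true_cut = 0) {
  side <- if (c0 < true_cut) x < true_cut else if (c0 > true_cut) x >= true_cut else rep(TRUE, n)
  xc <- x - c0
  D <- as.numeric(xc >= 0)
  w <- pmax(0, 1 - abs(xc) / bw) * side
  fit <- lm(y ~ D * xc, weights = w, subset = w > 0)
  unname(coef(fit)["D"])
}

cuts <- c(-0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6)
est <- sapply(cuts, jump_at)

plot(cuts, est, type = "h", xlab = "cutpoint", ylab = "estimated discontinuity")
abline(h = 0, lty = 2)
```

In this simulation only the estimate at the true cutpoint (0) should stand clearly away from zero.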
Design sensitivity tests check whether the discontinuity found can actually be attributed to other causes. Two types of tests are available:
- Discontinuity comes from manipulation: test whether there is possible manipulation around the cutoff with the McCrary 2008 test: dens_test()
- Discontinuity comes from other variables: test whether a discontinuity also arises in the covariates. Currently, only simple tests of equality of covariates around the threshold are available.
To test for manipulation, simply use the function dens_test() on either the raw data or the regression output:
dens_test(reg_nonpara)
##
## McCrary Test for no discontinuity of density around cutpoint
##
## data: reg_nonpara
## z-val = 1.295, p-value = 0.1952
## alternative hypothesis: Density is discontinuous around cutpoint
## sample estimates:
## Discontinuity
## 0.1035
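The intuition behind a manipulation test is that, absent sorting, observations should not pile up on one side of the cutpoint. The sketch below is deliberately cruder than the McCrary test: it is a plain binomial test on the counts in a small window around the cutpoint, using base R and simulated data. The window width h is an arbitrary assumption, and this is not what dens_test() or DCdensity() compute.

```r
# Simulated forcing variable with no manipulation (hypothetical data)
set.seed(6)
n <- 3000
x <- runif(n, -1, 1)
cutpoint <- 0
h <- 0.1                              # half-width of the window (assumption)

# Count observations just left and just right of the cutpoint
n_left <- sum(x >= cutpoint - h & x < cutpoint)
n_right <- sum(x >= cutpoint & x < cutpoint + h)

# With a locally smooth density, each observation in the window falls on
# either side with probability of about one half
bt <- binom.test(n_right, n_left + n_right, p = 0.5)
bt$p.value
```

A small p-value would indicate an imbalance in counts around the cutpoint, which is one symptom of manipulation of the forcing variable.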
To test for discontinuities in the covariates, two tests are available:
- equal means of covariates: covarTest_mean()
- equal distribution of covariates: covarTest_dis()
Since the Lee (2008) dataset contains no covariates, we need to simulate some. Here we simulate three variables, the second of which has a different mean to the left and to the right of the cutpoint.
set.seed(123)
n_Lee <- nrow(Lee2008)
Z <- data.frame(z1 = rnorm(n_Lee, sd = 2),
                z2 = rnorm(n_Lee, mean = ifelse(Lee2008$x < 0, 5, 8)),
                z3 = sample(letters, size = n_Lee, replace = TRUE))
Lee2008_rdd_Z <- RDDdata(y = Lee2008$y, x = Lee2008$x, covar = Z, cutpoint = 0)
Run the tests:
## test for equality of means around cutoff:
covarTest_mean(Lee2008_rdd_Z, bw = 0.3)
##    mean of x  mean of y  Difference  statistic  p.value
## z1 0.004268   0.02186    0.01759     -0.2539    0.7996
## z2 5.006      7.985      2.979       -84.85     0
## z3 13.19      13.44      0.2465      -0.941     0.3468
## Can also use function covarTest_dis() for Kolmogorov-Smirnov test:
covarTest_dis(Lee2008_rdd_Z, bw = 0.3)
##    statistic  p.value
## z1 0.03482    0.2727
## z2 0.8648     0
## z3 0.03009    0.4474
Both tests correctly reject equality for the second covariate (z2), and correctly fail to reject it for the first and third.
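For readers who want to see what such balance tests amount to, here is a hedged base-R sketch using t.test() and ks.test() on observations within a bandwidth of the cutpoint. It runs on simulated data; the helper balance() is hypothetical, and the exact test variants used by covarTest_mean() and covarTest_dis() may differ.

```r
# Simulated forcing variable and two covariates (hypothetical data):
# z1 is balanced, z2 has a different mean on each side of the cutpoint
set.seed(7)
n <- 3000
x <- runif(n, -1, 1)
z1 <- rnorm(n, sd = 2)
z2 <- rnorm(n, mean = ifelse(x < 0, 5, 8))

# Keep only observations within the bandwidth on each side
bw <- 0.3
left <- x < 0 & x > -bw
right <- x >= 0 & x < bw

# Hypothetical helper: compare a covariate across the two sides
balance <- function(z) {
  c(mean_left = mean(z[left]),
    mean_right = mean(z[right]),
    t_pval = t.test(z[left], z[right])$p.value,    # equality of means
    ks_pval = ks.test(z[left], z[right])$p.value)  # equality of distributions
}

res <- rbind(z1 = balance(z1), z2 = balance(z2))
round(res, 4)
```

A covariate that jumps at the cutpoint, like z2 here, is a warning that the estimated discontinuity in the outcome may reflect something other than the treatment.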