An R package to make imputation simple. Currently supported methods include:
- Model based (optionally add [non-]parametric random residual)
    - linear regression
    - robust linear regression (M-estimation)
    - ridge/elasticnet/lasso regression (version >= 0.2.1)
    - CART models
    - Random forest
- Model based, multivariate
    - Imputation based on EM-estimated parameters (version >= 0.2.1)
    - missForest (version >= 0.2.1)
- Donor imputation (including various donor pool specifications)
    - k-nearest neighbour (based on Gower's distance)
    - sequential hotdeck (LOCF, NOCB)
    - random hotdeck
    - Predictive mean matching
- Other
    - (groupwise) median imputation (optional random residual)
    - Proxy imputation (copy from other variable)
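As a rough illustration of the "optional random residual" and "groupwise" options listed above, the sketch below imputes a single column of `iris` in three ways. It assumes the package is already installed, and that `impute_lm()` accepts an `add_residual` argument with the values `"none"`, `"normal"` and `"observed"`; check the package manual for the exact options in your version. The object names (`det`, `par`, `med`) are only for illustration.

```r
library(simputation)

dat <- iris
dat[1:3, 1] <- NA            # blank out three Sepal.Length values

set.seed(1)                  # residuals are random, so fix the seed

# Deterministic regression imputation (predicted values only)
det <- impute_lm(dat, Sepal.Length ~ Petal.Length + Species)

# Same model, plus a residual drawn from a normal distribution
# (assumed option name: add_residual = "normal")
par <- impute_lm(dat, Sepal.Length ~ Petal.Length + Species,
                 add_residual = "normal")

# Groupwise median: the right-hand side defines the groups
med <- impute_median(dat, Sepal.Length ~ Species)

# Compare the three imputed values side by side
cbind(det = det$Sepal.Length[1:3],
      par = par$Sepal.Length[1:3],
      med = med$Sepal.Length[1:3])
```

All `impute_*` functions take the data as first argument and a formula as second, which is what makes them easy to chain.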
To install simputation together with all packages needed to support the various imputation models, run the following in R:
install.packages("simputation", dependencies=TRUE)
To install the development version, clone the repository and build it from the shell:
git clone https://github.com/markvanderloo/simputation
cd simputation
make install
Create some data with missing values:
library(simputation) # current package
dat <- iris
# empty a few fields
dat[1:3,1] <- dat[3:7,2] <- dat[8:10,5] <- NA
head(dat,10)
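Before imputing, it can help to confirm where the gaps are. The following base-R sketch (no extra packages assumed) rebuilds the same data and counts missing values per column and the number of incomplete rows; the names `na_counts` and `n_incomplete` are illustrative.

```r
dat <- iris
dat[1:3, 1] <- dat[3:7, 2] <- dat[8:10, 5] <- NA

# Number of missing values in each column
na_counts <- colSums(is.na(dat))
print(na_counts)

# Number of rows with at least one missing value
n_incomplete <- sum(!complete.cases(dat))
print(n_incomplete)
```

Note that row 3 is missing both Sepal.Length and Sepal.Width, so a model imputing these two variables cannot use either of them as a predictor for that row.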
Now impute Sepal.Length and Sepal.Width by regression on Petal.Length and Species, and impute Species using a CART model that uses all other variables (including the imputed variables in this case).
dat |>
  impute_lm(Sepal.Length + Sepal.Width ~ Petal.Length + Species) |>
  impute_cart(Species ~ .) |> # use all variables except 'Species' as predictor
  head(10)
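To check the result of a chain like the one above, one option is to store the imputed data and compare it with the input. This is a minimal sketch, assuming R >= 4.1 (for the native `|>` pipe) and that the rpart package used for CART models is available; `imputed` and `changed` are illustrative names.

```r
library(simputation)

dat <- iris
dat[1:3, 1] <- dat[3:7, 2] <- dat[8:10, 5] <- NA

imputed <- dat |>
  impute_lm(Sepal.Length + Sepal.Width ~ Petal.Length + Species) |>
  impute_cart(Species ~ .)

# Are there any missing values left?
anyNA(imputed)

# Which cells were filled in? TRUE marks a cell that was NA before imputation.
changed <- is.na(dat) & !is.na(imputed)
colSums(changed)
```

If a model cannot be fitted or a predictor is missing for some row, the corresponding cells may stay `NA`, so a check like this is worth running after every imputation chain.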