This function performs imputations on incomplete covariates, whatever their types, using functions from the package mice (Multivariate Imputation by Chained Equations, van Buuren) or from the package missMDA (single imputation with multivariate data analysis).

imput_cov(
  dat1,
  indcol = 1:ncol(dat1),
  R_mice = 5,
  meth = rep("pmm", ncol(dat1)),
  missMDA = FALSE,
  NB_COMP = 3,
  seed_choice = sample(1:1e+06, 1)
)

Arguments

dat1

a data.frame containing the variables to be imputed and those involved in the imputations

indcol

a vector of integers. The column indexes (or numbers) of the variables to be imputed and of those involved in the imputations.

R_mice

an integer. The number of imputed databases generated with the MICE method (5 by default).

meth

a vector of characters specifying the imputation method to be used for each variable selected by indcol: "pmm" (predictive mean matching, the default option) for continuous covariates, "logreg" for binary covariates, "polr" for ordinal covariates, "polyreg" for unordered categorical covariates (see mice for more details).

missMDA

a boolean. If TRUE, missing values are imputed using factor analysis of mixed data (FAMD, function imputeFAMD) from the missMDA package (2).

NB_COMP

an integer corresponding to the number of components used in FAMD to predict the missing entries (3 by default) when the missMDA option is TRUE.

seed_choice

an integer passed to set.seed() to initialize the random number generator (a random integer by default).
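The mapping between covariate types and values of meth can be sketched in base R. The helper choose_meth() below is hypothetical (it is not part of OTrecod or mice) and assumes that categorical covariates are stored as factors, with ordinal ones stored as ordered factors; every other column falls back to "pmm".

```r
# Hypothetical helper (not part of OTrecod or mice): suggest a value of
# `meth` for each column, following the type-to-method mapping of `meth`.
# Assumption: categorical covariates are factors, ordinal ones are
# ordered factors; all other columns get "pmm".
choose_meth <- function(dat) {
  vapply(dat, function(x) {
    if (is.ordered(x)) {
      "polr"                                        # ordinal (checked first: ordered factors are also factors)
    } else if (is.factor(x)) {
      if (nlevels(x) == 2) "logreg" else "polyreg"  # binary vs. unordered categorical
    } else {
      "pmm"                                         # continuous / default option
    }
  }, character(1))
}

# Toy incomplete data set (illustrative values only)
toy <- data.frame(
  age    = c(25, NA, 41, 33),
  smoker = factor(c("yes", "no", NA, "no")),
  grade  = factor(c("low", "high", "mid", NA),
                  levels = c("low", "mid", "high"), ordered = TRUE),
  region = factor(c("N", "S", "E", NA))
)
choose_meth(toy)
# age: "pmm", smoker: "logreg", grade: "polr", region: "polyreg"
```

Under these assumptions, a call such as choose_meth(simu_data[, 4:8]) would return one method per selected variable that could then be passed as meth; whether it matches the vector used in the Examples depends on how the columns of simu_data are stored.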

Value

A list of 3 or 4 objects, depending on the missMDA argument: the first three objects below if missMDA = TRUE, all four otherwise:

RAW

a data.frame corresponding to the raw database

IMPUTE

a character string indicating the type of imputation selected

DATA_IMPUTE

a data.frame corresponding to the completed (consensus if multiple imputations) database

MICE_IMPS

only if missMDA = FALSE. A list containing the R_mice imputed databases generated by MICE

Details

By default, the function imput_cov handles missing information using multivariate imputation by chained equations (MICE; see (1) for more details about the method) by calling the function mice internally. All default values of mice are kept, except the number of multiple imputations, which can be set with the argument R_mice, and the imputation method chosen for each variable (meth argument), which corresponds to the argument defaultMethod of the function mice. When multiple imputations are requested (MICE only), each missing value is replaced by a consensus value: the average of the candidate values is retained for numerical variables, while the most frequent class is retained for categorical variables (ordinal or not). The output MICE_IMPS stores the imputed databases so that users can build their own consensus values and/or assess the variability of the proposed imputed values if necessary. For this method, the seed of the random number generator is either fixed by the user or sampled at random, using the argument seed_choice.
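The consensus step described above can be sketched in base R as follows. consensus_imp() and mode_class() are hypothetical helpers written for illustration, not the internal code of imput_cov. They assume a list of completed data frames with identical columns and categorical variables stored as factors, and they break ties in favour of the first class in sorted order, a choice the original function may not make.

```r
# Hypothetical sketch (not the internal code of imput_cov): merge a list of
# completed data frames into one consensus data frame.

# Most frequent class of a vector; ties go to the first class in sorted order
mode_class <- function(x) {
  tab <- table(x)
  names(tab)[which.max(tab)]
}

consensus_imp <- function(imps) {
  out <- imps[[1]]
  for (j in seq_along(out)) {
    cols <- lapply(imps, `[[`, j)    # column j of every completed data set
    if (is.numeric(out[[j]])) {
      # numerical variable: average of the candidate values
      out[[j]] <- rowMeans(do.call(cbind, cols))
    } else if (is.factor(out[[j]])) {
      # categorical variable (ordinal or not): most frequent class per row
      vals <- do.call(cbind, lapply(cols, as.character))
      out[[j]] <- factor(apply(vals, 1, mode_class),
                         levels = levels(out[[j]]),
                         ordered = is.ordered(out[[j]]))
    }
  }
  out
}

# Three toy completed data sets (illustrative values only)
imps <- list(
  data.frame(x = c(1, 2, 3), g = factor(c("a", "b", "a"))),
  data.frame(x = c(1, 4, 3), g = factor(c("a", "b", "b"))),
  data.frame(x = c(1, 6, 3), g = factor(c("a", "a", "b")))
)
consensus_imp(imps)
```

Assuming MICE_IMPS holds the completed databases as a list of data frames with the same columns (the mild class is such a list), something like consensus_imp(imput_mice$MICE_IMPS) would give a consensus table comparable to DATA_IMPUTE, although exact agreement is not guaranteed.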

When the argument missMDA is set to TRUE, missing values are replaced (single imputation) using a dimensionality-reduction method called factor analysis of mixed data (FAMD), through the imputeFAMD function of the missMDA package (2). With this approach, imput_cov keeps all the default values of imputeFAMD except the number of dimensions used for FAMD, which can be set by users with NB_COMP (3 by default).

References

  1. van Buuren S, Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. https://www.jstatsoft.org/v45/i03/

  2. Josse J, Husson F (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), 1–31. doi:10.18637/jss.v070.i01

Author

Gregory Guernec

otrecod.pkg@gmail.com

Examples

# Imputation of all incomplete covariates in the table simu_data:
library(OTrecod)
data(simu_data)

# Here we keep the complete variable "Gender" in the imputation model.
# Using MICE (R_mice = 3):
imput_mice <- imput_cov(simu_data,
  indcol = 4:8, R_mice = 3,
  meth = c("logreg", "polyreg", "polr", "logreg", "pmm")
)
summary(imput_mice)
#>             Length Class      Mode     
#> RAW         8      data.frame list     
#> IMPUTE      1      -none-     character
#> DATA_IMPUTE 5      data.frame list     
#> MICE_IMPS   3      mild       list     


# Using FAMD (NB_COMP = 3):
imput_famd <- imput_cov(simu_data,
  indcol = 4:8,
  meth = c("logreg", "polyreg", "polr", "logreg", "pmm"),
  missMDA = TRUE
)
summary(imput_famd)
#>             Length Class      Mode     
#> RAW         8      data.frame list     
#> IMPUTE      1      -none-     character
#> DATA_IMPUTE 5      data.frame list