This function imputes incomplete covariates of any type, using functions from the package mice (van Buuren's multiple imputation by chained equations) or from the package missMDA (single imputation with multivariate data analysis).
a data.frame containing the variables to be imputed and those involved in the imputations
a vector of integers. The column indexes (or numbers) of the variables to be imputed and of those involved in the imputations.
an integer. The number of imputed databases generated with the MICE method (5 by default).
a vector of characters which specifies the imputation method to be used for each column in
"pmm" for continuous covariates or as the default option, "logreg" for binary covariates, "polr" for ordinal covariates, "polyreg" for unordered categorical covariates (see
mice for more details).
a boolean. If
TRUE, missing values are imputed using factor analysis for mixed data (
imputeFAMD) from the missMDA package (2).
an integer corresponding to the number of components used in FAMD to predict the missing entries (3 by default) when the
missMDA option is TRUE.
an integer passed to set.seed() to initialize the random number generator (a random integer by default)
A list of 3 or 4 objects, depending on the missMDA argument. Only the first three objects below are returned if
missMDA = TRUE; otherwise all 4 objects are returned:
RAW: a data.frame corresponding to the raw database
IMPUTE: a character indicating the type of selected imputation
DATA_IMPUTE: a data.frame corresponding to the completed (consensus if multiple imputations) database
MICE_IMPS: only if missMDA = FALSE. A list containing the R_mice imputed databases generated by MICE
By default, the function
imput_cov handles missing information using multivariate imputation by chained equations (MICE, see (1) for more details about the method) by integrating in its syntax the function
All arguments of this function keep their default values, except the required number of multiple imputations, which can be set using the argument
R_mice, and the chosen imputation method for each variable (
that corresponds to the argument
defaultMethod of the function
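As a rough illustration of the two settings mentioned above (number of imputations and one method per column), here is a minimal sketch that calls mice directly rather than through imput_cov. It is not the internal code of imput_cov; it assumes the mice package is installed and uses its bundled nhanes2 data, where age is a complete 3-level factor, bmi and chl are numeric, and hyp is a binary factor. The seed value is arbitrary.

```r
library(mice)

# One method per column of nhanes2 (age, bmi, hyp, chl).
# "" means "do not impute this column" (age is complete here).
methods_per_column <- c("", "pmm", "logreg", "pmm")

imp <- mice(
  nhanes2,
  m = 3,                        # number of imputed databases
  method = methods_per_column,  # imputation method for each column
  seed = 123,                   # arbitrary seed, for reproducibility
  printFlag = FALSE             # silence the iteration log
)

# List of the 3 completed data.frames
completed <- mice::complete(imp, action = "all")
length(completed)
```

Every other argument of mice is left at its default value in this sketch.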
When multiple imputations are required (for MICE only), each missing value is imputed by a consensus value:
the average of the candidate values is retained for numerical variables, while the most frequent class is retained for categorical variables (ordinal or not).
MICE_IMPS stores the imputed databases so that users can build their own consensus values and/or assess the variability of the proposed imputed values if necessary.
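The consensus rule described above can be sketched in base R as follows. This is only an illustration, not the code used by the package: the helper names most_frequent and consensus_from_imputations are hypothetical, the toy data are made up, and the sketch assumes that all imputed data.frames share the same rows and columns and that categorical variables are stored as factors.

```r
# Hypothetical helper: most frequent value of a character vector
# (ties are broken in favour of the first value in alphabetical order).
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Hypothetical helper: consensus data.frame from a list of completed data.frames.
consensus_from_imputations <- function(imps) {
  template <- imps[[1]]
  for (col in names(template)) {
    # Candidate values of the current variable, one vector per imputed database
    candidates <- lapply(imps, function(d) d[[col]])
    if (is.numeric(template[[col]])) {
      # Numerical variable: average of the candidate values, row by row
      template[[col]] <- rowMeans(do.call(cbind, candidates))
    } else {
      # Categorical variable: most frequent class, row by row
      as_chr <- do.call(cbind, lapply(candidates, as.character))
      template[[col]] <- factor(
        apply(as_chr, 1, most_frequent),
        levels = levels(template[[col]]),
        ordered = is.ordered(template[[col]])
      )
    }
  }
  template
}

# Toy example: three imputed versions of the same two-row database
lv <- c("no", "yes")
imps <- list(
  data.frame(age = c(30, 41), smoker = factor(c("no", "yes"), levels = lv)),
  data.frame(age = c(30, 45), smoker = factor(c("no", "no"), levels = lv)),
  data.frame(age = c(30, 43), smoker = factor(c("no", "yes"), levels = lv))
)
consensus <- consensus_from_imputations(imps)
consensus
```

Users who prefer another summary, such as the median for numerical variables, could swap it in at the rowMeans step.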
For this method, the seed of the random number generator must be fixed or sampled using the argument
When the argument
missMDA is set to
TRUE, incomplete values are replaced (single imputation) using a method based on dimensionality reduction called factor analysis for mixed data (FAMD), through the
imputeFAMD function of the missMDA package (2).
Using this approach, the function
imput_cov keeps all the default values integrated in the function
imputeFAMD except the number of dimensions used for FAMD, which can be set by users (3 by default).
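For readers who want to see the underlying call, here is a minimal sketch of a single imputation with imputeFAMD used directly, outside imput_cov. It assumes the missMDA package is installed and relies on its bundled ozone data (a mix of continuous and categorical variables with missing entries); it is not the internal code of imput_cov.

```r
library(missMDA)

data(ozone)

# Single imputation based on FAMD, keeping 3 components;
# all other arguments are left at their default values.
res_famd <- imputeFAMD(ozone, ncp = 3)

# Completed database, with the same dimensions as the input
ozone_completed <- res_famd$completeObs
dim(ozone_completed)
anyNA(ozone_completed)
```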
van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. https://www.jstatsoft.org/v45/i03/
Josse J, Husson F (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), 1–31. doi:10.18637/jss.v070.i01
# Imputation of all incomplete covariates in the table simu_data:
data(simu_data)

# Here we keep the complete variable "Gender" in the imputation model.
# Using MICE (REP = 3):
imput_mice <- imput_cov(simu_data,
  indcol = 4:8, R_mice = 3,
  meth = c("logreg", "polyreg", "polr", "logreg", "pmm")
)
summary(imput_mice)
#>             Length Class      Mode
#> RAW         8      data.frame list
#> IMPUTE      1      -none-     character
#> DATA_IMPUTE 5      data.frame list
#> MICE_IMPS   3      mild       list

# Using FAMD (NB_COMP = 3):
imput_famd <- imput_cov(simu_data,
  indcol = 4:8,
  meth = c("logreg", "polyreg", "polr", "logreg", "pmm"),
  missMDA = TRUE
)
summary(imput_famd)
#>             Length Class      Mode
#> RAW         8      data.frame list
#> IMPUTE      1      -none-     character
#> DATA_IMPUTE 5      data.frame list
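The elements listed by summary() can then be accessed by name. The short sketch below repeats the MICE call from the example and inspects its output; it assumes that imput_cov and simu_data come from the OTrecod package (the package name is not stated on this page), and that each element of MICE_IMPS is a completed data.frame, as its class "mild" suggests.

```r
library(OTrecod)  # assumed home package of imput_cov and simu_data

data(simu_data)
imput_mice <- imput_cov(simu_data,
  indcol = 4:8, R_mice = 3,
  meth = c("logreg", "polyreg", "polr", "logreg", "pmm")
)

# Consensus database built from the 3 imputations
head(imput_mice$DATA_IMPUTE)

# Individual imputed databases, e.g. to compare them or build another consensus
length(imput_mice$MICE_IMPS)
first_imputation <- imput_mice$MICE_IMPS[[1]]
dim(first_imputation)
```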