simu_data.Rd
The first 300 rows belong to the database A, while the next 400 rows belong to the database B.
Five covariates (Gender, Treatment, Dosage, Smoking and Age) are common to both databases, with the same encodings. Gender is the only complete covariate.
The variables Yb1 and Yb2 are the target variables of A and B respectively. They summarize the same information encoded on two different scales, which is why Yb1 is missing in database B and Yb2 is missing in database A.
simu_data
A data.frame made of 2 stacked databases (A and B) with 700 observations on the following 8 variables.
Database identifier: a character variable with 2 possible classes, A or B.
Yb1: the target variable of database A, stored as a factor with 3 ordered levels: [20-40], [40-60[, [60-80] (the values for database B are missing).
Yb2: the target variable of database B, stored as an integer (an unknown scale from 1 to 5); the values for database A are missing.
Gender: a factor with 2 levels (Female or Male) and no missing values.
Treatment: a covariate with 3 classes, stored as a character, with 2% of missing values: Placebo, Trt A, Trt B.
Dosage: a factor with 4 levels and 5% of missing values: from Dos 1 to Dos 4.
Smoking: a covariate with 2 classes, stored as a character, with 10% of missing values: NO for non-smokers, YES otherwise.
Age: a numeric variable giving the age of participants in years, with 5% of missing values.
randomly generated
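As a rough illustration of this layout, the sketch below builds a small toy data.frame with the same stacked structure: 300 rows for A followed by 400 rows for B, each target variable being missing in the other database. The values are made up, the column name db_id is a hypothetical stand-in for the database identifier, and only one covariate is included; it does not reproduce simu_data itself.

```r
# Toy illustration only: simulated values, not the content of simu_data.
set.seed(1)
n_a <- 300                      # rows of database A
n_b <- 400                      # rows of database B
n   <- n_a + n_b
age_levels <- c("[20-40]", "[40-60[", "[60-80]")

toy <- data.frame(
  db_id  = rep(c("A", "B"), times = c(n_a, n_b)),  # hypothetical column name
  Yb1    = factor(sample(age_levels, n, replace = TRUE),
                  levels = age_levels, ordered = TRUE),
  Yb2    = sample(1:5, n, replace = TRUE),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  stringsAsFactors = FALSE
)

# Each target is observed in one database only.
toy$Yb1[toy$db_id == "B"] <- NA
toy$Yb2[toy$db_id == "A"] <- NA

# Cross-tabulate the database against the missingness of each target.
table(toy$db_id, is.na(toy$Yb1))
table(toy$db_id, is.na(toy$Yb2))
```

The two tables should show that Yb1 is entirely missing for B and Yb2 entirely missing for A, which is the pattern the recoding methods are meant to fill in.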
The purpose of the functions contained in this package is to predict the missing information on Yb2 in database A and on Yb1 in database B using optimal transportation theory.
Missing values were introduced in some covariates following a simple MCAR (missing completely at random) process.
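For readers unfamiliar with MCAR, the following is a minimal sketch of one way such missingness can be simulated in base R. The helper add_mcar is hypothetical and not part of the package, and removing a fixed proportion of entries uniformly at random is an assumption here; the exact procedure used to build simu_data is not documented on this page.

```r
# Hypothetical helper: set a fixed proportion of entries to NA,
# chosen uniformly at random and independently of any variable (MCAR).
add_mcar <- function(x, prop) {
  n_miss <- round(prop * length(x))            # number of values to blank out
  x[sample.int(length(x), n_miss)] <- NA       # positions drawn without replacement
  x
}

set.seed(42)
age      <- round(runif(700, min = 20, max = 80))  # made-up ages
age_mcar <- add_mcar(age, prop = 0.05)             # 5% missing, as for Age

sum(is.na(age_mcar))    # number of missing values
mean(is.na(age_mcar))   # observed missing rate
```

Because the missing positions are drawn without looking at the data, the observed values remain a random subsample of the complete variable, which is the defining property of MCAR.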