Helper functions

OTRecod.compute_pred_error! (Function)
compute_pred_error!(sol, inst)
compute_pred_error!(sol, inst, proba_disp)
compute_pred_error!(sol, inst, proba_disp, mis_disp)
compute_pred_error!(
    sol,
    inst,
    proba_disp,
    mis_disp,
    full_disp
)

Compute the prediction errors of a solution.
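For illustration only, the error rate reported by such a computation can be sketched as the fraction of individuals whose predicted modality differs from their true one. The helper `prediction_error_rate` and its vector inputs are hypothetical and not part of OTRecod. The sketch also ignores the `proba_disp`, `mis_disp` and `full_disp` display arguments.

```julia
# Hypothetical sketch, not part of OTRecod: share of individuals whose
# predicted modality differs from their true modality.
function prediction_error_rate(predicted::AbstractVector{<:Integer},
                               truth::AbstractVector{<:Integer})
    length(predicted) == length(truth) ||
        throw(DimensionMismatch("predicted and truth must have the same length"))
    isempty(truth) && return 0.0
    # count on the BitVector of mismatches gives the number of wrong predictions
    return count(predicted .!= truth) / length(truth)
end

# One wrong prediction out of four individuals
err = prediction_error_rate([1, 2, 2, 3], [1, 2, 3, 3])
println(err)  # 0.25
```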

OTRecod.empirical_distribution (Function)
empirical_distribution(inst)
empirical_distribution(inst, norme)
empirical_distribution(inst, norme, aggregate_tol)

Return the empirical cardinality of the joint occurrences of (C=x, Y=mA, Z=mB) in both bases.
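A minimal sketch of the counting step follows. It assumes that covariate profiles are represented as tuples and that both outcomes are known for every individual, as in simulated data. Both bases are pooled by concatenating their vectors. `joint_counts` is a hypothetical helper, not the package's function, and it does not model the `norme` or `aggregate_tol` arguments.

```julia
# Hypothetical sketch, not part of OTRecod: number of individuals with each
# joint value (x, y, z), where x is a covariate profile and y, z are outcome
# modalities.
function joint_counts(profiles::AbstractVector{T},
                      ys::AbstractVector{<:Integer},
                      zs::AbstractVector{<:Integer}) where {T}
    counts = Dict{Tuple{T,Int,Int},Int}()
    for (x, y, z) in zip(profiles, ys, zs)
        key = (x, Int(y), Int(z))
        counts[key] = get(counts, key, 0) + 1   # increment, starting from 0
    end
    return counts
end

# Base A (three individuals) and base B (one individual), Y and Z both known
profA, yA, zA = [(0, 1), (0, 1), (1, 0)], [1, 1, 2], [2, 2, 1]
profB, yB, zB = [(0, 1)], [1], [2]
counts = joint_counts(vcat(profA, profB), vcat(yA, yB), vcat(zA, zB))
println(counts[((0, 1), 1, 2)])  # 3
```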

OTRecod.average_distance_to_closest (Method)
average_distance_to_closest(inst, percent_closest)

Compute the cost between pairs of outcomes as the average distance between the covariates of individuals with these outcomes, considering only the percent_closest closest neighbors.

OTRecod.avg_distance_closest (Method)
avg_distance_closest(
    inst,
    base1,
    base2,
    outcome,
    m1,
    m2,
    percent_closest
)

Compute the average distance between the individuals of base1 with modality m1 for outcome and the individuals of base2 with modality m2 for outcome.

Only the percent_closest closest individuals are taken into account when computing the distance.
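The computation described above can be sketched as follows. The sketch assumes a precomputed matrix `D` of covariate distances between the individuals of the two bases and treats percent_closest as a fraction in [0, 1]. For each individual of the first group, it averages the distances to the closest fraction of the second group, then averages the result over the first group. `avg_closest_distance` is a hypothetical helper, and the exact neighbor selection in OTRecod may differ. Evaluating the helper for every pair of modalities yields a cost matrix between outcomes, in the spirit of average_distance_to_closest.

```julia
using Statistics: mean

# Hypothetical sketch, not OTRecod's implementation. D[i, j] is assumed to be
# the covariate distance between individual i of base 1 and individual j of
# base 2; idx1 and idx2 select the individuals with modalities m1 and m2.
function avg_closest_distance(D::AbstractMatrix{<:Real},
                              idx1::AbstractVector{<:Integer},
                              idx2::AbstractVector{<:Integer},
                              percent_closest::Real)
    (isempty(idx1) || isempty(idx2)) && return 0.0   # assumption for empty groups
    # Number of neighbors kept for each individual, between 1 and length(idx2)
    k = clamp(round(Int, percent_closest * length(idx2)), 1, length(idx2))
    total = 0.0
    for i in idx1
        nearest = sort(D[i, idx2])[1:k]   # k smallest distances from i
        total += mean(nearest)
    end
    return total / length(idx1)
end

# Cost between every pair of modalities (m1, m2) of one outcome
D = [0.0 1.0 4.0;
     2.0 0.5 3.0]   # 2 individuals in base 1, 3 in base 2
mod1 = [1, 2]       # outcome modalities in base 1
mod2 = [1, 1, 2]    # outcome modalities in base 2
cost = [avg_closest_distance(D, findall(==(m1), mod1), findall(==(m2), mod2), 0.5)
        for m1 in 1:2, m2 in 1:2]
# cost == [0.0 4.0; 0.5 3.0]
```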

OTRecod.empirical_estimator (Function)
empirical_estimator(path)
empirical_estimator(path, observed)

Get an empirical estimator of the distribution of Z conditional on Y and X in base A and, reciprocally, of Y conditional on Z and X in base B, obtained from a specific type of data set.

  • path: path of the directory containing the data set
  • observed: if nonempty, list of indices of the observed covariates; this allows some latent variables to be excluded.
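The following sketch shows one way to build an empirical conditional estimator. It tabulates the observed triplets and normalizes the counts within each (x, y) cell to estimate P(Z = z | Y = y, X = x). The function name, the tuple encoding of covariate profiles, and the dictionary output are assumptions for illustration, not the package's data layout. The sketch does not handle the observed argument.

```julia
# Hypothetical sketch, not OTRecod's implementation: empirical estimate of
# P(Z = z | Y = y, X = x), returned as Dict((x, y) => Dict(z => probability)).
function empirical_conditional(xs::AbstractVector{T},
                               ys::AbstractVector{<:Integer},
                               zs::AbstractVector{<:Integer}) where {T}
    counts = Dict{Tuple{T,Int},Dict{Int,Int}}()
    for (x, y, z) in zip(xs, ys, zs)
        # Counts of z within the (x, y) cell, created on first use
        cell = get!(() -> Dict{Int,Int}(), counts, (x, Int(y)))
        cell[z] = get(cell, z, 0) + 1
    end
    probs = Dict{Tuple{T,Int},Dict{Int,Float64}}()
    for (key, cell) in counts
        n = sum(values(cell))
        probs[key] = Dict(z => c / n for (z, c) in cell)   # normalize within the cell
    end
    return probs
end

# Base A: covariate profiles, observed Y, and Z (known here, as in simulated data)
xs = [(0, 1), (0, 1), (0, 1), (1, 0)]
ys = [1, 1, 1, 2]
zs = [2, 2, 1, 1]
p = empirical_conditional(xs, ys, zs)
p[((0, 1), 1)][2]   # ≈ 2/3
```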
OTRecod.simulate (Function)
simulate()
simulate(R2)
simulate(R2, muA)
simulate(R2, muA, muB)
simulate(R2, muA, muB, alphaA)
simulate(R2, muA, muB, alphaA, alphaB)
simulate(R2, muA, muB, alphaA, alphaB, n)
simulate(R2, muA, muB, alphaA, alphaB, n, q1)
simulate(R2, muA, muB, alphaA, alphaB, n, q1, q2)
simulate(R2, muA, muB, alphaA, alphaB, n, q1, q2, q3)

Simulate one dataset with three covariates, described by their mean in each database (muA and muB) and by the quantiles used for their discretization (q1, q2, q3). The outcomes depend linearly on the covariates, through the weights alphaA and alphaB and the R2 coefficient. The instance contains n individuals in each base.
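Here is a simplified sketch of this kind of simulation for a single base. It rests on several assumptions. The covariates are Gaussian with the given means, and each one is discretized at its own empirical quantiles. The latent outcome is linear in the continuous covariates plus Gaussian noise, with the noise variance chosen so that the linear part explains a share R2 of the total variance. `simulate_base`, `discretize`, the quartile split of the outcome, and computing the cut points separately in each base are hypothetical choices, not OTRecod's implementation.

```julia
using Random, Statistics

# Map each value of v to a class 1, 2, ... using empirical quantiles q as cut points
function discretize(v::AbstractVector{<:Real}, q::AbstractVector{<:Real})
    cuts = quantile(v, q)
    return [1 + count(<(x), cuts) for x in v]
end

# Hypothetical sketch of one simulated base, not OTRecod's `simulate`:
# - three Gaussian covariates with means mu, discretized at quantiles qs[j];
# - a latent outcome linear in the covariates (weights alpha) plus Gaussian
#   noise scaled so that the linear part explains a share R2 of the variance;
# - the outcome discretized at its quartiles (an arbitrary choice here).
function simulate_base(rng::AbstractRNG, n::Int, mu::Vector{Float64},
                       alpha::Vector{Float64}, R2::Float64,
                       qs::Vector{Vector{Float64}})
    X = randn(rng, n, 3) .+ mu'              # n × 3 continuous covariates
    signal = X * alpha                       # linear dependency on covariates
    sigma = sqrt(var(signal) * (1 - R2) / R2)
    latent = signal .+ sigma .* randn(rng, n)
    Xd = hcat((discretize(X[:, j], qs[j]) for j in 1:3)...)  # cut points per base
    y = discretize(latent, [0.25, 0.5, 0.75])
    return Xd, y
end

rng = MersenneTwister(42)
qs = [[0.5], [1/3, 2/3], [0.25, 0.5, 0.75]]
XA, yA = simulate_base(rng, 1000, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.6, qs)
XB, zB = simulate_base(rng, 1000, [1.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.6, qs)
```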

OTRecod.bound_prediction_error (Function)
bound_prediction_error(inst)
bound_prediction_error(inst, norme)
bound_prediction_error(inst, norme, aggregate_tol)

Compute a bound on the average prediction error in each base. The bound is the expected prediction error obtained when the distribution of Z in base A (and that of Y in base B) is known and each prediction is made with the value that maximizes this probability.
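The bound described above is the expected error of an argmax (Bayes) prediction rule. The following minimal sketch assumes that the joint distribution is available as a count array N[x, y, z] and that Z is predicted from (x, y) in base A. Under these assumptions, the expected error equals one minus the total count of the most probable z in each cell, divided by the overall count. `argmax_prediction_error` is a hypothetical helper and ignores the `norme` and `aggregate_tol` arguments.

```julia
# Hypothetical sketch: expected error of the argmax rule when the joint
# distribution of (X, Y, Z) is known through counts N[x, y, z] and Z is
# predicted from (x, y) by its most probable value.
function argmax_prediction_error(N::AbstractArray{<:Real,3})
    total = sum(N)
    total > 0 || throw(ArgumentError("N must contain at least one individual"))
    # Mass correctly predicted: the largest count over z in each (x, y) cell
    correct = sum(maximum(N; dims = 3))
    return 1 - correct / total
end

N = zeros(2, 1, 2)       # 2 covariate profiles, 1 modality of Y, 2 modalities of Z
N[1, 1, :] = [3, 1]      # profile 1: z = 1 is the most probable
N[2, 1, :] = [2, 2]      # profile 2: tie, either prediction is right half the time
err = argmax_prediction_error(N)   # 1 - (3 + 2) / 8 = 0.375
```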

OTRecod.compute_average_error_bound (Function)
compute_average_error_bound(path)
compute_average_error_bound(path, norme)

Compute a lower bound on the best average prediction error that can be obtained with a specific type of data set.

  • path: path of the directory containing the data set
