Helper functions

OTRecod.disp_inst_info — Method

disp_inst_info(inst)

Display information about the distance between the modalities

OTRecod.compute_distrib_error! — Method

compute_distrib_error!(sol, inst, empiricalZA, empiricalYB)

Compute errors in the conditional distributions of a solution
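As an illustration of what such an error can look like, here is a minimal sketch that measures the average L1 gap between an estimated and an empirical conditional distribution. The function name `distrib_error`, the matrix layout (one row per conditioning value) and the choice of the L1 norm are assumptions made for this example; the docstring does not state which measure OTRecod uses.

```julia
# Hypothetical helper, not part of OTRecod.
# Each row is a conditional distribution (it sums to 1) for one conditioning value.
function distrib_error(estimated::Matrix{Float64}, empirical::Matrix{Float64})
    @assert size(estimated) == size(empirical)
    # Sum of absolute differences, averaged over the conditioning values (rows).
    return sum(abs.(estimated .- empirical)) / size(estimated, 1)
end

estimated = [0.6 0.4; 0.5 0.5]
empirical = [0.5 0.5; 0.2 0.8]
err = distrib_error(estimated, empirical)
```

A value of zero means the two conditional distributions coincide for every conditioning value.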

OTRecod.compute_distrib_error_3covar — Method

compute_distrib_error_3covar(
    sol,
    inst,
    empiricalZA,
    empiricalYB
)

OTRecod.compute_pred_error! — Function

compute_pred_error!(sol, inst)
compute_pred_error!(sol, inst, proba_disp)
compute_pred_error!(sol, inst, proba_disp, mis_disp)
compute_pred_error!(
    sol,
    inst,
    proba_disp,
    mis_disp,
    full_disp
)

Compute prediction errors in a solution
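A prediction error of this kind is commonly reported as a misclassification rate. The sketch below assumes the predicted and true outcomes are available as two integer vectors; `misclassification_rate` is a hypothetical name and this is not the package's implementation.

```julia
using Statistics

# Hypothetical helper: fraction of individuals whose predicted outcome
# differs from the true one.
misclassification_rate(pred::Vector{Int}, truth::Vector{Int}) = mean(pred .!= truth)

pred  = [1, 2, 2, 3]
truth = [1, 2, 3, 3]
rate = misclassification_rate(pred, truth)
```

In this toy data one of the four predictions is wrong, so the rate is one quarter.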

OTRecod.aggregate_per_covar_mixed — Function

aggregate_per_covar_mixed(inst)
aggregate_per_covar_mixed(inst, norme)
aggregate_per_covar_mixed(inst, norme, aggregate_tol)

OTRecod.empirical_distribution — Function

empirical_distribution(inst)
empirical_distribution(inst, norme)
empirical_distribution(inst, norme, aggregate_tol)

Return the empirical cardinality of the joint occurrences of (C=x,Y=mA,Z=mB) in both bases
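The counting itself can be sketched as follows, assuming the full triple (covariate class, Y, Z) is known for every individual, as it is in simulated data. The function `joint_counts` and its dictionary output are assumptions made for this example and do not reflect the package's data structures.

```julia
# Hypothetical helper, not part of OTRecod.
# Count how many individuals fall in each (c, y, z) combination.
function joint_counts(C::Vector{Int}, Y::Vector{Int}, Z::Vector{Int})
    @assert length(C) == length(Y) == length(Z)
    counts = Dict{Tuple{Int,Int,Int},Int}()
    for (c, y, z) in zip(C, Y, Z)
        key = (c, y, z)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

C = [1, 1, 2, 2, 1]   # covariate class of each individual
Y = [1, 1, 1, 2, 1]   # first outcome
Z = [2, 2, 1, 1, 3]   # second outcome
counts = joint_counts(C, Y, Z)
```

Combinations that never occur are simply absent from the dictionary, so `get(counts, key, 0)` is the safe way to query them.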

OTRecod.average_distance_to_closest — Method

average_distance_to_closest(inst, percent_closest)

Compute the cost between pairs of outcomes as the average distance between the covariates of individuals with these outcomes, considering only the percent_closest fraction of nearest neighbors

OTRecod.avg_distance_closest — Method

avg_distance_closest(
    inst,
    base1,
    base2,
    outcome,
    m1,
    m2,
    percent_closest
)

Compute the average distance between individuals of base1 with modality m1 for outcome and individuals of base2 with modality m2 for outcome

Consider only the percent_closest individuals in the computation of the distance
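One possible reading of this computation is sketched below. It assumes Euclidean distances between covariate vectors and applies the fraction to the pooled list of pairwise distances; the package may instead select the closest neighbors per individual or use another norm. The name `avg_closest` and the matrix layout are hypothetical.

```julia
using Statistics

# Hypothetical helper, not part of OTRecod.
# X1 and X2 hold the covariates of the two groups, one row per individual.
function avg_closest(X1::Matrix{Float64}, X2::Matrix{Float64}, percent_closest::Float64)
    dists = Float64[]
    for i in 1:size(X1, 1), j in 1:size(X2, 1)
        # Euclidean distance between individual i of group 1 and j of group 2.
        push!(dists, sqrt(sum(abs2, X1[i, :] .- X2[j, :])))
    end
    sort!(dists)
    # Keep at least one pair, otherwise the requested fraction of closest pairs.
    k = max(1, round(Int, percent_closest * length(dists)))
    return mean(dists[1:k])
end

X1 = [0.0 0.0; 1.0 0.0]
X2 = [0.0 1.0; 3.0 0.0]
d_half = avg_closest(X1, X2, 0.5)   # average of the two smallest distances
d_all  = avg_closest(X1, X2, 1.0)   # average of all four distances
```

Restricting the average to the closest pairs makes the cost less sensitive to a few distant individuals.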

OTRecod.empirical_estimator — Function

empirical_estimator(path)
empirical_estimator(path, observed)

Get an empirical estimator of the distribution of Z conditional on Y and X in base A, and reciprocally in base B, obtained with a specific type of data set

path: path of the directory containing the data set
observed: if nonempty, list of indices of the observed covariates; this allows some latent variables to be excluded.

OTRecod.simulate — Function

simulate()
simulate(R2)
simulate(R2, muA)
simulate(R2, muA, muB)
simulate(R2, muA, muB, alphaA)
simulate(R2, muA, muB, alphaA, alphaB)
simulate(R2, muA, muB, alphaA, alphaB, n)
simulate(R2, muA, muB, alphaA, alphaB, n, q1)
simulate(R2, muA, muB, alphaA, alphaB, n, q1, q2)
simulate(R2, muA, muB, alphaA, alphaB, n, q1, q2, q3)

Simulate one dataset with three covariates, described by their means in each database (muA and muB) and by the quantiles used for discretization (q1, q2, q3). The dependency of outcomes on covariates is linear, given by the weights alphaA and alphaB and by the R2 coefficient. The instance contains n individuals in each base.
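To make the role of R2 concrete, here is a minimal single-base sketch, not the package's generator. It assumes independent unit-variance Gaussian covariates, a single set of quantile levels `q` shared by all three covariates, and an outcome cut at its quartiles; `simulate_sketch`, `discretize` and all keyword names are hypothetical.

```julia
using Random, Statistics

# Map each value to an ordinal class 1..length(cuts)+1 given sorted thresholds.
discretize(v, cuts) = [1 + count(c -> x > c, cuts) for x in v]

# Hypothetical sketch for one base; not OTRecod.simulate.
function simulate_sketch(rng::AbstractRNG; R2=0.5, mu=[0.0, 0.0, 0.0],
                         alpha=[1.0, 1.0, 1.0], n=1000, q=[0.5])
    X = randn(rng, n, 3) .+ mu'            # three covariates with means mu
    signal = X * alpha                     # linear dependency on the covariates
    # Noise variance chosen so that var(signal) / var(signal + noise) is about R2.
    sigma2 = (1 / R2 - 1) * sum(abs2, alpha)
    latent = signal .+ sqrt(sigma2) .* randn(rng, n)
    # Discretize each covariate at its empirical quantiles of levels q.
    Xd = hcat([discretize(X[:, j], quantile(X[:, j], q)) for j in 1:3]...)
    # Assumption: the outcome is cut at its quartiles, giving four classes.
    Y = discretize(latent, quantile(latent, [0.25, 0.5, 0.75]))
    return Xd, Y, signal, latent
end

rng = MersenneTwister(42)
Xd, Y, signal, latent = simulate_sketch(rng; R2=0.5, n=20_000)
r2_hat = var(signal) / var(latent)         # should be close to the requested R2
```

A larger R2 leaves less noise in the latent outcome, so the discretized outcome is more predictable from the covariates.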

OTRecod.bound_prediction_error — Function

bound_prediction_error(inst)
bound_prediction_error(inst, norme)
bound_prediction_error(inst, norme, aggregate_tol)

Compute a bound on the average prediction error in each base. The bound is computed as the expected prediction error assuming that the distribution of Z in base A (and that of Y in base B) is known, and that the prediction is made with the value that maximizes the probability.
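The principle behind such a bound can be sketched as follows: when the conditional distribution is known, predicting its most probable value errs with probability one minus the maximum probability, and the bound is the weighted average of that quantity. The function `mode_prediction_error`, the row-per-cell layout and the weights are assumptions made for this example.

```julia
# Hypothetical helper, not part of OTRecod.
# cond[i, :] is the known conditional distribution of the outcome in cell i,
# weights[i] is the frequency of cell i.
function mode_prediction_error(cond::Matrix{Float64}, weights::Vector{Float64})
    @assert size(cond, 1) == length(weights)
    err = 0.0
    for i in 1:size(cond, 1)
        # Predicting the most probable value fails with probability 1 - max.
        err += weights[i] * (1 - maximum(cond[i, :]))
    end
    return err / sum(weights)
end

cond = [0.7 0.2 0.1;
        0.4 0.4 0.2]
weights = [0.5, 0.5]
bound = mode_prediction_error(cond, weights)
```

No predictor based on the same conditioning information can achieve a lower expected error than this, which is why it serves as a reference value.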

OTRecod.compute_average_error_bound — Function

compute_average_error_bound(path)
compute_average_error_bound(path, norme)

Compute a lower bound on the best average prediction error that one can obtain with a specific type of data set

path: path of the directory containing the data set
