Multiple Imputation with Chained Equations

The MICE module allows most statsmodels models to be fit to a dataset with missing values on the independent and/or dependent variables, and provides rigorous standard errors for the fitted parameters. The basic idea is to treat each variable with missing values as the dependent variable in a regression, with some or all of the remaining variables as its predictors. The MICE procedure cycles through these models, fitting each in turn, then uses a procedure called “predictive mean matching” (PMM) to generate random draws from the predictive distributions determined by the fitted models. These random draws become the imputed values for one imputed data set.

By default, each variable with missing variables is modeled using a linear regression with main effects for all other variables in the data set. Note that even when the imputation model is linear, the PMM procedure preserves the domain of each variable. Thus, for example, if all observed values for a given variable are positive, all imputed values for the variable will always be positive. The user also has the option to specify which model is used to produce imputed values for each variable.

Classes

MICE(model_formula, model_class, data[, ...])

Multiple Imputation with Chained Equations.

MICEData(data[, perturbation_method, k_pmm, ...])

Wrap a data set to allow missing data handling with MICE.

MI(imp, model[, model_args_fn, ...])

MI performs multiple imputation using a provided imputer object.

BayesGaussMI(data[, mean_prior, cov_prior, ...])

Bayesian Imputation using a Gaussian model.

Implementation Details

Internally, this function uses pandas.isnull. Anything that returns True from this function will be treated as missing data.


Last update: Jan 20, 2025