statsmodels.imputation.bayes_mi.BayesGaussMI
- class statsmodels.imputation.bayes_mi.BayesGaussMI(data, mean_prior=None, cov_prior=None, cov_prior_df=1)
Bayesian Imputation using a Gaussian model.
The approach is Bayesian. The goal is to sample from the joint distribution of the mean vector, covariance matrix, and missing data values given the observed data values. Conjugate priors for the population mean and covariance matrix are used. Gibbs sampling is used to update the mean vector, covariance matrix, and missing data values in turn. After burn-in, the imputed complete data sets from the Gibbs chain can be used in multiple imputation analyses (MI).
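To make the missing-data step of the Gibbs cycle concrete, the following is a minimal NumPy sketch of drawing each row's missing values from the Gaussian distribution conditional on that row's observed values. It is an illustration under stated assumptions, not the statsmodels implementation: the function name `draw_missing` is hypothetical, the mean vector and covariance matrix are held fixed at illustrative values, and the updates of the mean and covariance from their conjugate posteriors are omitted.

```python
import numpy as np


def draw_missing(data, mean, cov, rng):
    """Fill NaN entries of `data` with one draw from the Gaussian
    conditional on each row's observed entries (hypothetical helper)."""
    filled = data.copy()
    for i in range(data.shape[0]):
        miss = np.isnan(data[i])
        if not miss.any():
            continue
        obs = ~miss
        if not obs.any():
            # Nothing observed in this row: draw from the marginal.
            filled[i] = rng.multivariate_normal(mean, cov)
            continue
        c_mo = cov[np.ix_(miss, obs)]
        c_oo = cov[np.ix_(obs, obs)]
        c_mm = cov[np.ix_(miss, miss)]
        # w = c_mo @ inv(c_oo), computed with a linear solve.
        w = np.linalg.solve(c_oo, c_mo.T).T
        cond_mean = mean[miss] + w @ (data[i, obs] - mean[obs])
        cond_cov = c_mm - w @ c_mo.T
        # Symmetrize to guard against floating-point asymmetry.
        cond_cov = (cond_cov + cond_cov.T) / 2
        filled[i, miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return filled


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
x.flat[rng.random(x.size) < 0.1] = np.nan

# Illustrative fixed parameters; a full sampler would redraw these
# on every Gibbs iteration.
mean = np.nanmean(x, axis=0)
cov = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

x_filled = draw_missing(x, mean, cov, rng)
```

In a full sampler this step would alternate with draws of the mean vector and covariance matrix, so that successive completed data sets reflect parameter uncertainty as well as the uncertainty in the missing values.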
- Parameters:
- data : ndarray
The array of data to be imputed. Values in the array equal to NaN are imputed.
- mean_prior : ndarray, optional
The covariance matrix of the Gaussian prior distribution for the mean vector. If not provided, the identity matrix is used.
- cov_prior : ndarray, optional
The center matrix for the inverse Wishart prior distribution for the covariance matrix. If not provided, the identity matrix is used.
- cov_prior_df : positive float
The degrees of freedom of the inverse Wishart prior distribution for the covariance matrix. Defaults to 1.
Examples
A basic example with OLS. The data are generated so that each value is missing at random with probability 0.1.
>>> import numpy as np
>>> x = np.random.standard_normal((1000, 2))
>>> x.flat[np.random.sample(2000) < 0.1] = np.nan
The imputer is used with MI.
>>> import statsmodels.api as sm
>>> def model_args_fn(x):
...     # Return endog, exog from x
...     return x[:, 0], x[:, 1:]
>>> imp = sm.BayesGaussMI(x)
>>> mi = sm.MI(imp, sm.OLS, model_args_fn)
Methods
update() Cycle through all Gibbs updates.
update_cov() Gibbs update of the covariance matrix.
update_data() Gibbs update of the missing data values.
update_mean() Gibbs update of the mean vector.