statsmodels.nonparametric.kernel_density.KDEMultivariate

class statsmodels.nonparametric.kernel_density.KDEMultivariate(data, var_type, bw=None, defaults=None)

Multivariate kernel density estimator.

This density estimator can handle univariate as well as multivariate data, including mixed continuous / ordered discrete / unordered discrete data. It also provides cross-validated bandwidth selection methods (least squares, maximum likelihood).

Parameters:
data : list of ndarrays or 2-D ndarray

The training data for the kernel density estimation, used to determine the bandwidth(s). If a 2-D array, it should have shape (num_observations, num_variables). If a list, each list element is either a separate observation or, as in the example below, the array of observations for one variable.

var_type : str

The type of the variables:

  • c : continuous

  • u : unordered (discrete)

  • o : ordered (discrete)

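The string should contain one type specifier per variable, in column order; for example, var_type='ccuo' describes two continuous variables followed by one unordered and one ordered discrete variable.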

bw : array_like or str, optional

If an array, it is used as a fixed, user-specified bandwidth (one value per variable). If a string, it should be one of:

  • normal_reference: normal reference rule of thumb (default)

  • cv_ml: cross validation maximum likelihood

  • cv_ls: cross validation least squares
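The normal reference rule of thumb is commonly written as h_j = 1.06 * sigma_j * n**(-1/(4 + q)), where sigma_j is the standard deviation of variable j, n the number of observations and q the number of variables. The NumPy-only sketch below assumes that form (it is not stated on this page); `normal_reference_bw` is a hypothetical helper, not part of statsmodels.

```python
import numpy as np

def normal_reference_bw(data):
    """Hypothetical helper: rule-of-thumb bandwidths, one per column.

    Assumes `data` has shape (num_observations, num_variables) and the
    rule h_j = 1.06 * sigma_j * n**(-1 / (4 + q)).
    """
    data = np.asarray(data, dtype=float)
    nobs, k_vars = data.shape
    sigma = np.std(data, axis=0)  # population std (ddof=0) of each variable
    return 1.06 * sigma * nobs ** (-1.0 / (4 + k_vars))

# 64 observations of two variables, each with standard deviation exactly 1
data = np.tile([[-1.0, 0.0], [1.0, 2.0]], (32, 1))
bw = normal_reference_bw(data)
print(bw)  # 64**(-1/6) is 0.5, so both entries are about 1.06 * 0.5 = 0.53
```

If the assumed formula matches the library, `sm.nonparametric.KDEMultivariate(data=data, var_type='cc', bw='normal_reference').bw` should return the same values.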

defaults : EstimatorSettings instance, optional

The default values for (efficient) bandwidth estimation.

Examples

>>> import numpy as np
>>> import statsmodels.api as sm
>>> nobs = 300
>>> np.random.seed(1234)  # Seed random generator
>>> c1 = np.random.normal(size=(nobs,1))
>>> c2 = np.random.normal(2, 1, size=(nobs,1))

Estimate a bivariate distribution and display the bandwidth found:

>>> dens_u = sm.nonparametric.KDEMultivariate(data=[c1,c2],
...     var_type='cc', bw='normal_reference')
>>> dens_u.bw
array([ 0.39967419,  0.38423292])

Attributes:
bw : array_like

The bandwidth parameters.

Methods

cdf([data_predict])

Evaluate the cumulative distribution function.

imse(bw)

Returns the Integrated Mean Square Error for the unconditional KDE.

loo_likelihood(bw[, func])

Returns the leave-one-out likelihood function.

pdf([data_predict])

Evaluate the probability density function.