statsmodels.stats.dist_dependence_measures.distance_covariance_test

statsmodels.stats.dist_dependence_measures.distance_covariance_test(x, y, B=None, method='auto')

The Distance Covariance (dCov) test

Apply the Distance Covariance (dCov) test of independence to x and y. This test was introduced in [1] and is based on the distance covariance statistic. The test is applicable to random vectors of arbitrary dimension (see the Notes section for more details).
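The statistic from [1] can be sketched in a few lines of NumPy. The pairwise Euclidean distance matrices of x and y are double-centered and multiplied elementwise, which gives the squared sample distance covariance V_n^2. The sketch returns n * V_n^2. The helper name `dcov_statistic` is hypothetical and not part of statsmodels, and it is an assumption that its value matches the `test_statistic` returned by this function exactly.

```python
import numpy as np


def _double_centered(z):
    """Double-centered matrix of Euclidean distances between the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # 1-D input: n observations of a single variable
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Subtract column means and row means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcov_statistic(x, y):
    """Hypothetical helper: n * V_n^2, with V_n^2 the squared sample dCov of [1]."""
    a, b = _double_centered(x), _double_centered(y)
    n = a.shape[0]
    if b.shape[0] != n:
        raise ValueError("x and y must have the same number of observations")
    v2 = (a * b).mean()  # V_n^2 = (1/n^2) * sum_{k,l} A_kl * B_kl
    return n * v2


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
print(dcov_statistic(x, rng.normal(size=(200, 5))))  # independent samples
print(dcov_statistic(x, x[:, :1] ** 2))              # y depends on x
```

Larger values indicate stronger evidence of dependence. The test then needs a null distribution for this number, and the method parameter controls how that is obtained.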

Parameters:
x : array_like, 1-D or 2-D

If x is 1-D, then it is assumed to be a vector of observations of a single random variable. If x is 2-D, then the rows are observations and the columns are treated as the components of a random vector, i.e., each column represents a different component of the random vector x.

y : array_like, 1-D or 2-D

Same as x, except that only the number of observations (rows) has to match that of x. If y is 2-D, note that the number of columns of y (i.e., the number of components in the random vector) does not need to match the number of columns in x.

B : int, optional, default=None

The number of permutations used to approximate the null distribution of the test statistic when the emp method is applied (see below). If B is None, then, as in [1], B is set to B = 200 + 5000/n, where n is the number of observations.

method : {'auto', 'emp', 'asym'}, optional, default='auto'

The method by which to obtain the p-value for the test.

  • auto : Default method. The number of observations is used to choose between emp and asym.

  • emp : Empirical evaluation of the p-value using permutations of the rows of y to obtain the null distribution.

  • asym : An asymptotic approximation of the distribution of the test statistic is used to find the p-value.
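The emp method can be sketched as a permutation test. Shuffle the rows of y B times, recompute the statistic each time, and report the fraction of permuted statistics at least as large as the observed one. The NumPy sketch below uses the default B = 200 + 5000/n described above. Truncating B to an integer is an assumption, and so is the p-value convention. The helper `permutation_dcov_test` is hypothetical. The sketch does not reproduce statsmodels' exact p-value formula or its rule for `auto`.

```python
import numpy as np


def _double_centered(z):
    """Double-centered Euclidean distance matrix between the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def permutation_dcov_test(x, y, B=None, seed=None):
    """Hypothetical sketch of the 'emp' method: returns (statistic, pval, B)."""
    a, b = _double_centered(x), _double_centered(y)
    n = a.shape[0]
    if b.shape[0] != n:
        raise ValueError("x and y must have the same number of observations")
    if B is None:
        B = int(200 + 5000 / n)  # default rule from [1]; int() truncation is assumed
    rng = np.random.default_rng(seed)
    observed = n * (a * b).mean()
    null = np.empty(B)
    for i in range(B):
        p = rng.permutation(n)
        # Permuting the rows of y permutes both the rows and the columns of
        # its double-centered distance matrix, so it need not be recomputed.
        null[i] = n * (a * b[np.ix_(p, p)]).mean()
    pval = np.mean(null >= observed)  # share of permuted statistics >= observed
    return observed, pval, B


rng = np.random.default_rng(1)
x = rng.normal(size=100)
stat, pval, B = permutation_dcov_test(x, rng.normal(size=(100, 2)), seed=2)
print(B)  # 250
```

The sketch reuses the centered distance matrix of y under each permutation instead of recomputing distances. This keeps every iteration at O(n^2) work, which matters because B grows as n shrinks.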

Returns:
test_statistic : float

The value of the test statistic used in the test.

pval : float

The p-value.

chosen_method : str

The method that was used to obtain the p-value ('emp' or 'asym'). This is mostly relevant when the function is called with method='auto'.

Notes

The test applies to random vectors of arbitrary dimensions, i.e., x can be a 1-D vector of observations of a single random variable while y can be an n by k 2-D array (where k > 1). It is also possible for x and y to both be 2-D arrays with the same number of rows (observations) but different numbers of columns.
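The shape flexibility follows from how the statistic in [1] is built. It uses only the n by n matrices of pairwise Euclidean distances between observations. The number of columns therefore affects how each distance is computed, but not the shape of the matrices. A minimal NumPy sketch of this, with a hypothetical `pairwise_distances` helper (not a statsmodels function):

```python
import numpy as np


def pairwise_distances(z):
    """Hypothetical helper: n x n Euclidean distances between the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # 1-D input: n observations of a single random variable
    elif z.ndim != 2:
        raise ValueError("expected a 1-D or 2-D array")
    return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)


rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)       # 1-D: one random variable
y = rng.normal(size=(n, 4))  # 2-D: n observations of a 4-component random vector
dx, dy = pairwise_distances(x), pairwise_distances(y)
print(dx.shape, dy.shape)  # (100, 100) (100, 100)
```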

As noted in [1], the statistics are sensitive to all types of departures from independence, including nonlinear or nonmonotone dependence structure.
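The following example shows dependence that a correlation-based measure misses. Take x on a grid symmetric about zero and set y = x**2. The Pearson correlation is zero up to rounding, by symmetry, while the sample distance correlation R_n of [1] is clearly positive. The NumPy sketch computes R_n from its definition in [1], and the helper names are hypothetical.

```python
import numpy as np


def _double_centered(z):
    """Double-centered Euclidean distance matrix between the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample R_n = sqrt(V_n^2(x, y) / sqrt(V_n^2(x) * V_n^2(y))); 0 if a dVar is 0."""
    a, b = _double_centered(x), _double_centered(y)
    v2_xy, v2_x, v2_y = (a * b).mean(), (a * a).mean(), (b * b).mean()
    if v2_x * v2_y == 0:
        return 0.0
    return np.sqrt(v2_xy / np.sqrt(v2_x * v2_y))


x = np.linspace(-1, 1, 500)
y = x ** 2  # deterministic but nonmonotone dependence on x
print(np.corrcoef(x, y)[0, 1])     # ~0 by symmetry of the grid
print(distance_correlation(x, y))  # clearly positive
```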

References

[1] Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007) “Measuring and testing by correlation of distances”. Annals of Statistics, Vol. 35 No. 6, pp. 2769-2794.

Examples

>>> import numpy as np
>>> from statsmodels.stats.dist_dependence_measures import (
...     distance_covariance_test)
>>> data = np.random.rand(1000, 10)
>>> x, y = data[:, :3], data[:, 3:]
>>> x.shape
(1000, 3)
>>> y.shape
(1000, 7)
>>> distance_covariance_test(x, y)  # (test_statistic, pval, chosen_method); values vary per run
(1.0426404792714983, 0.2971148340813543, 'asym')