statsmodels.stats.dist_dependence_measures.distance_covariance_test
statsmodels.stats.dist_dependence_measures.distance_covariance_test(x, y, B=None, method='auto')
The Distance Covariance (dCov) test
Apply the Distance Covariance (dCov) test of independence to x and y. This test was introduced in [1], and is based on the distance covariance statistic. The test is applicable to random vectors of arbitrary length (see the notes section for more details).
- Parameters
- x : array_like, 1-D or 2-D
If x is 1-D, then it is assumed to be a vector of observations of a single random variable. If x is 2-D, then the rows should be observations and the columns are treated as the components of a random vector, i.e., each column represents a different component of the random vector x.
- y : array_like, 1-D or 2-D
Same as x, but only the number of observations has to match that of x. If y is 2-D, note that the number of columns of y (i.e., the number of components in the random vector) does not need to match the number of columns in x.
- B : int, optional, default=`None`
The number of iterations to perform when evaluating the null distribution of the test statistic when the emp method is applied (see below). If B is None, then as in [1] B is set to B = 200 + 5000/n, where n is the number of observations.
- method : {'auto', 'emp', 'asym'}, optional, default='auto'
The method by which to obtain the p-value for the test.
auto : Default method. The number of observations will be used to determine the method.
emp : Empirical evaluation of the p-value using permutations of the rows of y to obtain the null distribution.
asym : An asymptotic approximation of the distribution of the test statistic is used to find the p-value.
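As a rough illustration of the default for B, the formula B = 200 + 5000/n can be evaluated for a few sample sizes. The helper name `default_num_permutations` is hypothetical, and rounding the result down to an integer is an assumption made here, not something stated above.

```python
import math


def default_num_permutations(n):
    """Hypothetical helper: evaluate B = 200 + 5000/n for n observations.

    Rounding down to an integer is an assumption of this sketch.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return int(math.floor(200 + 5000 / n))


# Smaller samples get more permutations; the default approaches 200 as n grows.
for n in (20, 100, 1000):
    print(n, default_num_permutations(n))
```

Passing an explicit integer for B overrides this default when the emp method is used.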
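- Returns
- test_statistic : float
The value of the test statistic.
- pval : float
The p-value of the test.
- chosen_method : str
The method that was used to obtain the p-value ('emp' or 'asym'). Mostly relevant when method='auto'.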
Notes
The test applies to random vectors of arbitrary dimensions, i.e., x can be a 1-D vector of observations for a single random variable while y can be an n by k 2-D array (where k > 1) holding n observations of a k-component random vector. It is also possible for x and y to both be 2-D arrays that have the same number of rows (observations) while differing in the number of columns.
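The shape convention described above can be sketched with NumPy alone. The helpers `as_observation_matrix` and `check_same_observations` are hypothetical names used only for illustration; they are not part of statsmodels and only mirror the rule that rows are observations and only the row counts must agree.

```python
import numpy as np


def as_observation_matrix(a):
    """Hypothetical helper: return a 2-D array with one row per observation."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        # A 1-D input is a single random variable: one column, n rows.
        a = a[:, np.newaxis]
    elif a.ndim != 2:
        raise ValueError("expected a 1-D or 2-D array")
    return a


def check_same_observations(x, y):
    """Hypothetical helper: only the number of rows has to match."""
    x, y = as_observation_matrix(x), as_observation_matrix(y)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of observations")
    return x, y


rng = np.random.default_rng(0)
x = rng.random(50)        # 50 observations of a single random variable
y = rng.random((50, 4))   # 50 observations of a 4-component random vector
x2, y2 = check_same_observations(x, y)
print(x2.shape, y2.shape)
```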
As noted in [1] the statistics are sensitive to all types of departures from independence, including nonlinear or nonmonotone dependence structure.
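To make the sensitivity to nonlinear dependence concrete, the following is a minimal NumPy sketch of the squared sample distance covariance from [1] (the mean of the elementwise product of the double-centered pairwise distance matrices) together with a permutation p-value. It is not the statsmodels implementation: the function names, the default of 199 permutations, and the (1 + count) / (1 + B) p-value convention are all assumptions of this sketch.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered matrix of pairwise Euclidean distances between rows."""
    a = np.asarray(a, dtype=float)
    a = a.reshape(a.shape[0], -1)  # treat 1-D input as a single column
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def permutation_pvalue(x, y, num_perm=199, seed=0):
    """Sketch: squared sample dCov and a permutation p-value (assumed convention)."""
    rng = np.random.default_rng(seed)
    a = _centered_distances(x)
    b = _centered_distances(y)
    observed = (a * b).mean()
    n = a.shape[0]
    count = 0
    for _ in range(num_perm):
        idx = rng.permutation(n)
        # Permuting the rows of y permutes rows and columns of its distance matrix.
        permuted = (a * b[np.ix_(idx, idx)]).mean()
        count += int(permuted >= observed)
    return observed, (1 + count) / (1 + num_perm)


rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=200)
y = x ** 2  # fully determined by x, yet not linearly related to it

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation, expected to be near zero
stat, pval = permutation_pvalue(x, y)
print("Pearson r:", r)
print("dCov^2:", stat, "permutation p-value:", pval)
```

In this setup the Pearson correlation carries little signal because the relationship is symmetric around zero, whereas the permutation p-value based on distance covariance should be small.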
References
- [1] Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007) “Measuring and testing by correlation of distances”. Annals of Statistics, Vol. 35 No. 6, pp. 2769-2794.
Examples
>>> import numpy as np
>>> from statsmodels.stats.dist_dependence_measures import (
...     distance_covariance_test)
>>> data = np.random.rand(1000, 10)
>>> x, y = data[:, :3], data[:, 3:]
>>> x.shape
(1000, 3)
>>> y.shape
(1000, 7)
>>> distance_covariance_test(x, y)  # (test_statistic, pval, chosen_method)
(1.0426404792714983, 0.2971148340813543, 'asym')