statsmodels.stats.weightstats.DescrStatsW

class statsmodels.stats.weightstats.DescrStatsW(data, weights=None, ddof=0)

Descriptive statistics and tests for data with case weights

Assumes that the data is 1d or 2d with (nobs, nvars) observations in rows, variables in columns, and that the same weight applies to each column.

If degrees of freedom correction is used, then weights should add up to the number of observations. ttest also assumes that the sum of weights corresponds to the sample size.

If the weights are integers, often called case or frequency weights, this is essentially the same as replicating each observation by its weight.
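The equivalence with replicated data can be checked on a small case. The following is a minimal sketch with made-up values for `x` and `w`; it compares the weighted mean and variance against the same statistics computed on an array in which each observation is repeated according to its integer weight.

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

# Illustrative data (not from the documentation): three observations
# with integer frequency weights.
x = np.array([1.0, 2.0, 4.0])
w = np.array([1, 2, 3])

d = DescrStatsW(x, weights=w)

# Replicate each observation by its weight: [1, 2, 2, 4, 4, 4]
x_rep = np.repeat(x, w)

weighted_mean = d.mean          # weighted mean from DescrStatsW
replicated_mean = x_rep.mean()  # plain mean of the replicated data
weighted_var = d.var            # uses the instance default ddof=0
replicated_var = x_rep.var()    # NumPy also defaults to ddof=0

print(weighted_mean, replicated_mean)
print(weighted_var, replicated_var)
```

With non-integer weights there is no replicated array to compare against, so this check only applies to the frequency-weight case.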

Parameters:

data (array_like, 1-D or 2-D): dataset
weights (None or 1-D ndarray): weights for each observation, with the same length as the zero axis of data
ddof (int): degrees of freedom correction used for the second moments (var, std, cov, corrcoef); default is ddof=0. The statistical tests are independent of ddof and are based on the standard formulas.
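A short sketch of how `ddof` behaves, using illustrative integer weights so that the sum of weights is the sample size. It compares `var` for two instances that differ only in `ddof`, and checks that `ttest_mean` gives the same result for both. The data values are assumptions chosen for the example.

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

# Illustrative data; integer weights so that sum(w) is the sample size.
x = np.array([1.0, 2.0, 4.0, 3.0])
w = np.array([2, 1, 3, 2])

d0 = DescrStatsW(x, weights=w, ddof=0)
d1 = DescrStatsW(x, weights=w, ddof=1)

# The second moments follow the instance ddof, matching NumPy on replicated data.
x_rep = np.repeat(x, w)
var_match_ddof0 = np.isclose(d0.var, x_rep.var(ddof=0))
var_match_ddof1 = np.isclose(d1.var, x_rep.var(ddof=1))

# var_ddof takes an explicit ddof, regardless of the instance default.
var_explicit = d0.var_ddof(1)

# The t-test does not depend on the instance ddof.
t0, p0, df0 = d0.ttest_mean(2.0)
t1, p1, df1 = d1.ttest_mean(2.0)

print(d0.var, d1.var, var_explicit)
print(t0, t1, df0, df1)
```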

Attributes:

corrcoef

weighted correlation with default ddof

assumes variables in columns and observations in rows

cov

weighted covariance of data if data is 2 dimensional

assumes variables in columns and observations in rows; uses default ddof

demeaned

data with weighted mean subtracted

mean

weighted mean of data

nobs

alias for number of observations/cases, equal to sum of weights

std

standard deviation with default degrees of freedom correction

std_mean

standard deviation of weighted mean

sum

weighted sum of data

sum_weights

Sum of weights

sumsquares

weighted sum of squares of demeaned data

var

variance with default degrees of freedom correction
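The attributes above can be reproduced by hand with NumPy. The sketch below uses a small made-up 2-D array and assumes the usual weighted-moment formulas with the default ddof=0; it is meant as an illustration of what each attribute represents, not as a description of the library's internal code.

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

# Illustrative 2-D data: 4 observations (rows), 2 variables (columns).
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [4.0, 5.0],
              [3.0, 3.0]])
w = np.array([1.0, 2.0, 1.0, 2.0])  # one weight per row, shared by all columns

d = DescrStatsW(x, weights=w)  # default ddof=0

# Hand-computed counterparts of the attributes.
sum_w = w.sum()                                      # sum_weights, nobs
mean = (w[:, None] * x).sum(axis=0) / sum_w          # mean
demeaned = x - mean                                  # demeaned
sumsquares = (w[:, None] * demeaned**2).sum(axis=0)  # sumsquares
var = sumsquares / sum_w                             # var with ddof=0
cov = (w[:, None] * demeaned).T @ demeaned / sum_w   # cov with ddof=0

print(d.mean, mean)
print(d.var, var)
```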

Examples

>>> import numpy as np
>>> from scipy import stats
>>> from statsmodels.stats.weightstats import DescrStatsW
>>> np.random.seed(0)
>>> x1_2d = 1.0 + np.random.randn(20, 3)
>>> w1 = np.random.randint(1, 4, 20)
>>> d1 = DescrStatsW(x1_2d, weights=w1)
>>> d1.mean
array([ 1.42739844,  1.23174284,  1.083753  ])
>>> d1.var
array([ 0.94855633,  0.52074626,  1.12309325])
>>> d1.std_mean
array([ 0.14682676,  0.10878944,  0.15976497])

>>> tstat, pval, df = d1.ttest_mean(0)
>>> tstat; pval; df
array([  9.72165021,  11.32226471,   6.78342055])
array([  1.58414212e-12,   1.26536887e-14,   2.37623126e-08])
44.0

>>> tstat, pval, df = d1.ttest_mean([0, 1, 1])
>>> tstat; pval; df
array([ 9.72165021,  2.13019609,  0.52422632])
array([  1.58414212e-12,   3.87842808e-02,   6.02752170e-01])
44.0

# if weights are integers, then asrepeats can be used

>>> x1r = d1.asrepeats()
>>> x1r.shape
...
>>> stats.ttest_1samp(x1r, [0, 1, 1])
...
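The `asrepeats` comparison can be written as a self-contained check. This sketch assumes SciPy is installed and uses a different random generator than the example above, so the numbers will not match the printed outputs there. It only verifies that the weighted t-test and `scipy.stats.ttest_1samp` on the replicated data agree.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import DescrStatsW

# Illustrative data; the generator and seed here are assumptions for this sketch.
rng = np.random.default_rng(0)
x = 1.0 + rng.standard_normal((20, 3))
w = rng.integers(1, 4, size=20)  # integer weights in {1, 2, 3}

d = DescrStatsW(x, weights=w)
x_rep = d.asrepeats()  # each row repeated floor(weight) times

# Weighted t-test, one null value per column.
tstat, pval, df = d.ttest_mean([0, 1, 1])

# Unweighted t-test on the replicated rows, column by column (axis 0).
res = stats.ttest_1samp(x_rep, [0, 1, 1])

rows_match = x_rep.shape == (w.sum(), 3)
tstat_match = np.allclose(tstat, res.statistic)
pval_match = np.allclose(pval, res.pvalue)
print(rows_match, tstat_match, pval_match)
```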

Methods

`asrepeats`()	get array that has repeats given by floor(weights)
`get_compare`(other[, weights])	return an instance of CompareMeans with self and other
`quantile`(probs[, return_pandas])	Compute quantiles for a weighted sample.
`std_ddof`([ddof])	standard deviation of data with given ddof
`tconfint_mean`([alpha, alternative])	two-sided confidence interval for weighted mean of data
`ttest_mean`([value, alternative])	ttest of Null hypothesis that mean is equal to value.
`ttost_mean`(low, upp)	test of (non-)equivalence of one sample
`var_ddof`([ddof])	variance of data given ddof
`zconfint_mean`([alpha, alternative])	two-sided confidence interval for weighted mean of data
`ztest_mean`([value, alternative])	z-test of Null hypothesis that mean is equal to value.
`ztost_mean`(low, upp)	test of (non-)equivalence of one sample, based on z-test
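A brief usage sketch for the confidence interval and test methods listed above, with made-up data and integer weights. It computes t-based and z-based intervals for the weighted mean and a t-test against a hypothetical null value of 2.5.

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

# Illustrative data with integer case weights (sum of weights = 12).
x = np.array([2.1, 3.4, 1.9, 4.2, 2.8])
w = np.array([3, 1, 2, 2, 4])

d = DescrStatsW(x, weights=w)

# Two-sided 95% confidence intervals for the weighted mean.
t_low, t_upp = d.tconfint_mean(alpha=0.05)
z_low, z_upp = d.zconfint_mean(alpha=0.05)

# t-test of the null hypothesis that the mean equals 2.5 (hypothetical value).
tstat, pval, df = d.ttest_mean(value=2.5)

print("mean:", d.mean)
print("t interval:", (t_low, t_upp))
print("z interval:", (z_low, z_upp))
print("t-test:", tstat, pval, df)
```

Both intervals are centered on the weighted mean and use `std_mean`; the t interval is the wider of the two because it uses t quantiles with `sum_weights - 1` degrees of freedom.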

Properties

`corrcoef`	weighted correlation with default ddof
`cov`	weighted covariance of data if data is 2 dimensional
`demeaned`	data with weighted mean subtracted
`mean`	weighted mean of data
`nobs`	alias for number of observations/cases, equal to sum of weights
`std`	standard deviation with default degrees of freedom correction
`std_mean`	standard deviation of weighted mean
`sum`	weighted sum of data
`sum_weights`	Sum of weights
`sumsquares`	weighted sum of squares of demeaned data
`var`	variance with default degrees of freedom correction