statsmodels.stats.proportion.multinomial_proportions_confint

statsmodels.stats.proportion.multinomial_proportions_confint(counts, alpha=0.05, method='goodman')

Confidence intervals for multinomial proportions.
Parameters:
- counts : array_like of int, 1-D
  Number of observations in each category.
- alpha : float in (0, 1), optional
  Significance level, defaults to 0.05.
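- method : {'goodman', 'sison-glaz'}, optional
  Method used to compute the confidence intervals, defaults to 'goodman'. Available methods are:
  - 'goodman': based on a chi-squared approximation, valid if all values in counts are greater than or equal to 5 [2].
  - 'sison-glaz': less conservative than 'goodman', but only valid if counts has 7 or more categories (len(counts) >= 7) [3].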
Returns:
- confint : ndarray, 2-D
  Array of [lower, upper] confidence limits, one pair per category, such that the simultaneous coverage of all intervals is (approximately) 1 - alpha.
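Because the result holds one [lower, upper] pair per category, "overall coverage" is a simultaneous property: the intervals succeed only if every pair contains its category's true proportion at the same time. The short sketch below makes that check explicit. `covers_all` is a hypothetical helper (not part of statsmodels), and the intervals in the example are made-up numbers, not output of this function.

```python
def covers_all(confint, true_props):
    """Hypothetical helper: True only if every [lower, upper] row
    contains the matching true proportion (simultaneous coverage)."""
    if len(confint) != len(true_props):
        raise ValueError("need one [lower, upper] row per category")
    # Works for nested lists and for a (k, 2) NumPy array alike,
    # since iterating either yields one 2-element row per category.
    return all(lo <= p <= hi for (lo, hi), p in zip(confint, true_props))


# Made-up intervals for three categories (one row per category).
confint = [[0.05, 0.30], [0.20, 0.50], [0.35, 0.65]]
print(covers_all(confint, [0.2, 0.3, 0.5]))  # True: every row covers its value
print(covers_all(confint, [0.4, 0.1, 0.5]))  # False: first and second rows miss
```

A single miss in any row counts as a failure of the whole set, which is why simultaneous intervals are wider than k separate intervals at the same alpha.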
Raises:
- ValueError
  If alpha is not in (0, 1) (bounds excluded), or if any value in counts is negative.
- NotImplementedError
  If method is not known.
- Exception
  When method == 'sison-glaz', if for some reason c cannot be computed; this signals a bug and should be reported.
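The documented input checks can be sketched as below. `validate_inputs` is a hypothetical helper written from the Raises list, not the statsmodels source; it accepts only the two documented method names, which may be stricter than the real function, and it does not model the sison-glaz failure case.

```python
def validate_inputs(counts, alpha=0.05, method="goodman"):
    """Hypothetical sketch of the documented argument checks."""
    # alpha must lie strictly between 0 and 1.
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1), bounds excluded")
    # Counts may be zero but never negative.
    if any(c < 0 for c in counts):
        raise ValueError("all values in counts must be >= 0")
    # Only the two documented methods are accepted in this sketch.
    if method not in ("goodman", "sison-glaz"):
        raise NotImplementedError(f"method {method!r} is not available")


validate_inputs([3, 0, 5])  # passes silently: zero counts are allowed
```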
Notes
The goodman method [2] approximates a statistic based on the multinomial distribution by a chi-squared random variable. The usual recommendation is that this approximation is valid if all values in counts are greater than or equal to 5. The method places no condition on the number of categories.
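Goodman's intervals have a closed form, which the sketch below writes out using only the standard library. It assumes the form usually quoted from [2]: with n = sum(counts), k categories and A the (1 - alpha/k) quantile of a chi-squared distribution with 1 degree of freedom, each interval is (A + 2 n_i ± sqrt(A (A + 4 n_i (n - n_i) / n))) / (2 (n + A)). `goodman_intervals` is a hypothetical name; this is not the statsmodels code.

```python
import math
from statistics import NormalDist


def goodman_intervals(counts, alpha=0.05):
    """Hypothetical sketch of Goodman (1965) simultaneous intervals.

    Returns a list of [lower, upper] pairs, one per category.
    Assumes sum(counts) > 0.
    """
    n = sum(counts)
    k = len(counts)
    # chi2(1) quantile without SciPy: if Z ~ N(0, 1) then
    # P(Z**2 <= z**2) = P(|Z| <= z), so the (1 - alpha/k) quantile
    # of chi2(1) is z**2 with z the (1 - alpha/(2k)) normal quantile.
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    a = z * z
    bounds = []
    for x in counts:
        p = x / n
        # sqrt(A * (A + 4 n_i (n - n_i) / n)) rewritten with n_i = n * p
        half = math.sqrt(a * a + 4 * n * a * p * (1 - p))
        centre = 2 * n * p + a
        denom = 2 * (a + n)
        bounds.append([(centre - half) / denom, (centre + half) / denom])
    return bounds


ci = goodman_intervals([56, 72, 73, 59, 62, 87, 58])
for row in ci:
    print([round(b, 4) for b in row])
```

Each interval is centred slightly towards 1/2 rather than on n_i / n, but it always contains n_i / n and stays within [0, 1].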
The sison-glaz method [3] approximates the multinomial probabilities and evaluates them with a maximum-likelihood estimator. The first approximation is an Edgeworth expansion that converges as the number of categories goes to infinity, and the maximum-likelihood estimator converges as the number of observations (sum(counts)) goes to infinity. In their paper, Sison & Glaz demonstrate their method with at least 7 categories, so len(counts) >= 7 with all values in counts at or above 5 can be used as a rule of thumb for the validity of this method. This method is less conservative than the goodman method (i.e. its simultaneous coverage is closer to the nominal 1 - alpha), but it produces confidence intervals of uniform width over all categories (except where the intervals reach 0 or 1, in which case they are truncated), which makes it most useful when the proportions are of similar magnitude.

Aside from the original sources ([1], [2], and [3]), the implementation uses the formulas (though not the code) presented in [4] and [5].
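The uniform-width property comes from the shape of the Sison-Glaz intervals. As we understand [3] and [5], each interval is [p_i - c/n, p_i + (c + 2 gamma)/n] with p_i = n_i / n, where the integer c and the fraction gamma are shared by all categories, and the bounds are then truncated to [0, 1]. The sketch below only performs that last step: `sison_glaz_bounds` is a hypothetical helper, and c and gamma are assumed to be given, since finding them requires the Edgeworth-expansion search, which is not shown.

```python
def sison_glaz_bounds(counts, c, gamma):
    """Hypothetical helper: Sison-Glaz style intervals from given c, gamma.

    c (non-negative integer) and gamma (in [0, 1)) are assumed to be
    precomputed; this sketch does not search for them.
    """
    n = sum(counts)
    bounds = []
    for x in counts:
        p = x / n
        lower = max(p - c / n, 0.0)                # truncate at 0
        upper = min(p + (c + 2 * gamma) / n, 1.0)  # truncate at 1
        bounds.append([lower, upper])
    return bounds


# Illustrative c and gamma, not values produced by the real search.
ci = sison_glaz_bounds([2, 30, 40, 28], c=10, gamma=0.25)
for lo, hi in ci:
    print(round(lo, 3), round(hi, 3), "width", round(hi - lo, 3))
```

Every untruncated interval has the same width (2c + 2 gamma)/n, here 0.205; only the first category, whose lower bound is clipped at 0, ends up narrower.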
References
[1] Levin, Bruce, "A representation for multinomial cumulative distribution functions," The Annals of Statistics, Vol. 9, No. 5, 1981, pp. 1123-1126.
[2] Goodman, L.A., "On simultaneous confidence intervals for multinomial proportions," Technometrics, Vol. 7, No. 2, 1965, pp. 247-254.
[3] Sison, Cristina P., and Joseph Glaz, "Simultaneous Confidence Intervals and Sample Size Determination for Multinomial Proportions," Journal of the American Statistical Association, Vol. 90, No. 429, 1995, pp. 366-369.
[4] May, Warren L., and William D. Johnson, "A SAS® macro for constructing simultaneous confidence intervals for multinomial proportions," Computer Methods and Programs in Biomedicine, Vol. 53, No. 3, 1997, pp. 153-162.
[5] May, Warren L., and William D. Johnson, "Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells," Journal of Statistical Software, Vol. 5, No. 6, 2000, pp. 1-24.