statsmodels.stats.gof.gof_binning_discrete

statsmodels.stats.gof.gof_binning_discrete(rvs, distfn, arg, nsupp=20)

Get bins for chi-square type goodness-of-fit (gof) tests for a discrete distribution.

Parameters:
rvs : ndarray

sample data

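distfn : distribution instance

discrete distribution to test against, for example a scipy.stats discrete distribution; its cdf is evaluated with the parameters in arg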

arg : sequence

parameters of distribution

nsupp : int

number of bins. The algorithm tries to find bins with equal weights; depending on the distribution, the actual number of bins can be smaller.
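The equal-weight idea behind nsupp can be illustrated with a small, dependency-free sketch. It is not the library's code: the helper names `poisson_cdf` and `equal_weight_bins` are hypothetical, and the greedy rule (close a bin once it holds at least 1/nsupp of the probability) is an assumption about what "equal weights" means here. It does show why the number of bins can end up smaller than nsupp when single support points carry a lot of mass.

```python
import math


def poisson_cdf(k, mu):
    """CDF of a Poisson(mu) variable at integer k, summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # pmf at 0
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # pmf(i) = pmf(i - 1) * mu / i
        total += term
    return min(total, 1.0)


def equal_weight_bins(cdf, lower, upper, nsupp=20):
    """Greedily group integers into bins holding at least 1/nsupp probability.

    Returns (edges, masses). Bin i covers the integers in
    (edges[i], edges[i + 1]]; the first bin also includes `lower`,
    and the last bin collects the remaining tail.
    """
    target = 1.0 / nsupp
    edges, masses = [lower], []
    last = 0.0
    for k in range(lower, upper + 1):
        current = cdf(k)
        if current - last >= target - 1e-14:
            edges.append(k)
            masses.append(current - last)
            last = current
            if current > 1.0 - target:
                break  # the remaining tail is lighter than one target bin
    # collect whatever probability is left into a final open-ended bin
    edges.append(math.inf)
    masses.append(1.0 - last)
    return edges, masses


# hypothetical example: Poisson with mean 3, asking for 20 bins
edges, masses = equal_weight_bins(lambda k: poisson_cdf(k, 3.0), 0, 1000, nsupp=20)
print(len(masses), "bins for nsupp=20")
print(edges)
```

Multiplying each mass by the sample size would give expected frequencies on the same scale as the observed counts.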

Returns:
freq : ndarray

empirical frequencies for sample; not normalized, adds up to sample size

expfreq : ndarray

theoretical frequencies according to distribution

histsupp : ndarray

bin boundaries for the histogram (1e-8 is added for numerical robustness)
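One way to see what a small offset on the bin boundaries does for integer data is the sketch below. It assumes half-open bins of the form [left, right) and uses a hypothetical pure-Python counter, `count_half_open`, in place of a real histogram routine. Whether this matches the exact reason for the 1e-8 in this function is an assumption.

```python
from bisect import bisect_right


def count_half_open(values, edges):
    """Count values in half-open bins [edges[i], edges[i + 1]); others are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


values = [0, 1, 2, 3, 4, 5]
support = [0, 2, 5]  # intended groups: {0, 1, 2} and {3, 4, 5}
# shift every boundary except the first by a small amount
shifted = [support[0]] + [s + 1e-8 for s in support[1:]]

plain_counts = count_half_open(values, support)
shifted_counts = count_half_open(values, shifted)
print(plain_counts, shifted_counts)
```

With the unshifted boundaries, a value that sits exactly on a boundary is counted in the bin to its right (or dropped at the upper end). With the shifted boundaries each integer lands in the group that ends at that integer.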

Notes

The results can be used for a chi-square test:

(chis,pval) = stats.chisquare(freq, expfreq)
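For orientation, the statistic behind such a test is the Pearson sum of (observed - expected)^2 / expected over the bins. The sketch below computes it by hand; the function name `chisquare_statistic` is hypothetical and the counts are made up purely for illustration.

```python
def chisquare_statistic(freq, expfreq):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    if len(freq) != len(expfreq):
        raise ValueError("freq and expfreq must have the same length")
    return sum((f - e) ** 2 / e for f, e in zip(freq, expfreq))


# made-up illustrative counts for four bins, sample size 100
freq = [18, 30, 27, 25]
expfreq = [20.0, 25.0, 30.0, 25.0]

stat = chisquare_statistic(freq, expfreq)
dof = len(freq) - 1  # default degrees of freedom when no parameters are estimated
print(stat, dof)
```

scipy.stats.chisquare(freq, expfreq) returns this statistic together with a p-value. Recent SciPy versions also expect the observed and expected totals to agree, so expfreq should be on the count scale, not the probability scale.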

Originally written for the scipy.stats test suite; it still needs to be checked for standalone usage. Input checking is insufficient, and the function may not run yet (after copy/paste).

Refactor: maybe a class, check returns, or separate binning from test results.

Todo: optimal number of bins? (check easyfit). The recommendation in the literature is at least 5 expected observations in each bin.
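The rule of thumb of at least 5 expected observations per bin can be enforced after binning by pooling neighbouring bins. The sketch below is one simple way to do that and is not part of this function: `merge_small_bins` is a hypothetical helper, the left-to-right pooling order is an arbitrary choice, and the counts are made up for illustration.

```python
def merge_small_bins(freq, expfreq, min_expected=5.0):
    """Pool adjacent bins left to right until each pooled bin has enough expected counts."""
    merged_freq, merged_exp = [], []
    acc_f, acc_e = 0, 0.0
    for f, e in zip(freq, expfreq):
        acc_f += f
        acc_e += e
        if acc_e >= min_expected:
            merged_freq.append(acc_f)
            merged_exp.append(acc_e)
            acc_f, acc_e = 0, 0.0
    if acc_e > 0 or acc_f > 0:
        if merged_exp:
            # fold an undersized remainder into the last pooled bin
            merged_freq[-1] += acc_f
            merged_exp[-1] += acc_e
        else:
            # everything together is still below the threshold: keep one bin
            merged_freq.append(acc_f)
            merged_exp.append(acc_e)
    return merged_freq, merged_exp


# made-up observed and expected counts with thin bins at both ends
freq = [1, 3, 9, 12, 8, 4, 2, 1]
expfreq = [1.5, 4.0, 8.5, 11.0, 8.0, 4.5, 1.8, 0.7]

new_freq, new_exp = merge_small_bins(freq, expfreq)
print(new_freq)
print(new_exp)
```

Pooling keeps the totals unchanged, so the merged arrays can be passed to a chi-square test in the same way as the original ones, with the degrees of freedom based on the reduced number of bins.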


Last update: Jan 20, 2025