statsmodels.stats.descriptivestats.Description
- class statsmodels.stats.descriptivestats.Description(data, stats=None, *, numeric=True, categorical=True, alpha=0.05, use_t=False, percentiles=(1, 5, 10, 25, 50, 75, 90, 95, 99), ntop=5)
Extended descriptive statistics for data
- Parameters:
- data : array_like
Data to describe. Must be convertible to a pandas DataFrame.
- stats : Sequence[str], optional
Statistics to include. If not provided, the full set of statistics is computed. This list may evolve across versions to reflect best practices. Supported options are: “nobs”, “missing”, “mean”, “std_err”, “ci”, “std”, “iqr”, “iqr_normal”, “mad”, “mad_normal”, “coef_var”, “range”, “max”, “min”, “skew”, “kurtosis”, “jarque_bera”, “mode”, “median”, “percentiles”, “distinct”, “top”, and “freq”. See Notes for details.
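A minimal usage sketch, assuming statsmodels, NumPy and pandas are installed, that restricts the output to a few statistics through `stats`. The column names `x` and `y` and the simulated data are illustrative only.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.descriptivestats import Description

# Illustrative data: two numeric columns drawn from a seeded generator
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200), "y": rng.exponential(size=200)})

# Compute only the requested statistics; "ci" yields lower_ci and upper_ci rows
desc = Description(df, stats=["nobs", "mean", "std", "ci"])
table = desc.summary()
print(table)
```

Statistics that are not requested, such as “kurtosis”, do not appear in the table.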
- numeric : bool, default True
Whether to include numeric columns in the descriptive statistics.
- categorical : bool, default True
Whether to include categorical columns in the descriptive statistics.
- alpha : float, default 0.05
The size of the confidence interval, a number between 0 and 1. The interval has coverage 1 - alpha.
- use_t : bool, default False
Use the Student’s t distribution to construct confidence intervals. If False, the normal distribution is used.
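For the `use_t=False` case, the interval controlled by `alpha` can be sketched with the standard library. This is an illustration, not the library's code: it assumes the sample standard deviation (ddof=1) and a standard error of std / sqrt(nobs), and the helper name `normal_ci` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def normal_ci(data, alpha=0.05):
    """Two-sided interval for the mean with coverage 1 - alpha (normal quantile)."""
    n = len(data)
    mean = fmean(data)
    # Standard error of the mean, assuming uncorrelated observations
    std_err = stdev(data) / math.sqrt(n)
    # Critical value q with P(|Z| <= q) = 1 - alpha for Z ~ N(0, 1)
    q = NormalDist().inv_cdf(1 - alpha / 2)
    return mean - q * std_err, mean + q * std_err


lower, upper = normal_ci([2.0, 4.0, 4.0, 5.0, 7.0, 9.0], alpha=0.05)
print(f"lower_ci={lower:.3f}, upper_ci={upper:.3f}")
```

With `use_t=True` the critical value would instead come from a Student's t distribution, which the standard library does not provide. A smaller `alpha` gives a wider interval.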
- percentiles : sequence[float]
A distinct sequence of floating-point values, all between 0 and 100. The default percentiles are 1, 5, 10, 25, 50, 75, 90, 95, 99.
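A sketch of how a percentile in the 0 to 100 range maps to a value, assuming linear interpolation between order statistics (the default rule in NumPy and pandas). Whether `Description` uses exactly this rule is not stated here, and `percentile_linear` is a hypothetical helper.

```python
import math


def percentile_linear(data, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation of sorted data."""
    if not 0 <= p <= 100:
        raise ValueError("percentiles must lie between 0 and 100")
    xs = sorted(data)
    h = (len(xs) - 1) * p / 100  # fractional position among order statistics
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


data = list(range(1, 101))  # 1, 2, ..., 100
default = (1, 5, 10, 25, 50, 75, 90, 95, 99)
print({f"{p}%": percentile_linear(data, p) for p in default})
```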
- ntop : int, default 5
The number of top categorical labels to report.
See also
- pandas.DataFrame.describe : Basic descriptive statistics
- describe : A simplified version that returns a DataFrame
Notes
The selectable statistics include:
“nobs” - Number of observations
“missing” - Number of missing observations
“mean” - Mean
“std_err” - Standard error of the mean, assuming no correlation between observations
“ci” - Confidence interval with coverage (1 - alpha) using the normal or t distribution (see use_t). This option creates two entries in any tables: lower_ci and upper_ci.
“std” - Standard Deviation
“iqr” - Interquartile range
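“iqr_normal” - Interquartile range rescaled relative to a Normal, so that it is comparable to a standard deviation
“mad” - Mean absolute deviation
“mad_normal” - Mean absolute deviation rescaled relative to a Normal, so that it is comparable to a standard deviation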
“coef_var” - Coefficient of variation
“range” - Range between the maximum and the minimum
“max” - The maximum
“min” - The minimum
“skew” - The skewness defined as the standardized 3rd central moment
“kurtosis” - The kurtosis defined as the standardized 4th central moment
“jarque_bera” - The Jarque-Bera test statistic for normality, based on the skewness and kurtosis. This option creates two entries: jarque_bera and jarque_bera_pval.
“mode” - The mode of the data. This option creates two entries in all tables: mode and mode_freq, the empirical frequency of the modal value.
“median” - The median of the data.
“percentiles” - The percentiles. The values included depend on the input value of percentiles.
“distinct” - The number of distinct categories in a categorical.
“top” - The most common categories. Labeled top_n for n in 1, 2, …, ntop.
“freq” - The frequency of the most common categories. Labeled freq_n for n in 1, 2, …, ntop.
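Several of the numeric statistics above can be reproduced from their definitions with the standard library. This is a sketch under stated assumptions, not the library's implementation: it assumes the sample standard deviation (ddof=1), linear-interpolation quartiles, scaling iqr_normal by the interquartile range of a standard Normal and mad_normal by sqrt(2/pi) (the mean absolute deviation of a standard Normal), and plain (biased) moment estimators for skewness and kurtosis. The helpers `quantile` and `extra_stats` are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def quantile(xs, q):
    """Quantile of sorted data xs (0 <= q <= 1) by linear interpolation."""
    h = (len(xs) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def extra_stats(data):
    """Sketch of iqr_normal, mad_normal, coef_var, skew, kurtosis, jarque_bera."""
    xs = sorted(data)
    n = len(xs)
    mean = fmean(xs)
    std = stdev(xs)  # sample standard deviation (ddof=1), an assumption

    iqr = quantile(xs, 0.75) - quantile(xs, 0.25)
    z = NormalDist()
    # IQR of a standard Normal is about 1.349
    iqr_normal = iqr / (z.inv_cdf(0.75) - z.inv_cdf(0.25))

    mad = fmean(abs(x - mean) for x in xs)
    # Mean absolute deviation of N(0, 1) is sqrt(2 / pi), about 0.798
    mad_normal = mad / math.sqrt(2 / math.pi)

    # Biased central moments
    m2 = fmean((x - mean) ** 2 for x in xs)
    m3 = fmean((x - mean) ** 3 for x in xs)
    m4 = fmean((x - mean) ** 4 for x in xs)
    skew = m3 / m2**1.5
    kurtosis = m4 / m2**2  # not excess kurtosis: a Normal gives 3

    # Jarque-Bera statistic; under normality it is asymptotically chi-squared(2),
    # whose survival function is exp(-x / 2)
    jarque_bera = n / 6 * (skew**2 + (kurtosis - 3) ** 2 / 4)
    jarque_bera_pval = math.exp(-jarque_bera / 2)

    return {
        "iqr": iqr,
        "iqr_normal": iqr_normal,
        "mad": mad,
        "mad_normal": mad_normal,
        "coef_var": std / mean,
        "skew": skew,
        "kurtosis": kurtosis,
        "jarque_bera": jarque_bera,
        "jarque_bera_pval": jarque_bera_pval,
    }


print(extra_stats([1.0, 2.0, 3.0, 4.0, 5.0]))
```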
- Attributes:
- numeric_statistics
The list of supported statistics for numeric data
- categorical_statistics
The list of supported statistics for categorical data
- default_statistics
The default list of statistics
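The categorical statistics described in the Notes (“distinct”, top_n and freq_n) can be sketched for a single column with `collections.Counter`. The helper `categorical_stats` is hypothetical, missing values are not handled, and reporting freq_n as a proportion rather than a count is an assumption, not something this page states.

```python
from collections import Counter


def categorical_stats(labels, ntop=5):
    """Sketch of "distinct", "top_n" and "freq_n" for one categorical column."""
    counts = Counter(labels)
    total = len(labels)
    out = {"distinct": len(counts)}
    # most_common orders labels by count (ties keep first-seen order)
    for n, (label, count) in enumerate(counts.most_common(ntop), start=1):
        out[f"top_{n}"] = label
        out[f"freq_{n}"] = count / total  # share of observations (assumed)
    return out


print(categorical_stats(["a", "b", "a", "c", "a", "b"], ntop=2))
```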
Methods
- summary() : Summary table of the descriptive statistics
Properties
Descriptive statistics for categorical data
Descriptive statistics for both numeric and categorical data
Descriptive statistics for numeric data