statsmodels.stats.descriptivestats.Description

class statsmodels.stats.descriptivestats.Description(data: Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame], stats: Optional[Sequence[str]] = None, *, numeric: bool = True, categorical: bool = True, alpha: float = 0.05, use_t: bool = False, percentiles: Sequence[Union[int, float]] = (1, 5, 10, 25, 50, 75, 90, 95, 99), ntop: int = 5)

Extended descriptive statistics for data

Parameters

data (array_like): Data to describe. Must be convertible to a pandas DataFrame.
stats (Sequence[str], optional): Statistics to include. If not provided, the full set of statistics is computed. This list may evolve across versions to reflect best practices. Supported options are: “nobs”, “missing”, “mean”, “std_err”, “ci”, “std”, “iqr”, “iqr_normal”, “mad”, “mad_normal”, “coef_var”, “range”, “max”, “min”, “skew”, “kurtosis”, “jarque_bera”, “mode”, “median”, “percentiles”, “distinct”, “top”, and “freq”. See Notes for details.
numeric (bool, default True): Whether to include numeric columns in the descriptive statistics.
categorical (bool, default True): Whether to include categorical columns in the descriptive statistics.
alpha (float, default 0.05): A number between 0 and 1 giving the size used to compute the confidence interval, which has coverage 1 - alpha.
use_t (bool, default False): Use the Student’s t distribution to construct confidence intervals.
percentiles (Sequence[float]): A sequence of distinct floating point values, all between 0 and 100. The default percentiles are 1, 5, 10, 25, 50, 75, 90, 95, 99.
ntop (int, default 5): The number of top categorical labels to report.
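As a rough illustration of how these parameters fit together, the sketch below builds a Description for a small numeric DataFrame and requests a handful of statistics. The column names x and y and their values are made up for the example; only the class, its parameters, and the frame property come from this page.

```python
import pandas as pd
from statsmodels.stats.descriptivestats import Description

# Illustrative data: two numeric columns (names and values are assumptions).
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [2.0, 4.0, 4.0, 6.0, 9.0],
    }
)

# Request a subset of statistics; "ci" expands into lower_ci and upper_ci.
desc = Description(df, stats=["nobs", "mean", "std", "ci"], alpha=0.05)

# One row per statistic, one column per variable.
frame = desc.frame
print(frame)
```

Passing alpha=0.10 instead would narrow the interval, since the coverage drops to 90%.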

See also

pandas.DataFrame.describe: Basic descriptive statistics
describe: A simplified version that returns a DataFrame

Notes

The selectable statistics include:

“nobs” - Number of observations
“missing” - Number of missing observations
“mean” - Mean
“std_err” - Standard Error of the mean assuming no correlation
“ci” - Confidence interval with coverage (1 - alpha) using the normal or Student’s t distribution. This option creates two entries in any table: lower_ci and upper_ci.
“std” - Standard Deviation
“iqr” - Interquartile range
“iqr_normal” - Interquartile range relative to a Normal
“mad” - Mean absolute deviation
“mad_normal” - Mean absolute deviation relative to a Normal
“coef_var” - Coefficient of variation
“range” - Range between the maximum and the minimum
“max” - The maximum
“min” - The minimum
“skew” - The skewness defined as the standardized 3rd central moment
“kurtosis” - The kurtosis defined as the standardized 4th central moment
“jarque_bera” - The Jarque-Bera test statistic for normality based on the skewness and kurtosis. This option creates two entries, jarque_bera and jarque_bera_pval.
“mode” - The mode of the data. This option creates two entries in all tables: mode, and mode_freq, which is the empirical frequency of the modal value.
“median” - The median of the data.
“percentiles” - The percentiles. Values included depend on the input value of percentiles.
“distinct” - The number of distinct categories in a categorical.
“top” - The most common categories. Labeled top_n for n in 1, 2, …, ntop.
“freq” - The frequency of the most common categories. Labeled freq_n for n in 1, 2, …, ntop.
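To make a few of the definitions above concrete, here is a minimal standard-library sketch of “ci”, “iqr_normal”, and “mad_normal”. It is not the library's code. It assumes the normal-based interval (use_t=False), a sample standard deviation with n - 1 in the denominator, linear-interpolation quartiles, and that “relative to a Normal” means dividing by the value the same statistic takes for a standard normal. The helper names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def normal_ci(values, alpha=0.05):
    """Normal-based confidence interval for the mean, coverage 1 - alpha."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean, assuming uncorrelated observations.
    std_err = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * std_err, mean + z * std_err


def iqr_normal(values):
    """IQR divided by the IQR of a standard normal (assumed definition)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    std_normal = NormalDist()
    normal_iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
    return (q3 - q1) / normal_iqr


def mad_normal(values):
    """Mean absolute deviation divided by sqrt(2 / pi) (assumed definition)."""
    mean = statistics.fmean(values)
    mad = statistics.fmean([abs(v - mean) for v in values])
    return mad / math.sqrt(2 / math.pi)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
lower, upper = normal_ci(sample)
print(lower, upper, iqr_normal(sample), mad_normal(sample))
```

Under these assumptions, both rescaled statistics estimate the standard deviation when the data are normally distributed, which makes them comparable to “std”.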

Attributes

numeric_statistics: The list of supported statistics for numeric data
categorical_statistics: The list of supported statistics for categorical data
default_statistics: The default list of statistics

Methods

summary()

Summary table of the descriptive statistics

Properties

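`categorical`: Descriptive statistics for categorical data
`categorical_statistics`: The list of supported statistics for categorical data
`default_statistics`: The default list of statistics
`frame`: Descriptive statistics for both numeric and categorical data
`numeric`: Descriptive statistics for numeric data
`numeric_statistics`: The list of supported statistics for numeric data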