statsmodels.stats.descriptivestats.Description

class statsmodels.stats.descriptivestats.Description(data, stats=None, *, numeric=True, categorical=True, alpha=0.05, use_t=False, percentiles=(1, 5, 10, 25, 50, 75, 90, 95, 99), ntop=5)

Extended descriptive statistics for data

Parameters:
data : array_like

Data to describe. Must be convertible to a pandas DataFrame.

stats : Sequence[str], optional

Statistics to include. If not provided, the full set of statistics is computed. This list may evolve across versions to reflect best practices. Supported options are: “nobs”, “missing”, “mean”, “std_err”, “ci”, “std”, “iqr”, “iqr_normal”, “mad”, “mad_normal”, “coef_var”, “range”, “max”, “min”, “skew”, “kurtosis”, “jarque_bera”, “mode”, “median”, “percentiles”, “distinct”, “top”, and “freq”. See Notes for details.

numeric : bool, default True

Whether to include numeric columns in the descriptive statistics.

categorical : bool, default True

Whether to include categorical columns in the descriptive statistics.

alpha : float, default 0.05

A number between 0 and 1 representing the size used to compute the confidence interval, which has coverage 1 - alpha.

use_t : bool, default False

Whether to use the Student’s t distribution to construct confidence intervals. If False, the normal distribution is used.

percentiles : sequence[float]

A sequence of distinct floating point values, all between 0 and 100. The default percentiles are 1, 5, 10, 25, 50, 75, 90, 95, 99.

ntop : int, default 5

The number of top categorical labels to report. Default is 5.
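
The parameters above can be combined as in the following minimal sketch. The synthetic data and the column name x are assumptions made for illustration only; the example restricts the output to a few statistics and compares normal-based and t-based 90% confidence intervals.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.descriptivestats import Description

# Synthetic data; the column name "x" is purely illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.standard_normal(20)})

# Restrict the output to a few statistics and request 90% intervals.
selected = ["nobs", "mean", "ci"]
normal_desc = Description(df, stats=selected, alpha=0.10)
t_desc = Description(df, stats=selected, alpha=0.10, use_t=True)

# Rows are statistics, columns are the variables in the input.
normal_ci = normal_desc.numeric
t_ci = t_desc.numeric

print(normal_ci)
print(t_ci)
```

With only 20 observations, the t-based interval should be somewhat wider than the normal-based one, since the t quantile exceeds the normal quantile at the same coverage.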

Attributes:
numeric_statistics

The list of supported statistics for numeric data

categorical_statistics

The list of supported statistics for categorical data

default_statistics

The default list of statistics
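
As a rough illustration, the three attributes can be read from an instance and used to validate a selection before computing anything. The helper check_stats and the sample data are hypothetical and not part of statsmodels.

```python
import pandas as pd
from statsmodels.stats.descriptivestats import Description

# Any small numeric input will do; the data here are illustrative.
desc = Description(pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}))

numeric_stats = list(desc.numeric_statistics)
categorical_stats = list(desc.categorical_statistics)
default_stats = list(desc.default_statistics)

print(numeric_stats)
print(categorical_stats)
print(default_stats)


def check_stats(requested):
    """Hypothetical helper: reject names that are not supported statistics."""
    unknown = [name for name in requested if name not in default_stats]
    if unknown:
        raise ValueError(f"Unknown statistics: {unknown}")
    return list(requested)


selected = check_stats(["mean", "std", "distinct"])
```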

See also

pandas.DataFrame.describe

Basic descriptive statistics

describe

A simplified version that returns a DataFrame
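
For comparison, a short sketch of the two alternatives listed here, run on a small made-up DataFrame. It assumes describe accepts the same stats argument as Description.

```python
import pandas as pd
from statsmodels.stats.descriptivestats import describe

# Illustrative data only.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# The functional interface returns a DataFrame directly.
table = describe(df, stats=["nobs", "mean", "median"])

# pandas' built-in summary, for reference.
pandas_table = df.describe()

print(table)
print(pandas_table)
```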

Notes

The selectable statistics include:

  • “nobs” - Number of observations

  • “missing” - Number of missing observations

  • “mean” - Mean

  • “std_err” - Standard Error of the mean assuming no correlation

  • “ci” - Confidence interval with coverage (1 - alpha) using the normal or Student’s t distribution. This option creates two entries in any tables: lower_ci and upper_ci.

  • “std” - Standard Deviation

  • “iqr” - Interquartile range

  • “iqr_normal” - Interquartile range scaled relative to that of a normal distribution

  • “mad” - Mean absolute deviation

  • “mad_normal” - Mean absolute deviation scaled relative to that of a normal distribution

  • “coef_var” - Coefficient of variation

  • “range” - Range between the maximum and the minimum

  • “max” - The maximum

  • “min” - The minimum

  • “skew” - The skewness defined as the standardized 3rd central moment

  • “kurtosis” - The kurtosis defined as the standardized 4th central moment

  • “jarque_bera” - The Jarque-Bera test statistic for normality based on the skewness and kurtosis. This option creates two entries, jarque_bera and jarque_bera_pval.

  • “mode” - The mode of the data. This option creates two entries in all tables: mode, and mode_freq, which is the empirical frequency of the modal value.

  • “median” - The median of the data.

  • “percentiles” - The percentiles. Values included depend on the input value of percentiles.

  • “distinct” - The number of distinct categories in a categorical.

  • “top” - The most common categories. Labeled top_n for n in 1, 2, …, ntop.

  • “freq” - The frequency of the most common categories. Labeled freq_n for n in 1, 2, …, ntop.
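
The moment-based entries in the list above can be sketched in plain NumPy. This is an illustration of the textbook definitions, not the library's code: it assumes population (biased) central moments and the standard Jarque-Bera form n/6 * (S^2 + (K - 3)^2 / 4), and the function name moments_and_jb is hypothetical. Values reported by Description may differ in small-sample details.

```python
import numpy as np


def moments_and_jb(x):
    """Skewness, kurtosis, Jarque-Bera statistic and p-value (textbook forms)."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]  # drop missing observations
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered**2)  # second central moment (biased)
    skew = np.mean(centered**3) / m2**1.5  # standardized 3rd central moment
    kurt = np.mean(centered**4) / m2**2  # standardized 4th central moment
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    pval = np.exp(-jb / 2.0)
    return skew, kurt, jb, pval


skew, kurt, jb, pval = moments_and_jb([1.0, 2.0, 3.0, 4.0, 5.0])
print(skew, kurt, jb, pval)
```

A symmetric sample such as the one used here has zero skewness, so the statistic is driven entirely by the distance of the kurtosis from 3.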

Methods

summary()

Summary table of the descriptive statistics
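
A minimal usage sketch for summary(), using made-up data. It assumes the returned table can be rendered as text with str().

```python
import pandas as pd
from statsmodels.stats.descriptivestats import Description

# Illustrative data only.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
desc = Description(df, stats=["nobs", "mean", "std", "min", "max"])

# Build the summary table and render it as plain text.
table = desc.summary()
text = str(table)
print(text)
```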

Properties

categorical

Descriptive statistics for categorical data

categorical_statistics

default_statistics

frame

Descriptive statistics for both numeric and categorical data

numeric

Descriptive statistics for numeric data

numeric_statistics
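
The numeric, categorical and frame properties can be compared on a mixed input, as in the sketch below. The column names height and group and the data are assumptions chosen for illustration.

```python
import pandas as pd
from statsmodels.stats.descriptivestats import Description

# One numeric and one categorical column; names and values are illustrative.
df = pd.DataFrame(
    {
        "height": [1.6, 1.7, 1.8, 1.9],
        "group": pd.Categorical(["a", "b", "a", "c"]),
    }
)
desc = Description(df, stats=["nobs", "missing", "mean", "distinct"])

numeric_part = desc.numeric  # statistics for the numeric column only
categorical_part = desc.categorical  # statistics for the categorical column only
combined = desc.frame  # both kinds of columns side by side

print(numeric_part)
print(categorical_part)
print(combined)
```

In the combined frame, a statistic that does not apply to a column (for example the mean of a categorical) is expected to appear as a missing value.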