statsmodels.stats.descriptivestats.describe
statsmodels.stats.descriptivestats.describe(data: Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame], stats: Optional[Sequence[str]] = None, *, numeric: bool = True, categorical: bool = True, alpha: float = 0.05, use_t: bool = False, percentiles: Sequence[Union[int, float]] = (1, 5, 10, 25, 50, 75, 90, 95, 99), ntop: int = 5) → pandas.core.frame.DataFrame
Extended descriptive statistics for data
- Parameters
- data : array_like
Data to describe. Must be convertible to a pandas DataFrame.
- stats : Sequence[str], optional
Statistics to include. If not provided, the full set of statistics is computed. This list may evolve across versions to reflect best practices. Supported options are: “nobs”, “missing”, “mean”, “std_err”, “ci”, “std”, “iqr”, “iqr_normal”, “mad”, “mad_normal”, “coef_var”, “range”, “max”, “min”, “skew”, “kurtosis”, “jarque_bera”, “mode”, “median”, “percentiles”, “distinct”, “top”, and “freq”. See Notes for details.
- numeric : bool, default True
Whether to include numeric columns in the descriptive statistics.
- categorical : bool, default True
Whether to include categorical columns in the descriptive statistics.
- alpha : float, default 0.05
A number between 0 and 1 representing the size used to compute the confidence interval, which has coverage 1 - alpha.
- use_t : bool, default False
Use the Student’s t distribution to construct confidence intervals.
- percentiles : sequence[float]
A distinct sequence of floating point values all between 0 and 100. The default percentiles are 1, 5, 10, 25, 50, 75, 90, 95, 99.
- ntop : int, default 5
The number of top categorical labels to report. Default is 5.
- Returns
DataFrame
Descriptive statistics
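A minimal usage sketch of the signature above. The data values and column names are invented for illustration, and the row labels mentioned in the comments follow the Notes section of this page; exact output may differ across statsmodels versions.

```python
import pandas as pd
from statsmodels.stats.descriptivestats import describe

# Made-up numeric data, used only for illustration
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "y": [2.0, 4.0, 4.0, 5.0, 7.0, 9.0],
    }
)

# Request a subset of statistics. Per the Notes, "ci" is reported as two
# rows, lower_ci and upper_ci; alpha=0.10 asks for 90% coverage.
summary = describe(df, stats=["nobs", "mean", "std", "ci"], alpha=0.10)

# Statistics are the rows of the result; the input columns are its columns
print(summary)
```

Omitting `stats` requests the full set of statistics, including the percentiles given by `percentiles`.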
See also
pandas.DataFrame.describe
Basic descriptive statistics
Description
Descriptive statistics class with additional output options
Notes
The selectable statistics include:
“nobs” - Number of observations
“missing” - Number of missing observations
“mean” - Mean
“std_err” - Standard Error of the mean assuming no correlation
“ci” - Confidence interval with coverage (1 - alpha) using the normal or Student’s t distribution. This option creates two entries in any tables: lower_ci and upper_ci.
“std” - Standard Deviation
“iqr” - Interquartile range
“iqr_normal” - Interquartile range relative to a Normal
“mad” - Mean absolute deviation
“mad_normal” - Mean absolute deviation relative to a Normal
“coef_var” - Coefficient of variation
“range” - Range between the maximum and the minimum
“max” - The maximum
“min” - The minimum
“skew” - The skewness defined as the standardized 3rd central moment
“kurtosis” - The kurtosis defined as the standardized 4th central moment
“jarque_bera” - The Jarque-Bera test statistic for normality based on the skewness and kurtosis. This option creates two entries, jarque_bera and jarque_bera_pval.
“mode” - The mode of the data. This option creates two entries in all tables, mode and mode_freq, where mode_freq is the empirical frequency of the modal value.
“median” - The median of the data.
“percentiles” - The percentiles. Values included depend on the input value of percentiles.
“distinct” - The number of distinct categories in a categorical.
“top” - The most common categories. Labeled top_n for n in 1, 2, …, ntop.
“freq” - The frequency of the common categories. Labeled freq_n for n in 1, 2, …, ntop.
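To make several of the definitions above concrete, the sketch below computes them by hand using only the Python standard library. It follows the textbook formulas and is not taken from the statsmodels implementation: the sample standard deviation with divisor n - 1, the "inclusive" quantile method, and the use of population (divisor n) moments for skewness and kurtosis are assumptions, and the data values are made up.

```python
import math
from statistics import NormalDist, mean, quantiles, stdev

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up sample
alpha = 0.05
n = len(data)

m = mean(data)
s = stdev(data)                 # sample standard deviation (divisor n - 1), assumed
std_err = s / math.sqrt(n)      # standard error of the mean, no correlation

# "ci" with the normal distribution (the use_t=False case), coverage 1 - alpha
q = NormalDist().inv_cdf(1 - alpha / 2)
lower_ci, upper_ci = m - q * std_err, m + q * std_err

# "iqr" and "iqr_normal": IQR divided by the IQR of a standard Normal
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
iqr_normal = iqr / (NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25))

# "mad" and "mad_normal": for Normal data, E|X - mu| = sigma * sqrt(2 / pi)
mad = sum(abs(x - m) for x in data) / n
mad_normal = mad / math.sqrt(2 / math.pi)

coef_var = s / m                # "coef_var"
data_range = max(data) - min(data)  # "range"

# "skew" and "kurtosis" as standardized 3rd and 4th central moments
dev = [x - m for x in data]
m2 = sum(d ** 2 for d in dev) / n
skew = (sum(d ** 3 for d in dev) / n) / m2 ** 1.5
kurtosis = (sum(d ** 4 for d in dev) / n) / m2 ** 2

# Textbook Jarque-Bera statistic built from skewness and kurtosis
jarque_bera = n / 6 * (skew ** 2 + (kurtosis - 3) ** 2 / 4)

print(lower_ci, upper_ci, iqr_normal, mad_normal, jarque_bera)
```

Both rescaled statistics, iqr_normal and mad_normal, are on the scale of a standard deviation, so for roughly Normal data they should be close to std.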