Release 0.13.0

Release summary

statsmodels is using GitHub to store the updated documentation. Two versions are available:

  • Stable, the latest release

  • Development, the latest build of the main branch

Warning

API stability is not guaranteed for new features, although even in this case changes will be made in a backwards compatible way if possible. The stability of a new feature depends on how long it has been in statsmodels main and how much usage it has already seen. If there are specific known problems or limitations, then they are mentioned in the docstrings.

Stats

Issues Closed: 238

Pull Requests Merged: 165

The Highlights

New cross-sectional models

Beta Regression

BetaModel estimates a regression model for a dependent variable in the unit interval, such as fractions and proportions, based on the Beta distribution. The model is parameterized by mean and precision, and both can depend on explanatory variables through link functions.

Ordinal Regression

statsmodels.miscmodels.ordinal_model.OrderedModel implements cumulative link models for ordinal data, based on Logit, Probit or a user-provided CDF link.

Distributions

Copulas

statsmodels now includes basic support for copulas, mainly bivariate ones. Currently, 10 copulas are available: Archimedean, elliptical and asymmetric extreme value copulas. CopulaDistribution combines a copula with marginal distributions to create multivariate distributions.

Count distribution based on discretization

DiscretizedCount provides count distributions generated by discretizing continuous distributions available in scipy. The parameters of the distribution can be estimated by maximum likelihood with DiscretizedModel.
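The discretization idea can be sketched with scipy alone: a continuous variable Y is mapped to the count floor(Y), so P(K = k) = F(k + 1) - F(k). This assumes unit-width bins [k, k + 1) starting at zero; the helper discretized_pmf is hypothetical and is not the DiscretizedCount API.

```python
import numpy as np
from scipy import stats

def discretized_pmf(k, dist):
    """pmf of floor(Y) for a continuous Y (hypothetical helper).

    P(floor(Y) = k) = F(k + 1) - F(k), written with the survival function,
    sf(k) - sf(k + 1), which is more accurate in the upper tail.
    """
    k = np.asarray(k, dtype=float)
    return dist.sf(k) - dist.sf(k + 1)

# Discretize a gamma distribution with shape 3 and scale 2
gamma = stats.gamma(3.0, scale=2.0)
k = np.arange(0, 200)
pmf = discretized_pmf(k, gamma)
print(pmf[:5])
print(pmf.sum())  # telescopes to sf(0) - sf(200), i.e. almost exactly 1
```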

Bernstein Distribution

BernsteinDistribution creates nonparametric univariate and multivariate distributions using Bernstein polynomials on a regular grid. This can be used to smooth histograms or approximate distributions on the unit hypercube. When the marginal distributions are uniform, then the BernsteinDistribution is a copula.
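The core idea can be sketched for the univariate case: given cdf values F(k/n) on a regular grid, the Bernstein approximation is the sum over k of F(k/n) C(n, k) x^k (1 - x)^(n - k). The helper bernstein_cdf is hypothetical and is not the BernsteinDistribution API; here it smooths an empirical cdf evaluated on the grid.

```python
import numpy as np
from scipy import stats

def bernstein_cdf(cdf_grid, x):
    """Bernstein-polynomial cdf on [0, 1] (hypothetical helper).

    cdf_grid holds cdf values at the regular grid k / n, k = 0, ..., n.
    """
    cdf_grid = np.asarray(cdf_grid, dtype=float)
    n = len(cdf_grid) - 1
    k = np.arange(n + 1)
    x = np.asarray(x, dtype=float)[..., None]
    # The Bernstein basis C(n, k) x**k (1 - x)**(n - k) is the binomial pmf
    basis = stats.binom.pmf(k, n, x)
    return basis @ cdf_grid

# Smooth the empirical cdf of a Beta(2, 3) sample evaluated on a grid with n = 50
rng = np.random.default_rng(0)
sample = rng.beta(2.0, 3.0, size=1000)
grid = np.linspace(0, 1, 51)
ecdf_grid = np.searchsorted(np.sort(sample), grid, side="right") / sample.size
x = np.linspace(0.05, 0.95, 7)
print(bernstein_cdf(ecdf_grid, x))
print(stats.beta(2.0, 3.0).cdf(x))  # true cdf for comparison
```

Because Bernstein polynomials reproduce linear functions exactly, a uniform grid F(k/n) = k/n returns the uniform cdf, which is the copula case mentioned above in one dimension.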

Statistics

Brunner-Munzel rank comparison

The Brunner-Munzel test is a nonparametric comparison of two samples that extends the Wilcoxon-Mann-Whitney and Fligner-Policello tests. It requires only ordinal information and makes no further assumptions about the distributions of the samples. statsmodels provides the Brunner-Munzel hypothesis test for stochastic equality in rank_compare_2indep, as well as confidence intervals and equivalence testing (TOST) for the probability that one sample is stochastically larger than the other, also known as the Common Language effect size.

Nonparametric

Asymmetric kernels

Asymmetric kernels can nonparametrically estimate the density and the cumulative distribution function of random variables with limited support, either the unit interval or the positive or nonnegative real line. Beta kernels are available for data in the unit interval. The available kernels for positive data are “gamma”, “gamma2”, “bs”, “invgamma”, “invgauss”, “lognorm”, “recipinvgauss” and “weibull”. pdf_kernel_asym estimates a kernel density given a bandwidth parameter, and cdf_kernel_asym estimates a kernel cdf.

Time series analysis

Autoregressive Distributed Lag Models

ARDL adds support for specifying and estimating ARDL models, and UECM supports specifying models in error correction form. ardl_select_order simplifies selecting both the AR and DL model orders. bounds_test implements the bounds test of Pesaran, Shin and Smith (2001) for testing whether there is a levels relationship without knowing the orders of integration of the variables.

In [1]: from statsmodels.datasets import danish_data

In [2]: import statsmodels.tsa.api as tsa

In [3]: data = danish_data.load().data

In [4]: sel = tsa.ardl_select_order(data.lrm, 3, data[["lry", "ibo", "ide"]], 3, ic="aic")

In [5]: ardl = sel.model

In [6]: ardl.ardl_order
Out[6]: (3, 1, 3, 2)
In [7]: res = ardl.fit()

In [8]: print(res.summary())
                              ARDL Model Results                              
==============================================================================
Dep. Variable:                    lrm   No. Observations:                   55
Model:               ARDL(3, 1, 3, 2)   Log Likelihood                 139.513
Method:               Conditional MLE   S.D. of innovations              0.017
Date:                Wed, 02 Nov 2022   AIC                           -251.026
Time:                        17:12:49   BIC                           -223.708
Sample:                    10-01-1974   HQIC                          -240.553
                         - 07-01-1987                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.6202      0.568      4.615      0.000       1.472       3.769
lrm.L1         0.3192      0.137      2.336      0.025       0.043       0.596
lrm.L2         0.5326      0.132      4.024      0.000       0.265       0.800
lrm.L3        -0.2687      0.102     -2.631      0.012      -0.475      -0.062
lry.L0         0.6728      0.131      5.129      0.000       0.407       0.938
lry.L1        -0.2574      0.147     -1.749      0.088      -0.555       0.040
ibo.L0        -1.0785      0.322     -3.353      0.002      -1.729      -0.428
ibo.L1        -0.1062      0.586     -0.181      0.857      -1.291       1.079
ibo.L2         0.2877      0.569      0.505      0.616      -0.863       1.439
ibo.L3        -0.9947      0.393     -2.534      0.015      -1.789      -0.201
ide.L0         0.1255      0.554      0.226      0.822      -0.996       1.247
ide.L1        -0.3280      0.721     -0.455      0.652      -1.787       1.131
ide.L2         1.4079      0.552      2.550      0.015       0.291       2.524
==============================================================================
In [9]: uecm = tsa.UECM.from_ardl(ardl)

In [10]: uecm_res = uecm.fit()

In [11]: uecm_res.bounds_test(case=4)
Out[11]: 
BoundsTestResult
Stat: 5.43062
Upper P-value: 0.00339
Lower P-value: 0.000335
Null: No Cointegration
Alternative: Possible Cointegration

Fixed parameters in ARIMA estimators

  • Allow fixing parameters in ARIMA estimator Hannan-Rissanen (hannan_rissanen) through the new fixed_params argument

What’s new - an overview

The following lists the main new features of statsmodels 0.13.0. In addition, release 0.13.0 includes bug fixes, refactorings and improvements in many areas.

Major Features

  • Allow fixing parameters in ARIMA estimator Hannan-Rissanen (PR #7497, PR #7502)

  • OLS add “slim” option to summary method (PR #7696 based on PR #6880)

  • Add loglog link for use with GLM (PR #7594)

  • Improved default derivatives in CDFLink (PR #7287)

  • GLM enhanced and corrected get_distribution (PR #7535)

  • GLMResults info_criteria, add dk_params option to include scale in parameter count (PR #7693)

  • GLMResults add pseudo R-squared, Cox-Snell and McFadden (PR #7682 based on PR #7367)

  • nonparametric: add tricube kernel (PR #7697 based on PR #7671)

Submodules

Documentation

Performance

backport

base

  • Use np.linalg.solve() instead of np.linalg.inv() in Newton-Raphson Algorithm (PR #7429)

  • Allow remove_data to work when an attribute is not implemented (PR #7511)

  • REF/BUG generic likelihood LLRMixin use df_resid instead of df_model for llr_pvalue (PR #7586)

  • Raise when invalid optimization options passed to optimizer (PR #7596)
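Why solve is preferred over inv: a Newton-Raphson update needs the direction -H^{-1} g, which np.linalg.solve obtains directly from H d = -g without forming the inverse, which is cheaper and numerically more stable. The sketch below illustrates this for a logistic regression; it is not the statsmodels optimizer code, and newton_step is a hypothetical helper.

```python
import numpy as np

def newton_step(hessian, score):
    """Newton-Raphson direction -H^{-1} g, computed without inverting H (hypothetical helper)."""
    return np.linalg.solve(hessian, -score)

# Logistic log-likelihood: score X'(y - p), Hessian -X' diag(p (1 - p)) X
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0])))))

beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    score = X.T @ (y - p)
    hessian = -(X.T * (p * (1 - p))) @ X
    step = newton_step(hessian, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
print(beta)
```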

datasets

  • Add an error message for not found data (PR #7490)

discrete

  • Add discretized count distribution (PR #7488)

  • ZI predict, fix offset default if None, allow exog_infl None if constant (PR #7670)

distributions

  • Copula 7254 rebased (PR #7408)

  • Add discretized count distribution (PR #7488)

  • Random number generation wrapper for rng, qrng (PR #7608)

  • BUG/REF copula another round for 0.13 (PR #7648)

  • Temporarily change the default RNG in check_random_state (PR #7652)

  • More copula improvements for 0.13 (PR #7723)

docs

  • Fix for upstream changes in PyMC3 notebook (PR #7416)

  • Correct small typo in Theta model Notebook (PR #7450)

  • Prevent indent running on None (PR #7462)

  • Update versions file (PR #7708)

  • Improve docs and docstrings, mainly for recent additions (PR #7727)

  • api.py, docstring improvements (PR #7732)

  • Add to release notes, smaller doc fixes, references (PR #7743)

genmod

  • Change default derivative in CDFLink (PR #7287)

  • Allow user to configure GEE qic (PR #7471)

  • Score and Hessian for Tweedie models (PR #7489)

  • BUG/ENH fix and enh GLM, family get_distribution (PR #7535)

  • Enh glm loglog (PR #7594)

  • McFadden and Cox&Snell Pseudo R squared to GLMResults (PR #7682)

  • Add dk_params option to GLM info_criteria (PR #7693)

  • Warn kwargs glm (PR #7750)

  • GLM init invalid kwargs use ValueWarning (PR #7751)

graphics

  • Fix UserWarning: marker is redundantly defined (Matplotlib v 3.4.1) (PR #7400)

  • Fix axis labels in qqplots (PR #7413)

  • Remove typo in plot_pacf example (PR #7514)

  • Start process of changing default in plot-pacf (PR #7582)

  • Improve limit format in diff plot (PR #7592)

  • Clarify which series is on x-axis (PR #7612)

  • graphics.plot_partregress add eval_env options (PR #7673)

io

  • Add support for pickling for generic path-like objects (PR #7581)

  • Fix summary().as_latex, line in top table dropped (PR #7748)

maintenance

multivariate

  • Multivariate - Return E and H matrices in dict (PR #5491)

  • Added the option full_matrices=False in the PCA method (PR #7329)

  • Factor fit ml em resets seed (rebased) (PR #7703)

  • Correct MultivariateTestResults doc string (PR #7735)

  • Correct MultivariateTestResults doc string (PR #7738)

  • Add missing function doc head (PR #7740)

nonparametric

othermod

  • Betareg rebased3 Beta regression (PR #7543)

  • REF/BUG generic likelihood LLRMixin use df_resid instead of df_model for llr_pvalue (PR #7586)

  • Oaxaca Variance/Other Models (PR #7713)

regression

  • Allow remove_data to work when an attribute is not implemented (PR #7511)

  • Fix scale parameter in elastic net (PR #7571)

  • Regression, allow remove_data to remove wendog, wexog, wresid (PR #7595)

  • Spelling error in docs fixed (PR #7618)

  • Add dk_params option to GLM info_criteria (PR #7693)

  • Quantile regression use dimension of x matrix rather than rank (PR #7694)

  • Add option for slim summary in OLS results (PR #7696)

  • Enable VIF to work with DataFrames (PR #7704)

stats

  • Runs test numeric cutoff error (PR #7422)

  • Resolve TODO in proportion.py (PR #7515)

  • Improve sidak multipletest precision close to zero (PR #7668)

  • Proportions_chisquare prevent integer overflow (PR #7669)

  • Fix lilliefors results for single-column DataFrames (PR #7698)

  • Describe / Description do not return percentiles (PR #7710)

  • ENH: add options to meta-analysis plot_forest (PR #7772)

tools

tsa

  • Add helper function to solve for polynomial coefficients from roots for ARIMA (PR #6921)

  • Changed month abbreviations with localization (PR #7409)

  • Add ARDL model (PR #7433)

  • Fix typo in ets error (PR #7435)

  • Add fixed_params to Hannan Rissanen (GH7202) (PR #7497)

  • Enable ARIMA.fit(method='hannan_rissanen') with fixed parameters (GH7501) (PR #7502)

  • Fix errors when making dynamic forecasts (PR #7516)

  • Correct index location of seasonal (PR #7545)

  • Handle non-date index with a freq (PR #7574)

  • Start process of changing default in plot-pacf (PR #7582)

  • Correct docstring (PR #7587)

  • Let VAR results complete when model has perfect fit (PR #7588)

  • Rename nc to n everywhere (PR #7593)

  • Improve ARDL and documentation (PR #7611)

  • Add RUR stationarity test to statsmodels.tsa.stattools (PR #7616)

  • Improve ARDL and UECM (PR #7619)

  • Improve error message in seasonal for bad freq (PR #7643)

  • ENH Fixed Range Unit-Root critical values (PR #7645)

  • Add SARIMAX FAQ (PR #7656)

  • Add to the SARIMAX FAQ (PR #7659)

  • Improve SARIMAX FAQ Notebook (PR #7661)

  • Improve ARIMA documentation (PR #7662)

  • Update TSA Api (PR #7701)

  • Correct ArmaProcess.from_estimation (PR #7709)

  • Added fft to ccovf and ccf (PR #7721)

tsa.statespace

  • Port missed doc fix (PR #7123)

  • Forecast after extend w/ time varying matrix (PR #7437)

  • Specify impulse to impulse_responses in VARMAX notebook (PR #7475)

  • Column name can be passed as an argument in impulse_responses in VARMAX (PR #7506)

  • Statespace MLEModel false validation error with nested fix_params (GH7507) (PR #7508)

  • Ensure attributes exist (PR #7538)

  • Ensure warning does not raise (PR #7589)

  • Assert correct iloc dtypes (PR #7737)

tsa.vector.ar

  • Fix float index usage in IRF error bands (PR #7397)

  • Add error if too few values (PR #7591)

bug-wrong

A new issue label type-bug-wrong indicates bugs that cause incorrect numbers to be returned without warnings. (Regular bugs are mostly usability bugs or bugs that raise an exception for unsupported use cases.) See the tagged issues.

Major Bugs Fixed

See GitHub issues for a list of bug fixes included in this release.

Development summary and credits

Besides receiving contributions for new and improved features and for bugfixes, important contributions to general maintenance for this release came from

  • Chad Fulton

  • Brock Mendel

  • Peter Quackenbush

  • Kerby Shedden

  • Kevin Sheppard

and the general maintainer and code reviewer

  • Josef Perktold

Additionally, many users contributed by participation in github issues and providing feedback.

Thanks to all of the contributors for the 0.13.0 release (based on git log):

  • Aidan Russell

  • Alexander Stiebing

  • Austin Adams

  • Ben Greiner

  • Brent Pedersen

  • Chad Fulton

  • Chadwick Boulay

  • Edwin Rijgersberg

  • Ezequiel Smucler

  • G. D. Mcbain

  • Graham Inggs

  • Greg Mcmahan

  • Helder Oliveira

  • Hsiao Yi

  • Jack Liu

  • Jake Jiacheng Liu

  • Jeremy Bejarano

  • Joris Van Den Bossche

  • Josef Perktold

  • Juan Orduz

  • Kerby Shedden

  • Kevin Sheppard

  • Luke Gregor

  • Malte Zietlow

  • Masanori Kanazu

  • Max Mahlke

  • Michele Fortunato

  • Mike Ovyan

  • Min Rk

  • Natalie Heer

  • Nikolai Korolev

  • Omar Gutiérrez

  • Oswaldo

  • Pamphile Roy

  • Pratyush Sharan

  • Roberto Nunes Mourão

  • Simardeep27

  • Simon Høxbro Hansen

  • Sin Kim

  • Skipper Seabold

  • Stefan Appelhoff

  • Thomas Brooks

  • Tomohiro Endo

  • Wahram Andrikyan

  • cxan96

  • janosbiro

  • partev

  • w31ha0

These lists of names are automatically generated based on git log, and may not be complete.

Merged Pull Requests

The following Pull Requests were merged since the last release:

  • PR #5491: ENH: multivariate - Return E and H matrices in dict

  • PR #6921: ENH: Add Helper function to solve for polynomial coefficients from roots for ARIMA

  • PR #7121: MAINT: v0.12.1 backports

  • PR #7123: DOC: Port missed doc fix

  • PR #7221: MAINT: Backport fixes for 0.12.2 compat release

  • PR #7222: Backports

  • PR #7287: REF: change default derivative in CDFLink

  • PR #7291: Backports

  • PR #7293: Rls note

  • PR #7303: DOC: Minor updates to v0.12.2 release notes

  • PR #7329: ENH: Added the option full_matrices=False in the PCA method

  • PR #7395: DOC: update doc for tweedie allowed links

  • PR #7397: BUG: Fix float index usage in IRF error bands

  • PR #7399: DOC: Don’t point to release version

  • PR #7400: MAINT: Fix UserWarning: marker is redundantly defined (Matplotlib v 3.4.1)

  • PR #7402: DOC: fixed error in linear mixed effects example

  • PR #7404: MAINT: Fix descriptive stats with extension dtypes

  • PR #7405: MAINT: Fix pip pre test failures

  • PR #7406: MAINT: Fix README badges

  • PR #7408: Copula 7254 rebased

  • PR #7409: ENH: changed month abbreviations with localization

  • PR #7413: BUG: Fix axis labels in qqplots

  • PR #7416: MAINT: Fix for upstream changes in PyMC3 notebook

  • PR #7422: BUG: Runs test numeric cutoff error

  • PR #7423: DOC/MAINT: Remove redundant words in PCA docstring

  • PR #7425: MAINT: Silence warnings and future compat

  • PR #7426: DOC: misc fixes in docstr of fdrcorrection

  • PR #7429: ENH: Use np.linalg.solve() instead of np.linalg.inv() in Newton-Raphson Algorithm

  • PR #7432: MAINT: Use loadscope to avoid rerunning setup

  • PR #7433: ENH: Add ARDL model

  • PR #7434: DOC: Small doc fixes

  • PR #7435: fix typo in ets error

  • PR #7437: BUG: forecast after extend w/ time varying matrix

  • PR #7438: MAINT: Remove cyclic import risks

  • PR #7450: Correct small typo in Theta model Notebook

  • PR #7458: DOC: typo, plats->plots

  • PR #7462: BUG: Prevent indent running on None

  • PR #7471: ENH: Allow user to configure GEE qic

  • PR #7474: MAINT: Fit future and deprecation warnings

  • PR #7475: Specify impulse to impulse_responses in VARMAX notebook

  • PR #7488: ENH: add discretized count distribution

  • PR #7489: BUG: score and Hessian for Tweedie models

  • PR #7490: ENH: Add an error message for not found data

  • PR #7495: MAINT: Avoid future issues in pandas

  • PR #7497: ENH: Add fixed_params to Hannan Rissanen (GH7202)

  • PR #7502: ENH: Enable ARIMA.fit(method='hannan_rissanen') with fixed parameters (GH7501)

  • PR #7506: ENH: Column name can be passed as an argument in impulse_responses in VARMAX

  • PR #7508: BUG: statespace MLEModel false validation error with nested fix_params (GH7507)

  • PR #7511: Allow remove_data to work when an attribute is not implemented

  • PR #7514: Remove typo in plot_pacf example

  • PR #7515: resolve TODO in proportion.py

  • PR #7516: BUG: Fix errors when making dynamic forecasts

  • PR #7535: BUG/ENH fix and enh GLM, family get_distribution

  • PR #7536: MAINT: Remove 32-bit testing

  • PR #7538: BUG: Ensure attributes exist

  • PR #7539: DOC: Fix errors in theta notebook

  • PR #7540: MAINT: Add github actions to build docs

  • PR #7541: MAINT: Fix GH actions

  • PR #7543: Betareg rebased3 Beta regression

  • PR #7545: BUG: Correct index location of seasonal

  • PR #7546: MAINT: Fix contrasts for Pandas changes

  • PR #7547: MAINT: Correct example implementation

  • PR #7551: MAINT: Check push ability

  • PR #7552: MAINT: Continue working on it

  • PR #7553: MAINT: Continue working on push ability

  • PR #7554: MAINT: Continue working on push ability

  • PR #7555: MAINT: Finalize push ability

  • PR #7556: MAINT: Finalize push ability

  • PR #7557: MAINT: Finalize push ability

  • PR #7558: MAINT: Finalize push ability

  • PR #7559: MAINT: Get doc push to work

  • PR #7560: MAINT: Get doc push to work

  • PR #7561: MAINT: Get doc push to work

  • PR #7571: BUG: Fix scale parameter in elastic net

  • PR #7572: DOC: Improve rolling OLS notebook

  • PR #7574: BUG: Handle non-date index with a freq

  • PR #7575: MAINT: Remove deprecated functions

  • PR #7577: MAINT: Remove additional deprecated features

  • PR #7578: MAINT: Remove recarray

  • PR #7579: MAINT: Remove deprecated code

  • PR #7580: MAINT: Correct notebooks for deprecations

  • PR #7581: ENH: Add support for pickling for generic path-like objects

  • PR #7582: ENH: Start process of changing default in plot-pacf

  • PR #7583: MAINT: Fix spelling errors

  • PR #7586: REF/BUG generic likelihood LLRMixin use df_resid instead of df_model for llr_pvalue

  • PR #7587: DOC: Correct docstring

  • PR #7588: BUG: Let VAR results complete when model has perfect fit

  • PR #7589: BUG: Ensure warning does not raise

  • PR #7590: MAINT: Clarify minimum versions

  • PR #7591: ENH: Add error if too few values

  • PR #7592: ENH: Improve limit format in diff plot

  • PR #7593: MAINT: Rename nc to n everywhere

  • PR #7594: Enh glm loglog

  • PR #7595: BUG: regression, allow remove_data to remove wendog, wexog, wresid

  • PR #7596: ENH: Raise when invalid optimization options passed to optimizer

  • PR #7599: MAINT: Revert exception to warning

  • PR #7607: DOC: copula in user guide and examples

  • PR #7608: ENH: random number generation wrapper for rng, qrng

  • PR #7611: ENH: Improve ARDL and documentation

  • PR #7612: BUG/DOC: Clarify which series is on x-axis

  • PR #7614: DOC: Small clean of example

  • PR #7616: ENH: Add RUR stationarity test to statsmodels.tsa.stattools

  • PR #7617: MAINT: Silence future warnings

  • PR #7618: DOC: spelling error in docs fixed

  • PR #7619: ENH: Improve ARDL and UECM

  • PR #7620: MAINT: Avoid passing bad optimization param

  • PR #7641: MAINT: Pin matplotlib

  • PR #7643: ENH: Improve error message in seasonal for bad freq

  • PR #7644: DOC: Update dev page flake8 command to follow PULL_REQUEST_TEMPLATE.md

  • PR #7645: ENH Fixed Range Unit-Root critical values

  • PR #7648: BUG/REF copula another round for 0.13

  • PR #7649: MAINT: Modernize prediction in notebooks

  • PR #7651: ENH: Improve copula notebook

  • PR #7652: MAINT: Temporarily change the default RNG in check_random_state

  • PR #7656: DOC: Add SARIMAX FAQ

  • PR #7659: DOC: Add to the SARIMAX FAQ

  • PR #7661: DOC: Improve SARIMAX FAQ Notebook

  • PR #7662: DOC: Improve ARIMA documentation

  • PR #7668: BUG: improve sidak multipletest precision close to zero

  • PR #7669: BUG: proportions_chisquare prevent integer overflow

  • PR #7670: BUG: ZI predict, fix offset default if None, allow exog_infl None if constant

  • PR #7673: ENH/BUG: graphics.plot_partregress add eval_env options

  • PR #7676: DOC: Remove duplication methods section

  • PR #7677: DOC: Second try fixing duplicate methods

  • PR #7681: fix a typo

  • PR #7682: ENH: McFadden and Cox&Snell Pseudo R squared to GLMResults

  • PR #7685: MAINT: Protect against changes in numeric indexes

  • PR #7693: ENH: add dk_params option to GLM info_criteria

  • PR #7694: ENH: quantile regression use dimension of x matrix rather than rank

  • PR #7696: ENH: add option for slim summary in OLS results

  • PR #7697: ENH add tricube kernel

  • PR #7698: ENH: Fix lilliefors results for single-column DataFrames

  • PR #7699: DOC: Improve ARDL notebook

  • PR #7701: MAINT: Update TSA Api

  • PR #7702: DOC: Update versions.json

  • PR #7703: BUG: Factor fit ml em resets seed (rebased)

  • PR #7704: ENH: Enable VIF to work with DataFrames

  • PR #7708: MAINT: Update versions file

  • PR #7709: BUG: Correct ArmaProcess.from_estimation

  • PR #7710: BUG: describe / Description do not return percentiles

  • PR #7713: ENH: Oaxaca Variance/Other Models

  • PR #7714: DOC: Update release note

  • PR #7721: ENH: Added fft to ccovf and ccf

  • PR #7723: REF/ENH: more copula improvements for 0.13

  • PR #7726: DOC: Update release note

  • PR #7727: DOC: improve docs and docstrings, mainly for recent additions

  • PR #7732: DOC: api.py, docstring improvements

  • PR #7735: DOC: Correct MultivariateTestResults doc string

  • PR #7737: TST: Assert correct iloc dtypes

  • PR #7738: DOC: Correct MultivariateTestResults doc string

  • PR #7739: MAINT: Fix style issue

  • PR #7740: DOC: add missing function doc head

  • PR #7742: MAINT: Final issues in __all__

  • PR #7743: DOC: add to release notes, smaller doc fixes, references

  • PR #7744: MAINT: Fix hard to reach errors

  • PR #7748: BUG: fix summary().as_latex, line in top table dropped

  • PR #7750: ENH: Warn kwargs glm

  • PR #7751: REF: GLM init invalid kwargs use ValueWarning

  • PR #7757: BUG/MAINT/DOC: more 0.13

  • PR #7766: BUG: fix lowess spikes/nans from epsilon values

  • PR #7768: PERF/TST: Improve Lowess

  • PR #7770: DOC: Fix lowess notebook

  • PR #7772: ENH: add options to meta-analysis plot_forest