Welcome to Statsmodels’s Documentation
statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license.
The online documentation is hosted at statsmodels.org.
Minimal Examples
Since version 0.5.0 of statsmodels, you can use R-style formulas together with pandas data frames to fit your models. Here is a simple example using ordinary least squares:
In [1]: import numpy as np
In [2]: import statsmodels.api as sm
In [3]: import statsmodels.formula.api as smf
# Load data
In [4]: dat = sm.datasets.get_rdataset("Guerry", "HistData").data
# Fit regression model (using the natural log of one of the regressors)
In [5]: results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
# Inspect the results
In [6]: print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.348
Model:                            OLS   Adj. R-squared:                  0.333
Method:                 Least Squares   F-statistic:                     22.20
Date:                Tue, 28 Feb 2017   Prob (F-statistic):           1.90e-08
Time:                        21:38:05   Log-Likelihood:                -379.82
No. Observations:                  86   AIC:                             765.6
Df Residuals:                      83   BIC:                             773.0
Df Model:                           2
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept         246.4341     35.233      6.995      0.000     176.358     316.510
Literacy           -0.4889      0.128     -3.832      0.000      -0.743      -0.235
np.log(Pop1831)   -31.3114      5.977     -5.239      0.000     -43.199     -19.424
==============================================================================
Omnibus:                        3.713   Durbin-Watson:                   2.019
Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.394
Skew:                          -0.487   Prob(JB):                        0.183
Kurtosis:                       3.003   Cond. No.                         702.
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
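The Guerry example needs an internet connection, because get_rdataset downloads the data from the Rdatasets repository. The same formula workflow can be tried offline with a synthetic pandas DataFrame; in this minimal sketch the column names (y, x, pop), the seed and the true coefficients are hypothetical values chosen for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with hypothetical column names; the seed makes runs reproducible
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x": rng.uniform(0, 1, n),
    "pop": rng.uniform(1, 100, n),
})
# Assumed true model: y = 2 + 3*x - 0.5*log(pop) + small normal noise
df["y"] = 2 + 3 * df["x"] - 0.5 * np.log(df["pop"]) + rng.normal(0, 0.1, n)

# As with np.log(Pop1831) above, numpy functions can be applied inside the formula
results = smf.ols("y ~ x + np.log(pop)", data=df).fit()
print(results.params)  # Intercept, x, np.log(pop)
```

Because np.log(pop) is evaluated inside the formula, no transformed column has to be added to the DataFrame beforehand.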
You can also use numpy arrays instead of formulas:
In [7]: import numpy as np
In [8]: import statsmodels.api as sm
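# Generate artificial data (2 regressors + constant); no random seed is set, so the numbers below vary between runs
In [9]: nobs = 100
In [10]: X = np.random.random((nobs, 2))
In [11]: X = sm.add_constant(X)
In [12]: beta = [1, .1, .5]
# The errors are uniform on [0, 1) with mean 0.5, so the expected estimate of the constant is about 1.5 rather than 1
In [13]: e = np.random.random(nobs)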
In [14]: y = np.dot(X, beta) + e
# Fit regression model
In [15]: results = sm.OLS(y, X).fit()
# Inspect the results
In [16]: print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.260
Model:                            OLS   Adj. R-squared:                  0.245
Method:                 Least Squares   F-statistic:                     17.06
Date:                Tue, 28 Feb 2017   Prob (F-statistic):           4.49e-07
Time:                        21:38:05   Log-Likelihood:                -23.039
No. Observations:                 100   AIC:                             52.08
Df Residuals:                      97   BIC:                             59.89
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.3622      0.088     15.521      0.000       1.188       1.536
x1             0.2220      0.112      1.973      0.051      -0.001       0.445
x2             0.6277      0.112      5.585      0.000       0.405       0.851
==============================================================================
Omnibus:                       38.171   Durbin-Watson:                   1.957
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                6.373
Skew:                           0.079   Prob(JB):                       0.0413
Kurtosis:                       1.773   Cond. No.                         5.71
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Use dir(results) to list the available results. Attributes are described in results.__doc__, and the results methods have their own docstrings.
Citation
When using statsmodels in a scientific publication, please consider using the following citation:
Seabold, Skipper, and Josef Perktold. “Statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference. 2010.
Bibtex entry:
@inproceedings{seabold2010statsmodels,
title={Statsmodels: Econometric and statistical modeling with python},
author={Seabold, Skipper and Perktold, Josef},
booktitle={9th Python in Science Conference},
year={2010},
}
Basic Documentation
Information about the structure and development of statsmodels:
Table of Contents
- Linear Regression
- Generalized Linear Models
- Generalized Estimating Equations
- Robust Linear Models
- Linear Mixed Effects Models
- Regression with Discrete Dependent Variable
- Examples
- Technical Documentation
- Module Reference
- statsmodels.discrete.discrete_model.Logit
- statsmodels.discrete.discrete_model.Probit
- statsmodels.discrete.discrete_model.MNLogit
- statsmodels.discrete.discrete_model.Poisson
- statsmodels.discrete.discrete_model.NegativeBinomial
- statsmodels.discrete.discrete_model.LogitResults
- statsmodels.discrete.discrete_model.ProbitResults
- statsmodels.discrete.discrete_model.CountResults
- statsmodels.discrete.discrete_model.MultinomialResults
- statsmodels.discrete.discrete_model.NegativeBinomialResults
- statsmodels.discrete.discrete_model.DiscreteModel
- statsmodels.discrete.discrete_model.DiscreteResults
- statsmodels.discrete.discrete_model.BinaryModel
- statsmodels.discrete.discrete_model.BinaryResults
- statsmodels.discrete.discrete_model.CountModel
- statsmodels.discrete.discrete_model.MultinomialModel
- ANOVA
- Time Series Analysis (tsa)
- Descriptive Statistics and Tests
- statsmodels.tsa.stattools.acovf
- statsmodels.tsa.stattools.acf
- statsmodels.tsa.stattools.pacf
- statsmodels.tsa.stattools.pacf_yw
- statsmodels.tsa.stattools.pacf_ols
- statsmodels.tsa.stattools.ccovf
- statsmodels.tsa.stattools.ccf
- statsmodels.tsa.stattools.periodogram
- statsmodels.tsa.stattools.adfuller
- statsmodels.tsa.stattools.kpss
- statsmodels.tsa.stattools.coint
- statsmodels.tsa.stattools.bds
- statsmodels.tsa.stattools.q_stat
- statsmodels.tsa.stattools.grangercausalitytests
- statsmodels.tsa.stattools.levinson_durbin
- statsmodels.tsa.stattools.arma_order_select_ic
- statsmodels.tsa.x13.x13_arima_select_order
- statsmodels.tsa.x13.x13_arima_analysis
- Estimation
- Vector Autoregressive Processes (VAR)
- Regime switching models
- ARMA Process
- statsmodels.tsa.arima_process.ArmaProcess
- statsmodels.tsa.arima_process.ar2arma
- statsmodels.tsa.arima_process.arma2ar
- statsmodels.tsa.arima_process.arma2ma
- statsmodels.tsa.arima_process.arma_acf
- statsmodels.tsa.arima_process.arma_acovf
- statsmodels.tsa.arima_process.arma_generate_sample
- statsmodels.tsa.arima_process.arma_impulse_response
- statsmodels.tsa.arima_process.arma_pacf
- statsmodels.tsa.arima_process.arma_periodogram
- statsmodels.tsa.arima_process.deconvolve
- statsmodels.tsa.arima_process.index2lpol
- statsmodels.tsa.arima_process.lpol2index
- statsmodels.tsa.arima_process.lpol_fiar
- statsmodels.tsa.arima_process.lpol_fima
- statsmodels.tsa.arima_process.lpol_sdiff
- statsmodels.sandbox.tsa.fftarma.ArmaFft
- Time Series Filters
- statsmodels.tsa.filters.bk_filter.bkfilter
- statsmodels.tsa.filters.hp_filter.hpfilter
- statsmodels.tsa.filters.cf_filter.cffilter
- statsmodels.tsa.filters.filtertools.convolution_filter
- statsmodels.tsa.filters.filtertools.recursive_filter
- statsmodels.tsa.filters.filtertools.miso_lfilter
- statsmodels.tsa.filters.filtertools.fftconvolve3
- statsmodels.tsa.filters.filtertools.fftconvolveinv
- statsmodels.tsa.seasonal.seasonal_decompose
- TSA Tools
- VARMA Process
- Interpolation
- Descriptive Statistics and Tests
- Time Series Analysis by State Space Methods (statespace)
- Example: AR(2) model
- Seasonal Autoregressive Integrated Moving-Average with eXogenous regressors (SARIMAX)
- Unobserved Components
- Vector Autoregressive Moving-Average with eXogenous regressors (VARMAX)
- Dynamic Factor Models
- Custom state space models
- State space representation and Kalman filtering
- statsmodels.tsa.statespace.representation.Representation
- statsmodels.tsa.statespace.representation.FrozenRepresentation
- statsmodels.tsa.statespace.kalman_filter.KalmanFilter
- statsmodels.tsa.statespace.kalman_filter.FilterResults
- statsmodels.tsa.statespace.kalman_smoother.KalmanSmoother
- statsmodels.tsa.statespace.kalman_smoother.SmootherResults
- Statespace diagnostics
- Statespace Tools
- Methods for Survival and Duration Analysis
- Statistics (stats)
- Residual Diagnostics and Specification Tests
- statsmodels.stats.stattools.durbin_watson
- statsmodels.stats.stattools.jarque_bera
- statsmodels.stats.stattools.omni_normtest
- statsmodels.stats.stattools.medcouple
- statsmodels.stats.stattools.robust_skewness
- statsmodels.stats.stattools.robust_kurtosis
- statsmodels.stats.stattools.expected_robust_kurtosis
- statsmodels.stats.diagnostic.acorr_ljungbox
- statsmodels.stats.diagnostic.acorr_breusch_godfrey
- statsmodels.stats.diagnostic.HetGoldfeldQuandt
- statsmodels.stats.diagnostic.het_goldfeldquandt
- statsmodels.stats.diagnostic.het_breuschpagan
- statsmodels.stats.diagnostic.het_white
- statsmodels.stats.diagnostic.het_arch
- statsmodels.stats.diagnostic.linear_harvey_collier
- statsmodels.stats.diagnostic.linear_rainbow
- statsmodels.stats.diagnostic.linear_lm
- statsmodels.stats.diagnostic.breaks_cusumolsresid
- statsmodels.stats.diagnostic.breaks_hansen
- statsmodels.stats.diagnostic.recursive_olsresiduals
- statsmodels.stats.diagnostic.CompareCox
- statsmodels.stats.diagnostic.compare_cox
- statsmodels.stats.diagnostic.CompareJ
- statsmodels.stats.diagnostic.compare_j
- statsmodels.stats.diagnostic.unitroot_adf
- statsmodels.stats.diagnostic.normal_ad
- statsmodels.stats.diagnostic.kstest_normal
- statsmodels.stats.diagnostic.lilliefors
- Outliers and influence measures
- Sandwich Robust Covariances
- statsmodels.stats.sandwich_covariance.cov_hac
- statsmodels.stats.sandwich_covariance.cov_nw_panel
- statsmodels.stats.sandwich_covariance.cov_nw_groupsum
- statsmodels.stats.sandwich_covariance.cov_cluster
- statsmodels.stats.sandwich_covariance.cov_cluster_2groups
- statsmodels.stats.sandwich_covariance.cov_white_simple
- statsmodels.stats.sandwich_covariance.cov_hc0
- statsmodels.stats.sandwich_covariance.cov_hc1
- statsmodels.stats.sandwich_covariance.cov_hc2
- statsmodels.stats.sandwich_covariance.cov_hc3
- statsmodels.stats.sandwich_covariance.se_cov
- Goodness of Fit Tests and Measures
- Non-Parametric Tests
- statsmodels.sandbox.stats.runs.mcnemar
- statsmodels.sandbox.stats.runs.symmetry_bowker
- statsmodels.sandbox.stats.runs.median_test_ksample
- statsmodels.sandbox.stats.runs.runstest_1samp
- statsmodels.sandbox.stats.runs.runstest_2samp
- statsmodels.sandbox.stats.runs.cochrans_q
- statsmodels.sandbox.stats.runs.Runs
- statsmodels.stats.descriptivestats.sign_test
- Interrater Reliability and Agreement
- Multiple Tests and Multiple Comparison Procedures
- statsmodels.sandbox.stats.multicomp.multipletests
- statsmodels.sandbox.stats.multicomp.fdrcorrection0
- statsmodels.sandbox.stats.multicomp.GroupsStats
- statsmodels.sandbox.stats.multicomp.MultiComparison
- statsmodels.sandbox.stats.multicomp.TukeyHSDResults
- statsmodels.stats.multicomp.pairwise_tukeyhsd
- statsmodels.stats.multitest.local_fdr
- statsmodels.stats.multitest.fdrcorrection_twostage
- statsmodels.stats.multitest.NullDistribution
- statsmodels.sandbox.stats.multicomp.varcorrection_pairs_unbalanced
- statsmodels.sandbox.stats.multicomp.varcorrection_pairs_unequal
- statsmodels.sandbox.stats.multicomp.varcorrection_unbalanced
- statsmodels.sandbox.stats.multicomp.varcorrection_unequal
- statsmodels.sandbox.stats.multicomp.StepDown
- statsmodels.sandbox.stats.multicomp.catstack
- statsmodels.sandbox.stats.multicomp.ccols
- statsmodels.sandbox.stats.multicomp.compare_ordered
- statsmodels.sandbox.stats.multicomp.distance_st_range
- statsmodels.sandbox.stats.multicomp.ecdf
- statsmodels.sandbox.stats.multicomp.get_tukeyQcrit
- statsmodels.sandbox.stats.multicomp.homogeneous_subsets
- statsmodels.sandbox.stats.multicomp.maxzero
- statsmodels.sandbox.stats.multicomp.maxzerodown
- statsmodels.sandbox.stats.multicomp.mcfdr
- statsmodels.sandbox.stats.multicomp.qcrit
- statsmodels.sandbox.stats.multicomp.randmvn
- statsmodels.sandbox.stats.multicomp.rankdata
- statsmodels.sandbox.stats.multicomp.rejectionline
- statsmodels.sandbox.stats.multicomp.set_partition
- statsmodels.sandbox.stats.multicomp.set_remove_subs
- statsmodels.sandbox.stats.multicomp.tiecorrect
- Basic Statistics and t-Tests with frequency weights
- statsmodels.stats.weightstats.DescrStatsW
- statsmodels.stats.weightstats.CompareMeans
- statsmodels.stats.weightstats.ttest_ind
- statsmodels.stats.weightstats.ttost_ind
- statsmodels.stats.weightstats.ttost_paired
- statsmodels.stats.weightstats.ztest
- statsmodels.stats.weightstats.ztost
- statsmodels.stats.weightstats.zconfint
- statsmodels.stats.weightstats._tconfint_generic
- statsmodels.stats.weightstats._tstat_generic
- statsmodels.stats.weightstats._zconfint_generic
- statsmodels.stats.weightstats._zstat_generic
- statsmodels.stats.weightstats._zstat_generic2
- Power and Sample Size Calculations
- statsmodels.stats.power.TTestIndPower
- statsmodels.stats.power.TTestPower
- statsmodels.stats.power.GofChisquarePower
- statsmodels.stats.power.NormalIndPower
- statsmodels.stats.power.FTestAnovaPower
- statsmodels.stats.power.FTestPower
- statsmodels.stats.power.tt_solve_power
- statsmodels.stats.power.tt_ind_solve_power
- statsmodels.stats.power.zt_ind_solve_power
- Proportion
- statsmodels.stats.proportion.proportion_confint
- statsmodels.stats.proportion.proportion_effectsize
- statsmodels.stats.proportion.binom_test
- statsmodels.stats.proportion.binom_test_reject_interval
- statsmodels.stats.proportion.binom_tost
- statsmodels.stats.proportion.binom_tost_reject_interval
- statsmodels.stats.proportion.multinomial_proportions_confint
- statsmodels.stats.proportion.proportions_ztest
- statsmodels.stats.proportion.proportions_ztost
- statsmodels.stats.proportion.proportions_chisquare
- statsmodels.stats.proportion.proportions_chisquare_allpairs
- statsmodels.stats.proportion.proportions_chisquare_pairscontrol
- statsmodels.stats.proportion.power_binom_tost
- statsmodels.stats.proportion.power_ztost_prop
- statsmodels.stats.proportion.samplesize_confint_proportion
- Moment Helpers
- statsmodels.stats.correlation_tools.corr_clipped
- statsmodels.stats.correlation_tools.corr_nearest
- statsmodels.stats.correlation_tools.corr_nearest_factor
- statsmodels.stats.correlation_tools.corr_thresholded
- statsmodels.stats.correlation_tools.cov_nearest
- statsmodels.stats.correlation_tools.cov_nearest_factor_homog
- statsmodels.stats.correlation_tools.FactoredPSDMatrix
- statsmodels.stats.moment_helpers.cum2mc
- statsmodels.stats.moment_helpers.mc2mnc
- statsmodels.stats.moment_helpers.mc2mvsk
- statsmodels.stats.moment_helpers.mnc2cum
- statsmodels.stats.moment_helpers.mnc2mc
- statsmodels.stats.moment_helpers.mnc2mvsk
- statsmodels.stats.moment_helpers.mvsk2mc
- statsmodels.stats.moment_helpers.mvsk2mnc
- statsmodels.stats.moment_helpers.cov2corr
- statsmodels.stats.moment_helpers.corr2cov
- statsmodels.stats.moment_helpers.se_cov
- Mediation Analysis
- Residual Diagnostics and Specification Tests
- Nonparametric Methods (nonparametric)
- Kernel density estimation
- Kernel regression
- References
- Module Reference
- statsmodels.nonparametric.smoothers_lowess.lowess
- statsmodels.nonparametric.kde.KDEUnivariate
- statsmodels.nonparametric.kernel_density.KDEMultivariate
- statsmodels.nonparametric.kernel_density.KDEMultivariateConditional
- statsmodels.nonparametric.kernel_density.EstimatorSettings
- statsmodels.nonparametric.kernel_regression.KernelReg
- statsmodels.nonparametric.kernel_regression.KernelCensoredReg
- statsmodels.nonparametric.bandwidths.bw_scott
- statsmodels.nonparametric.bandwidths.bw_silverman
- statsmodels.nonparametric.bandwidths.select_bandwidth
- Generalized Method of Moments (gmm)
- Module Reference
- statsmodels.sandbox.regression.gmm.GMM
- statsmodels.sandbox.regression.gmm.GMMResults
- statsmodels.sandbox.regression.gmm.IV2SLS
- statsmodels.sandbox.regression.gmm.IVGMM
- statsmodels.sandbox.regression.gmm.IVGMMResults
- statsmodels.sandbox.regression.gmm.IVRegressionResults
- statsmodels.sandbox.regression.gmm.LinearIVGMM
- statsmodels.sandbox.regression.gmm.NonlinearIVGMM
- Module Reference
- Contingency tables
- Multiple Imputation with Chained Equations
- Multivariate Statistics (multivariate)
- Empirical Likelihood (emplike)
- Other Models (miscmodels)
- Distributions
- Empirical Distributions
- Distribution Extras
- statsmodels.sandbox.distributions.extras.SkewNorm_gen
- statsmodels.sandbox.distributions.extras.SkewNorm2_gen
- statsmodels.sandbox.distributions.extras.ACSkewT_gen
- statsmodels.sandbox.distributions.extras.skewnorm2
- statsmodels.sandbox.distributions.extras.pdf_moments_st
- statsmodels.sandbox.distributions.extras.pdf_mvsk
- statsmodels.sandbox.distributions.extras.pdf_moments
- statsmodels.sandbox.distributions.extras.NormExpan_gen
- statsmodels.sandbox.distributions.extras.mvstdnormcdf
- statsmodels.sandbox.distributions.extras.mvnormcdf
- Univariate Distributions by non-linear Transformations
- statsmodels.sandbox.distributions.transformed.TransfTwo_gen
- statsmodels.sandbox.distributions.transformed.Transf_gen
- statsmodels.sandbox.distributions.transformed.ExpTransf_gen
- statsmodels.sandbox.distributions.transformed.LogTransf_gen
- statsmodels.sandbox.distributions.transformed.SquareFunc
- statsmodels.sandbox.distributions.transformed.absnormalg
- statsmodels.sandbox.distributions.transformed.invdnormalg
- statsmodels.sandbox.distributions.transformed.loggammaexpg
- statsmodels.sandbox.distributions.transformed.lognormalg
- statsmodels.sandbox.distributions.transformed.negsquarenormalg
- statsmodels.sandbox.distributions.transformed.squarenormalg
- statsmodels.sandbox.distributions.transformed.squaretg
- Graphics
- Goodness of Fit Plots
- Boxplots
- Correlation Plots
- Functional Plots
- Regression Plots
- statsmodels.graphics.regressionplots.plot_fit
- statsmodels.graphics.regressionplots.plot_regress_exog
- statsmodels.graphics.regressionplots.plot_partregress
- statsmodels.graphics.regressionplots.plot_ccpr
- statsmodels.graphics.regressionplots.abline_plot
- statsmodels.graphics.regressionplots.influence_plot
- statsmodels.graphics.regressionplots.plot_leverage_resid2
- Time Series Plots
- Other Plots
- Input-Output (iolib)
- Examples
- Module Reference
- statsmodels.iolib.foreign.StataReader
- statsmodels.iolib.foreign.StataWriter
- statsmodels.iolib.foreign.genfromdta
- statsmodels.iolib.foreign.savetxt
- statsmodels.iolib.table.SimpleTable
- statsmodels.iolib.table.csv2st
- statsmodels.iolib.smpickle.save_pickle
- statsmodels.iolib.smpickle.load_pickle
- statsmodels.iolib.summary.Summary
- statsmodels.iolib.summary2.Summary
- Tools
- The Datasets Package
- Using Datasets from Stata
- Using Datasets from R
- R Datasets Function Reference
- Available Datasets
- American National Election Survey 1996
- Breast Cancer Data
- Bill Greene’s credit scoring data.
- Smoking and lung cancer in eight cities in China.
- Mauna Loa Weekly Atmospheric CO2 Data
- First 100 days of the US House of Representatives 1995
- World Copper Market 1951-1975 Dataset
- US Capital Punishment dataset.
- El Nino - Sea Surface Temperatures
- Engel (1857) food expenditure data
- Affairs dataset
- World Bank Fertility Data
- Grunfeld (1950) Investment Data
- Transplant Survival Data
- Longley dataset
- United States Macroeconomic data
- Travel Mode Choice
- Nile River flows at Ashwan 1871-1970
- RAND Health Insurance Experiment Data
- Taxation Powers Vote for the Scottish Parliament 1997
- Spector and Mazzeo (1980) - Program Effectiveness Data
- Stack loss data
- Star98 Educational Dataset
- Statewide Crime Data 2009
- U.S. Strike Duration Data
- Yearly sunspots data 1700-2008
- Usage
- Additional information
- Sandbox