Welcome to Statsmodels’s Documentation
statsmodels is a Python module that provides classes and functions for estimating
many different statistical models, conducting statistical tests, and exploring
statistical data. An extensive list of result statistics is available for each estimator.
The results are tested against existing statistical packages to ensure that they are correct. The
package is released under the open source Modified BSD (3-clause) license.
The online documentation is hosted at sourceforge.
Minimal Examples
Since version 0.5.0 of statsmodels, you can use R-style formulas together with pandas
data frames to fit your models. Here is a simple example using ordinary least squares:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Load data
dat = sm.datasets.get_rdataset("Guerry", "HistData").data
# Fit regression model (using the natural log of one of the regressors)
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
# Inspect the results
print(results.summary())
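The example above downloads the Guerry data set over the network. As a minimal offline sketch of the same formula interface, the snippet below fits a formula on a small synthetic pandas data frame. The column names (y, x1, x2), the seed, and the coefficients used to generate the data are assumptions made up for illustration; they are not part of statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical columns, for illustration only)
rng = np.random.default_rng(0)
nobs = 200
df = pd.DataFrame({
    "x1": rng.uniform(0, 1, nobs),
    "x2": rng.uniform(1, 10, nobs),
})
df["y"] = 2.0 + 3.0 * df["x1"] + 0.5 * np.log(df["x2"]) + rng.normal(0, 0.1, nobs)

# The formula adds an intercept automatically; np.log is applied to x2 on the fly
formula_results = smf.ols("y ~ x1 + np.log(x2)", data=df).fit()

# Estimated coefficients as a pandas Series indexed by term name
print(formula_results.params)
```

Because the data are generated from known coefficients, the printed estimates can be compared against 2.0, 3.0 and 0.5 as a quick sanity check.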
You can also use numpy arrays instead of formulas:
import numpy as np
import statsmodels.api as sm
# Generate artificial data (2 regressors + constant)
nobs = 100
X = np.random.random((nobs, 2))
X = sm.add_constant(X)
beta = [1, .1, .5]
e = np.random.random(nobs)
y = np.dot(X, beta) + e
# Fit regression model
results = sm.OLS(y, X).fit()
# Inspect the results
print(results.summary())
Have a look at dir(results) to see the available results. Attributes are described in results.__doc__, and the result methods have their own docstrings.
Basic Documentation
Information about the structure and development of statsmodels:
Table of Contents
- Linear Regression
- Generalized Linear Models
- Generalized Estimating Equations
- Robust Linear Models
- Linear Mixed Effects Models
- Regression with Discrete Dependent Variable
- Examples
- Technical Documentation
- Module Reference
- statsmodels.discrete.discrete_model.Logit
- statsmodels.discrete.discrete_model.Probit
- statsmodels.discrete.discrete_model.MNLogit
- statsmodels.discrete.discrete_model.Poisson
- statsmodels.discrete.discrete_model.NegativeBinomial
- statsmodels.discrete.discrete_model.LogitResults
- statsmodels.discrete.discrete_model.ProbitResults
- statsmodels.discrete.discrete_model.CountResults
- statsmodels.discrete.discrete_model.MultinomialResults
- statsmodels.discrete.discrete_model.NegativeBinomialResults
- statsmodels.discrete.discrete_model.DiscreteModel
- statsmodels.discrete.discrete_model.DiscreteResults
- statsmodels.discrete.discrete_model.BinaryModel
- statsmodels.discrete.discrete_model.BinaryResults
- statsmodels.discrete.discrete_model.CountModel
- statsmodels.discrete.discrete_model.MultinomialModel
- ANOVA
- Time Series analysis tsa
- Descriptive Statistics and Tests
- statsmodels.tsa.stattools.acovf
- statsmodels.tsa.stattools.acf
- statsmodels.tsa.stattools.pacf
- statsmodels.tsa.stattools.pacf_yw
- statsmodels.tsa.stattools.pacf_ols
- statsmodels.tsa.stattools.ccovf
- statsmodels.tsa.stattools.ccf
- statsmodels.tsa.stattools.periodogram
- statsmodels.tsa.stattools.adfuller
- statsmodels.tsa.stattools.q_stat
- statsmodels.tsa.stattools.grangercausalitytests
- statsmodels.tsa.stattools.levinson_durbin
- statsmodels.tsa.stattools.arma_order_select_ic
- statsmodels.tsa.x13.x13_arima_select_order
- statsmodels.tsa.x13.x13_arima_analysis
- Estimation
- Vector Autoregressive Processes (VAR)
- ARMA Process
- statsmodels.tsa.arima_process.ArmaProcess
- statsmodels.tsa.arima_process.ar2arma
- statsmodels.tsa.arima_process.arma2ar
- statsmodels.tsa.arima_process.arma2ma
- statsmodels.tsa.arima_process.arma_acf
- statsmodels.tsa.arima_process.arma_acovf
- statsmodels.tsa.arima_process.arma_generate_sample
- statsmodels.tsa.arima_process.arma_impulse_response
- statsmodels.tsa.arima_process.arma_pacf
- statsmodels.tsa.arima_process.arma_periodogram
- statsmodels.tsa.arima_process.deconvolve
- statsmodels.tsa.arima_process.index2lpol
- statsmodels.tsa.arima_process.lpol2index
- statsmodels.tsa.arima_process.lpol_fiar
- statsmodels.tsa.arima_process.lpol_fima
- statsmodels.tsa.arima_process.lpol_sdiff
- statsmodels.sandbox.tsa.fftarma.ArmaFft
- Time Series Filters
- statsmodels.tsa.filters.bk_filter.bkfilter
- statsmodels.tsa.filters.hp_filter.hpfilter
- statsmodels.tsa.filters.cf_filter.cffilter
- statsmodels.tsa.filters.filtertools.convolution_filter
- statsmodels.tsa.filters.filtertools.recursive_filter
- statsmodels.tsa.filters.filtertools.miso_lfilter
- statsmodels.tsa.filters.filtertools.fftconvolve3
- statsmodels.tsa.filters.filtertools.fftconvolveinv
- TSA Tools
- VARMA Process
- Interpolation
- Descriptive Statistics and Tests
- Models for Survival and Duration Analysis
- Statistics stats
- Residual Diagnostics and Specification Tests
- Sandwich Robust Covariances
- statsmodels.stats.sandwich_covariance.cov_hac
- statsmodels.stats.sandwich_covariance.cov_nw_panel
- statsmodels.stats.sandwich_covariance.cov_nw_groupsum
- statsmodels.stats.sandwich_covariance.cov_cluster
- statsmodels.stats.sandwich_covariance.cov_cluster_2groups
- statsmodels.stats.sandwich_covariance.cov_white_simple
- statsmodels.stats.sandwich_covariance.cov_hc0
- statsmodels.stats.sandwich_covariance.cov_hc1
- statsmodels.stats.sandwich_covariance.cov_hc2
- statsmodels.stats.sandwich_covariance.cov_hc3
- Goodness of Fit Tests and Measures
- Non-Parametric Tests
- statsmodels.sandbox.stats.runs.mcnemar
- statsmodels.sandbox.stats.runs.symmetry_bowker
- statsmodels.sandbox.stats.runs.median_test_ksample
- statsmodels.sandbox.stats.runs.runstest_1samp
- statsmodels.sandbox.stats.runs.runstest_2samp
- statsmodels.sandbox.stats.runs.cochrans_q
- statsmodels.sandbox.stats.runs.Runs
- statsmodels.stats.descriptivestats.sign_test
- Interrater Reliability and Agreement
- Multiple Tests and Multiple Comparison Procedures
- statsmodels.sandbox.stats.multicomp.GroupsStats
- statsmodels.sandbox.stats.multicomp.MultiComparison
- statsmodels.sandbox.stats.multicomp.TukeyHSDResults
- statsmodels.stats.multicomp.pairwise_tukeyhsd
- statsmodels.sandbox.stats.multicomp.varcorrection_pairs_unbalanced
- statsmodels.sandbox.stats.multicomp.varcorrection_pairs_unequal
- statsmodels.sandbox.stats.multicomp.varcorrection_unbalanced
- statsmodels.sandbox.stats.multicomp.varcorrection_unequal
- statsmodels.sandbox.stats.multicomp.StepDown
- statsmodels.sandbox.stats.multicomp.catstack
- statsmodels.sandbox.stats.multicomp.ccols
- statsmodels.sandbox.stats.multicomp.compare_ordered
- statsmodels.sandbox.stats.multicomp.distance_st_range
- statsmodels.sandbox.stats.multicomp.get_tukeyQcrit
- statsmodels.sandbox.stats.multicomp.homogeneous_subsets
- statsmodels.sandbox.stats.multicomp.maxzero
- statsmodels.sandbox.stats.multicomp.maxzerodown
- statsmodels.sandbox.stats.multicomp.mcfdr
- statsmodels.sandbox.stats.multicomp.qcrit
- statsmodels.sandbox.stats.multicomp.randmvn
- statsmodels.sandbox.stats.multicomp.rankdata
- statsmodels.sandbox.stats.multicomp.rejectionline
- statsmodels.sandbox.stats.multicomp.set_partition
- statsmodels.sandbox.stats.multicomp.set_remove_subs
- statsmodels.sandbox.stats.multicomp.tiecorrect
- Basic Statistics and t-Tests with frequency weights
- statsmodels.stats.weightstats.DescrStatsW
- statsmodels.stats.weightstats.CompareMeans
- statsmodels.stats.weightstats.ttest_ind
- statsmodels.stats.weightstats.ttost_ind
- statsmodels.stats.weightstats.ttost_paired
- statsmodels.stats.weightstats.ztest
- statsmodels.stats.weightstats.ztost
- statsmodels.stats.weightstats.zconfint
- statsmodels.stats.weightstats._tconfint_generic
- statsmodels.stats.weightstats._tstat_generic
- statsmodels.stats.weightstats._zconfint_generic
- statsmodels.stats.weightstats._zstat_generic
- statsmodels.stats.weightstats._zstat_generic2
- Power and Sample Size Calculations
- statsmodels.stats.power.TTestIndPower
- statsmodels.stats.power.TTestPower
- statsmodels.stats.power.GofChisquarePower
- statsmodels.stats.power.NormalIndPower
- statsmodels.stats.power.FTestAnovaPower
- statsmodels.stats.power.FTestPower
- statsmodels.stats.power.tt_solve_power
- statsmodels.stats.power.tt_ind_solve_power
- statsmodels.stats.power.zt_ind_solve_power
- Proportion
- statsmodels.stats.proportion.proportion_confint
- statsmodels.stats.proportion.proportion_effectsize
- statsmodels.stats.proportion.binom_test
- statsmodels.stats.proportion.binom_test_reject_interval
- statsmodels.stats.proportion.binom_tost
- statsmodels.stats.proportion.binom_tost_reject_interval
- statsmodels.stats.proportion.proportions_ztest
- statsmodels.stats.proportion.proportions_ztost
- statsmodels.stats.proportion.proportions_chisquare
- statsmodels.stats.proportion.proportions_chisquare_allpairs
- statsmodels.stats.proportion.proportions_chisquare_pairscontrol
- statsmodels.stats.proportion.power_binom_tost
- statsmodels.stats.proportion.power_ztost_prop
- statsmodels.stats.proportion.samplesize_confint_proportion
- Moment Helpers
- statsmodels.stats.correlation_tools.corr_nearest
- statsmodels.stats.correlation_tools.corr_clipped
- statsmodels.stats.correlation_tools.cov_nearest
- statsmodels.stats.moment_helpers.cum2mc
- statsmodels.stats.moment_helpers.mc2mnc
- statsmodels.stats.moment_helpers.mc2mvsk
- statsmodels.stats.moment_helpers.mnc2cum
- statsmodels.stats.moment_helpers.mnc2mc
- statsmodels.stats.moment_helpers.mnc2mvsk
- statsmodels.stats.moment_helpers.mvsk2mc
- statsmodels.stats.moment_helpers.mvsk2mnc
- statsmodels.stats.moment_helpers.cov2corr
- statsmodels.stats.moment_helpers.corr2cov
- statsmodels.stats.moment_helpers.se_cov
- Nonparametric Methods nonparametric
- Kernel density estimation
- Kernel regression
- References
- Module Reference
- statsmodels.nonparametric.smoothers_lowess.lowess
- statsmodels.nonparametric.kde.KDEUnivariate
- statsmodels.nonparametric.kernel_density.KDEMultivariate
- statsmodels.nonparametric.kernel_density.KDEMultivariateConditional
- statsmodels.nonparametric.kernel_regression.KernelReg
- statsmodels.nonparametric.kernel_regression.KernelCensoredReg
- statsmodels.nonparametric.bandwidths.bw_scott
- statsmodels.nonparametric.bandwidths.bw_silverman
- statsmodels.nonparametric.bandwidths.select_bandwidth
- Generalized Method of Moments gmm
- Module Reference
- statsmodels.sandbox.regression.gmm.GMM
- statsmodels.sandbox.regression.gmm.GMMResults
- statsmodels.sandbox.regression.gmm.IV2SLS
- statsmodels.sandbox.regression.gmm.IVGMM
- statsmodels.sandbox.regression.gmm.IVGMMResults
- statsmodels.sandbox.regression.gmm.IVRegressionResults
- statsmodels.sandbox.regression.gmm.LinearIVGMM
- statsmodels.sandbox.regression.gmm.NonlinearIVGMM
- Module Reference
- Empirical Likelihood emplike
- Other Models miscmodels
- Distributions
- Empirical Distributions
- Distribution Extras
- statsmodels.sandbox.distributions.extras.SkewNorm_gen
- statsmodels.sandbox.distributions.extras.SkewNorm2_gen
- statsmodels.sandbox.distributions.extras.ACSkewT_gen
- statsmodels.sandbox.distributions.extras.skewnorm2
- statsmodels.sandbox.distributions.extras.pdf_moments_st
- statsmodels.sandbox.distributions.extras.pdf_mvsk
- statsmodels.sandbox.distributions.extras.pdf_moments
- statsmodels.sandbox.distributions.extras.NormExpan_gen
- statsmodels.sandbox.distributions.extras.mvstdnormcdf
- statsmodels.sandbox.distributions.extras.mvnormcdf
- Univariate Distributions by non-linear Transformations
- statsmodels.sandbox.distributions.transformed.TransfTwo_gen
- statsmodels.sandbox.distributions.transformed.Transf_gen
- statsmodels.sandbox.distributions.transformed.ExpTransf_gen
- statsmodels.sandbox.distributions.transformed.LogTransf_gen
- statsmodels.sandbox.distributions.transformed.SquareFunc
- statsmodels.sandbox.distributions.transformed.absnormalg
- statsmodels.sandbox.distributions.transformed.invdnormalg
- statsmodels.sandbox.distributions.transformed.loggammaexpg
- statsmodels.sandbox.distributions.transformed.lognormalg
- statsmodels.sandbox.distributions.transformed.negsquarenormalg
- statsmodels.sandbox.distributions.transformed.squarenormalg
- statsmodels.sandbox.distributions.transformed.squaretg
- Graphics
- Goodness of Fit Plots
- Boxplots
- Correlation Plots
- Functional Plots
- Regression Plots
- statsmodels.graphics.regressionplots.plot_fit
- statsmodels.graphics.regressionplots.plot_regress_exog
- statsmodels.graphics.regressionplots.plot_partregress
- statsmodels.graphics.regressionplots.plot_ccpr
- statsmodels.graphics.regressionplots.abline_plot
- statsmodels.graphics.regressionplots.influence_plot
- statsmodels.graphics.regressionplots.plot_leverage_resid2
- Time Series Plots
- Other Plots
- Input-Output iolib
- Examples
- Module Reference
- statsmodels.iolib.foreign.StataReader
- statsmodels.iolib.foreign.StataWriter
- statsmodels.iolib.foreign.genfromdta
- statsmodels.iolib.foreign.savetxt
- statsmodels.iolib.table.SimpleTable
- statsmodels.iolib.table.csv2st
- statsmodels.iolib.smpickle.save_pickle
- statsmodels.iolib.smpickle.load_pickle
- statsmodels.iolib.summary.Summary
- statsmodels.iolib.summary2.Summary
- Tools
- The Datasets Package
- Using Datasets from Stata
- Using Datasets from R
- R Datasets Function Reference
- Available Datasets
- American National Election Survey 1996
- Breast Cancer Data
- Bill Greene’s credit scoring data.
- Mauna Loa Weekly Atmospheric CO2 Data
- First 100 days of the US House of Representatives 1995
- World Copper Market 1951-1975 Dataset
- US Capital Punishment dataset.
- El Nino - Sea Surface Temperatures
- Engel (1857) food expenditure data
- Affairs dataset
- Grunfeld (1950) Investment Data
- Transplant Survival Data
- Longley dataset
- United States Macroeconomic data
- Travel Mode Choice
- Nile River flows at Aswan 1871-1970
- RAND Health Insurance Experiment Data
- Taxation Powers Vote for the Scottish Parliament 1997
- Spector and Mazzeo (1980) - Program Effectiveness Data
- Stack loss data
- Star98 Educational Dataset
- Statewide Crime Data 2009
- U.S. Strike Duration Data
- Yearly sunspots data 1700-2008
- Usage
- Additional information
- Using Datasets from Stata
- Sandbox