Generalized Linear Models
The generalized linear models module currently supports estimation using one-parameter exponential families.
See Module Reference for commands and arguments.
Examples
>>> # Load modules and data
>>> import statsmodels.api as sm
>>> data = sm.datasets.scotland.load()
>>> data.exog = sm.add_constant(data.exog)
>>> # Instantiate a gamma family model with the default link function.
>>> gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
>>> gamma_results = gamma_model.fit()
>>> print(gamma_results.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                    YES   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Link Function:           InversePower   Scale:                       0.0035843
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Thu, 27 Mar 2025   Deviance:                     0.087389
Time:                        12:05:02   Pearson chi2:                   0.0860
No. Iterations:                     6   Pseudo R-squ. (CS):             0.9800
Covariance Type:            nonrobust
======================================================================================
                           coef    std err          z      P>|z|     [0.025     0.975]
--------------------------------------------------------------------------------------
const                   -0.0178      0.011     -1.548      0.122     -0.040      0.005
COUTAX                4.962e-05   1.62e-05      3.060      0.002   1.78e-05   8.14e-05
UNEMPF                   0.0020      0.001      3.824      0.000      0.001      0.003
MOR                  -7.181e-05   2.71e-05     -2.648      0.008     -0.000  -1.87e-05
ACT                      0.0001   4.06e-05      2.757      0.006   3.23e-05      0.000
GDP                  -1.468e-07   1.24e-07     -1.187      0.235  -3.89e-07   9.56e-08
AGE                     -0.0005      0.000     -2.159      0.031     -0.001  -4.78e-05
COUTAX_FEMALEUNEMP   -2.427e-06   7.46e-07     -3.253      0.001  -3.89e-06  -9.65e-07
======================================================================================
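The summary figures can be cross-checked by hand. Assuming the scale is estimated the way statsmodels does by default for the Gamma family (Pearson chi-squared divided by the residual degrees of freedom), the printed values agree up to rounding; the eight coefficient rows are the constant plus the seven regressors counted in Df Model:

$$
\text{Df Residuals} = 32 - (7 + 1) = 24,
\qquad
\hat{\phi} = \frac{\text{Pearson } \chi^2}{\text{Df Residuals}} = \frac{0.0860}{24} \approx 0.00358 .
$$

The small gap to the printed Scale of 0.0035843 comes from Pearson chi2 being rounded to four decimals in the summary (0.0035843 × 24 ≈ 0.08602).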
Detailed examples can be found here:
Technical Documentation
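The statistical model for each observation $i$ is assumed to be

$$Y_i \sim F_{EDM}(\cdot \mid \theta, \phi, w_i) \quad \text{and} \quad \mu_i = E[Y_i \mid x_i] = g^{-1}(x_i^\prime \beta),$$

where $g$ is the link function and $F_{EDM}(\cdot \mid \theta, \phi, w)$ is a distribution of the family of exponential dispersion models (EDM) with natural parameter $\theta$, scale parameter $\phi$ and weight $w$. Its density is given by

$$f_{EDM}(y \mid \theta, \phi, w) = c(y, \phi, w) \exp\left(\frac{y\theta - b(\theta)}{\phi} w\right).$$

It follows that $\mu = b'(\theta)$ and $Var[Y \mid x] = \frac{\phi}{w} b''(\theta)$. Inverting the first equation gives the natural parameter as a function of the expected value, $\theta(\mu)$, such that

$$Var[Y_i \mid x_i] = \frac{\phi}{w_i} v(\mu_i)$$

with $v(\mu) = b''(\theta(\mu))$. Therefore a GLM is determined by the link function $g$ and the variance function $v(\mu)$ alone (together with $x$).

Note that while $\phi$ is the same for every observation $y_i$ and therefore does not influence the estimation of $\beta$, the weights $w_i$ may differ between observations, so the estimation of $\beta$ depends on them.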
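As a worked instance of the exponential dispersion form $f(y \mid \theta, \phi, w) = c(y, \phi, w)\exp\left(\frac{y\theta - b(\theta)}{\phi} w\right)$, the Poisson distribution with mean $\mu$ fits it with $\theta = \log\mu$, $b(\theta) = e^{\theta}$, $\phi = 1$ and $w = 1$; differentiating $b$ recovers the mean and the variance function:

$$
f(y \mid \mu) = \frac{\mu^{y} e^{-\mu}}{y!}
= \frac{1}{y!}\exp\bigl(y\log\mu - \mu\bigr)
= c(y)\exp\bigl(y\theta - e^{\theta}\bigr),
\qquad
b'(\theta) = e^{\theta} = \mu,
\quad
b''(\theta) = e^{\theta} = \mu = v(\mu).
$$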
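| Distribution | Domain | $\mu = E[Y \mid x]$ | $v(\mu)$ | $\theta(\mu)$ | $b(\theta)$ | $\phi$ |
|---|---|---|---|---|---|---|
| Binomial $B(n,p)$ | $0, 1, \ldots, n$ | $np$ | $\mu - \frac{\mu^2}{n}$ | $\log\frac{p}{1-p}$ | $n\log(1+e^\theta)$ | 1 |
| Poisson $P(\mu)$ | $0, 1, \ldots, \infty$ | $\mu$ | $\mu$ | $\log(\mu)$ | $e^\theta$ | 1 |
| Neg. Binom. $NB(\mu,\alpha)$ | $0, 1, \ldots, \infty$ | $\mu$ | $\mu + \alpha\mu^2$ | $\log\left(\frac{\alpha\mu}{1+\alpha\mu}\right)$ | $-\frac{1}{\alpha}\log(1-e^\theta)$ | 1 |
| Gaussian/Normal $N(\mu,\sigma^2)$ | $(-\infty, \infty)$ | $\mu$ | $1$ | $\mu$ | $\frac{1}{2}\theta^2$ | $\sigma^2$ |
| Gamma $\Gamma(\mu,\nu)$ | $(0, \infty)$ | $\mu$ | $\mu^2$ | $-\frac{1}{\mu}$ | $-\log(-\theta)$ | $\frac{1}{\nu}$ |
| Inv. Gauss. $IG(\mu,\sigma^2)$ | $(0, \infty)$ | $\mu$ | $\mu^3$ | $-\frac{1}{2\mu^2}$ | $-\sqrt{-2\theta}$ | $\sigma^2$ |
| Tweedie $p \geq 1$ | depends on $p$ | $\mu$ | $\mu^p$ | $\frac{\mu^{1-p}}{1-p}$ | $\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}$ | $\phi$ |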
The Tweedie distribution has special cases for $p = 0, 1, 2$ that are not listed in the table, and it uses $\alpha = \frac{p-2}{p-1}$.
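For every row of the distribution table, the EDM relations require $b'(\theta(\mu)) = \mu$ and $b''(\theta(\mu)) = v(\mu)$. The plain-Python sketch below (not part of statsmodels) writes out $\theta(\mu)$, $b(\theta)$ and $v(\mu)$ for each family in their standard forms and checks both identities with central finite differences. The parameter values n = 5, alpha = 0.5 and p = 1.5 and the helper name check are arbitrary illustrative assumptions.

```python
import math

# Illustrative parameter values (assumptions for this check only).
n, alpha, p = 5, 0.5, 1.5
a = (p - 2) / (p - 1)  # the Tweedie "alpha" abbreviation

# name -> (theta(mu), b(theta), v(mu)), one entry per table row.
FAMILIES = {
    "binomial": (lambda mu: math.log((mu / n) / (1 - mu / n)),
                 lambda t: n * math.log(1 + math.exp(t)),
                 lambda mu: mu - mu**2 / n),
    "poisson": (math.log, math.exp, lambda mu: mu),
    "neg_binom": (lambda mu: math.log(alpha * mu / (1 + alpha * mu)),
                  lambda t: -math.log(1 - math.exp(t)) / alpha,
                  lambda mu: mu + alpha * mu**2),
    "gaussian": (lambda mu: mu, lambda t: 0.5 * t**2, lambda mu: 1.0),
    "gamma": (lambda mu: -1 / mu, lambda t: -math.log(-t), lambda mu: mu**2),
    "inv_gauss": (lambda mu: -1 / (2 * mu**2),
                  lambda t: -math.sqrt(-2 * t),
                  lambda mu: mu**3),
    "tweedie": (lambda mu: mu ** (1 - p) / (1 - p),
                lambda t: (a - 1) / a * (t / (a - 1)) ** a,
                lambda mu: mu**p),
}


def check(name, mus=(0.7, 2.0, 3.5), h=1e-4):
    """Return True if b'(theta(mu)) == mu and b''(theta(mu)) == v(mu) numerically."""
    theta, b, v = FAMILIES[name]
    for mu in mus:
        t = theta(mu)
        b1 = (b(t + h) - b(t - h)) / (2 * h)          # central difference for b'
        b2 = (b(t + h) - 2 * b(t) + b(t - h)) / h**2  # central difference for b''
        if not (math.isclose(b1, mu, rel_tol=1e-5)
                and math.isclose(b2, v(mu), rel_tol=1e-5)):
            return False
    return True


for name in FAMILIES:
    print(f"{name:10s} consistent: {check(name)}")
```

A mistyped entry, for example a wrong sign in $\theta(\mu)$, makes check return False for that family, which is a cheap way to catch transcription errors in formulas like these.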
Correspondence of mathematical variables to code:
- $Y$ and $y$ are coded as endog, the variable one wants to model
- $x$ is coded as exog, the covariates, also called explanatory variables
- $\beta$ is coded as params, the parameters one wants to estimate
- $\mu$ is coded as mu, the expectation (conditional on $x$) of $Y$
- $g$ is coded as the link argument to the class Family
- $\phi$ is coded as scale, the dispersion parameter of the EDM
- $w$ is coded as var_weights; when no weights are given, $w = 1$
- $p$ is coded as var_power, the power of the variance function $v(\mu)$ of the Tweedie distribution (see table)
- $\alpha$ is either
  - Negative Binomial: the ancillary parameter alpha (see table), or
  - Tweedie: an abbreviation for $\frac{p-2}{p-1}$, computed from the power $p$ of the variance function (see table)
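The names endog, exog, params and mu map directly onto the iteratively reweighted least squares (IRLS) loop that the summary reports as its Method. The following is a from-scratch NumPy sketch for the Poisson family with its canonical log link, assuming $w = 1$; it is not statsmodels' implementation, which also handles general families, offsets and weights. The function name irls_poisson and the simulated data are hypothetical. Each step solves a weighted least-squares problem with working response $z = \eta + (y - \mu)\,g'(\mu)$ and weights $1 / (g'(\mu)^2 v(\mu))$.

```python
import numpy as np


def irls_poisson(endog, exog, max_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS (illustrative sketch)."""
    endog = np.asarray(endog, dtype=float)
    exog = np.asarray(exog, dtype=float)
    # Start halfway between y and its mean so that log(mu) is finite.
    mu = (endog + endog.mean()) / 2.0
    eta = np.log(mu)
    params = np.zeros(exog.shape[1])
    for _ in range(max_iter):
        # Log link: g'(mu) = 1/mu; Poisson: v(mu) = mu.
        # The IRLS weight 1 / (g'(mu)**2 * v(mu)) therefore simplifies to mu.
        weights = mu
        # Working response z = eta + (y - mu) * g'(mu).
        z = eta + (endog - mu) / mu
        wx = exog * weights[:, None]
        # Weighted least squares: solve (X' W X) beta = X' W z.
        new_params = np.linalg.solve(exog.T @ wx, wx.T @ z)
        eta = exog @ new_params
        mu = np.exp(eta)  # inverse link g^{-1}(eta)
        converged = np.max(np.abs(new_params - params)) < tol
        params = new_params
        if converged:
            break
    return params


# Simulated data (illustrative): intercept plus one covariate.
rng = np.random.default_rng(0)
exog = np.column_stack([np.ones(200), rng.uniform(size=200)])
endog = rng.poisson(np.exp(0.5 + 1.0 * exog[:, 1]))
params = irls_poisson(endog, exog)
print(params)
```

With statsmodels installed, sm.GLM(endog, exog, family=sm.families.Poisson()).fit().params should agree closely, since both maximize the same Poisson likelihood.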
Module Reference
Model Class
|
Generalized Linear Models |
Results Class
|
Class to contain GLM results. |
|
Prediction results for GLM. |
Families
The distribution families currently implemented are:
|
The parent class for one-parameter exponential families. |
|
Binomial exponential family distribution. |
|
Gamma exponential family distribution. |
|
Gaussian exponential family distribution. |
|
InverseGaussian exponential family. |
|
Negative Binomial exponential family (corresponds to NB2). |
|
Poisson exponential family. |
|
Tweedie family. |
Link Functions
Note: The lowercase link classes have been deprecated and will be removed in a future release. Link classes now follow the Python class naming convention (CamelCase).
The link functions currently implemented are listed below. Not all link functions are available for every distribution family. The list of link functions available for a given family can be obtained with:
>>> sm.families.family.<familyname>.links
|
A generic link function for one-parameter exponential family. |
|
Uses the CDF of a scipy.stats distribution |
|
The complementary log-log transform |
|
The log-log transform |
|
The log-complement transform |
|
The log transform |
|
The logit transform |
|
The negative binomial link function |
|
The power transform |
|
The Cauchy (standard Cauchy CDF) transform |
|
The identity transform |
|
The inverse transform |
|
The inverse squared transform |
|
|
The probit (standard normal CDF) transform |
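To make the link-function interface concrete, here is a hypothetical stand-alone version of two of the transforms listed (logit and complementary log-log). The class names are made up and these are not the statsmodels classes; the method names __call__, inverse and deriv were chosen to mirror the roles of the link $g$, its inverse $g^{-1}$, and its derivative $g'$, which IRLS needs for the working response.

```python
import numpy as np


class LogitSketch:
    """Hypothetical logit link: g(p) = log(p / (1 - p))."""

    def __call__(self, p):
        p = np.asarray(p, dtype=float)
        return np.log(p / (1.0 - p))

    def inverse(self, z):
        # g^{-1}(z) = 1 / (1 + exp(-z)), the logistic CDF
        z = np.asarray(z, dtype=float)
        return 1.0 / (1.0 + np.exp(-z))

    def deriv(self, p):
        # g'(p) = 1 / (p * (1 - p))
        p = np.asarray(p, dtype=float)
        return 1.0 / (p * (1.0 - p))


class CLogLogSketch:
    """Hypothetical complementary log-log link: g(p) = log(-log(1 - p))."""

    def __call__(self, p):
        p = np.asarray(p, dtype=float)
        return np.log(-np.log1p(-p))

    def inverse(self, z):
        # g^{-1}(z) = 1 - exp(-exp(z)), written with expm1 for accuracy
        z = np.asarray(z, dtype=float)
        return -np.expm1(-np.exp(z))

    def deriv(self, p):
        # g'(p) = 1 / ((1 - p) * (-log(1 - p))) = 1 / ((p - 1) * log(1 - p))
        p = np.asarray(p, dtype=float)
        return 1.0 / ((p - 1.0) * np.log1p(-p))


p = np.array([0.1, 0.5, 0.9])
for link in (LogitSketch(), CLogLogSketch()):
    print(type(link).__name__, link(p), link.inverse(link(p)))
```

Using log1p and expm1 keeps the complementary log-log transform accurate for probabilities close to 0, where 1 - p and exp(-exp(z)) are close to 1.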
Variance Functions
Each family has an associated variance function, which can be accessed as:
>>> sm.families.<familyname>.variance
Relates the variance of a random variable to its mean. |
|
The call method of constant returns a constant variance, i.e., a vector of ones. |
|
|
Power variance function |
Returns np.fabs(mu) |
|
Returns np.fabs(mu)**2 |
|
Returns np.fabs(mu)**3 |
|
|
Binomial variance function |
The binomial variance function for n = 1 |
|
|
Negative binomial variance function |
Negative Binomial variance function. |
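The variance functions above are simple formulas in $\mu$. As a hedged illustration, the hypothetical NumPy functions below restate them: a constant variance of ones, the power variance $|\mu|^{power}$ (power 1, 2 and 3 give the np.fabs(mu), np.fabs(mu)**2 and np.fabs(mu)**3 cases), the binomial variance for $n$ trials (with $n = 1$ as the binary case), and the NB2 negative binomial variance. The function names and signatures are assumptions and do not reproduce the statsmodels variance classes.

```python
import numpy as np


def constant_variance(mu):
    # Gaussian family: the variance does not depend on the mean.
    return np.ones_like(np.asarray(mu, dtype=float))


def power_variance(mu, power=1.0):
    # |mu| ** power; power = 1, 2, 3 match the mu, mu**2, mu**3 entries.
    return np.fabs(np.asarray(mu, dtype=float)) ** power


def binomial_variance(mu, n=1):
    # mu is the mean count; with p = mu / n the variance is n * p * (1 - p),
    # i.e. mu - mu**2 / n. n = 1 is the binary case mu * (1 - mu).
    p = np.asarray(mu, dtype=float) / n
    return n * p * (1.0 - p)


def nb2_variance(mu, alpha=1.0):
    # Negative binomial (NB2): mu + alpha * mu**2.
    mu = np.asarray(mu, dtype=float)
    return mu + alpha * mu**2


mu = np.array([0.2, 0.5, 2.0])
print(power_variance(mu, 2), nb2_variance(mu, alpha=0.5))
```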