Generalized Linear Models

Generalized linear models currently support estimation using the one-parameter exponential families.

See Module Reference for commands and arguments.

Examples

# Load modules and data
In [1]: import statsmodels.api as sm

In [2]: data = sm.datasets.scotland.load()

In [3]: data.exog = sm.add_constant(data.exog)

# Instantiate a gamma family model with the default link function.
In [4]: gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())

In [5]: gamma_results = gamma_model.fit()

In [6]: print(gamma_results.summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                    YES   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Link Function:          inverse_power   Scale:                       0.0035843
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Wed, 02 Nov 2022   Deviance:                     0.087389
Time:                        17:12:43   Pearson chi2:                   0.0860
No. Iterations:                     6   Pseudo R-squ. (CS):             0.9800
Covariance Type:            nonrobust                                         
======================================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                 -0.0178      0.011     -1.548      0.122      -0.040       0.005
COUTAX              4.962e-05   1.62e-05      3.060      0.002    1.78e-05    8.14e-05
UNEMPF                 0.0020      0.001      3.824      0.000       0.001       0.003
MOR                -7.181e-05   2.71e-05     -2.648      0.008      -0.000   -1.87e-05
ACT                    0.0001   4.06e-05      2.757      0.006    3.23e-05       0.000
GDP                -1.468e-07   1.24e-07     -1.187      0.235   -3.89e-07    9.56e-08
AGE                   -0.0005      0.000     -2.159      0.031      -0.001   -4.78e-05
COUTAX_FEMALEUNEMP -2.427e-06   7.46e-07     -3.253      0.001   -3.89e-06   -9.65e-07
======================================================================================

Detailed examples can be found here:

Technical Documentation

The statistical model for each observation i is assumed to be

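$$Y_i \sim F_{EDM}(\cdot \mid \theta, \phi, w_i) \quad \text{and} \quad \mu_i = E[Y_i \mid x_i] = g^{-1}(x_i^\prime \beta),$$

where $g$ is the link function and $F_{EDM}(\cdot \mid \theta, \phi, w)$ is a distribution of the family of exponential dispersion models (EDM) with natural parameter $\theta$, scale parameter $\phi$ and weight $w$. Its density is given by

$$f_{EDM}(y \mid \theta, \phi, w) = c(y, \phi, w) \exp\left(\frac{y\theta - b(\theta)}{\phi} w\right).$$

It follows that $\mu = b'(\theta)$ and $Var[Y \mid x] = \frac{\phi}{w} b''(\theta)$. The inverse of the first equation gives the natural parameter as a function of the expected value, $\theta(\mu)$, such that

$$Var[Y_i \mid x_i] = \frac{\phi}{w_i} v(\mu_i)$$

with $v(\mu) = b''(\theta(\mu))$. Therefore it is said that a GLM is determined by link function $g$ and variance function $v(\mu)$ alone (and $x$ of course).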
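The two identities above, that the mean is the first derivative of b and the variance function is its second derivative, can be checked numerically. The sketch below does this for the Poisson family, where b(θ) = exp(θ) and θ(μ) = log(μ), using central finite differences. The helper names are hypothetical and the step sizes are arbitrary choices for illustration.

```python
import math


def b(theta):
    """Cumulant function of the Poisson family: b(theta) = exp(theta)."""
    return math.exp(theta)


def theta_of_mu(mu):
    """Natural parameter as a function of the mean: theta(mu) = log(mu)."""
    return math.log(mu)


def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2


mu = 3.0
theta = theta_of_mu(mu)

mean_from_b = first_derivative(b, theta)     # approximates b'(theta), i.e. mu
var_function = second_derivative(b, theta)   # approximates b''(theta(mu)), i.e. v(mu) = mu

print(mean_from_b, var_function)
```

Both values should be close to 3.0, matching v(μ) = μ for the Poisson family.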

Note that while $\phi$ is the same for every observation $y_i$ and therefore does not influence the estimation of $\beta$, the weights $w_i$ might be different for every $y_i$ such that the estimation of $\beta$ depends on them.

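| Distribution | Domain | $\mu = E[Y \mid x]$ | $v(\mu)$ | $\theta(\mu)$ | $b(\theta)$ | $\phi$ |
|---|---|---|---|---|---|---|
| Binomial $B(n,p)$ | $0, 1, \ldots, n$ | $np$ | $\mu - \frac{\mu^2}{n}$ | $\log\frac{p}{1-p}$ | $n \log(1 + e^\theta)$ | $1$ |
| Poisson $P(\mu)$ | $0, 1, \ldots, \infty$ | $\mu$ | $\mu$ | $\log(\mu)$ | $e^\theta$ | $1$ |
| Neg. Binom. $NB(\mu,\alpha)$ | $0, 1, \ldots, \infty$ | $\mu$ | $\mu + \alpha\mu^2$ | $\log\left(\frac{\alpha\mu}{1+\alpha\mu}\right)$ | $-\frac{1}{\alpha}\log(1 - \alpha e^\theta)$ | $1$ |
| Gaussian/Normal $N(\mu,\sigma^2)$ | $(-\infty, \infty)$ | $\mu$ | $1$ | $\mu$ | $\frac{1}{2}\theta^2$ | $\sigma^2$ |
| Gamma $N(\mu,\nu)$ | $(0, \infty)$ | $\mu$ | $\mu^2$ | $-\frac{1}{\mu}$ | $-\log(-\theta)$ | $\frac{1}{\nu}$ |
| Inv. Gauss. $IG(\mu,\sigma^2)$ | $(0, \infty)$ | $\mu$ | $\mu^3$ | $-\frac{1}{2\mu^2}$ | $-\sqrt{-2\theta}$ | $\sigma^2$ |
| Tweedie $p \geq 1$ | depends on $p$ | $\mu$ | $\mu^p$ | $\frac{\mu^{1-p}}{1-p}$ | $\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}$ | $\phi$ |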

The Tweedie distribution has special cases for $p = 0, 1, 2$ not listed in the table and uses $\alpha = \frac{p-2}{p-1}$.
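As a sanity check of the Tweedie entries, the sketch below evaluates α = (p−2)/(p−1), θ(μ) = μ^(1−p)/(1−p) and b(θ) = ((α−1)/α)·(θ/(α−1))^α for one illustrative power, p = 1.5, and confirms by finite differences that b'(θ(μ)) ≈ μ and b''(θ(μ)) ≈ μ^p. The function names and the chosen values of p and μ are assumptions made for the example only.

```python
def tweedie_alpha(p):
    """alpha = (p - 2) / (p - 1), defined for p != 1."""
    return (p - 2.0) / (p - 1.0)


def theta_of_mu(mu, p):
    """Natural parameter theta(mu) = mu**(1 - p) / (1 - p)."""
    return mu ** (1.0 - p) / (1.0 - p)


def b(theta, p):
    """Cumulant function b(theta) for the Tweedie family with power p."""
    alpha = tweedie_alpha(p)
    return (alpha - 1.0) / alpha * (theta / (alpha - 1.0)) ** alpha


def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2


p = 1.5    # illustrative variance power, between Poisson (1) and Gamma (2)
mu = 4.0   # illustrative mean
theta = theta_of_mu(mu, p)

mean_from_b = first_derivative(lambda t: b(t, p), theta)   # should be close to mu
var_from_b = second_derivative(lambda t: b(t, p), theta)   # should be close to mu**p

print(theta, mean_from_b, var_from_b, mu ** p)
```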

Correspondence of mathematical variables to code:

  • Y and y are coded as endog, the variable one wants to model

  • x is coded as exog, the covariates (also called explanatory variables)

  • β is coded as params, the parameters one wants to estimate

  • μ is coded as mu, the expectation (conditional on x) of Y

  • g is coded as the link argument to the class Family

  • ϕ is coded as scale, the dispersion parameter of the EDM

  • w is not yet supported (i.e. w=1); in the future it might be var_weights

  • p is coded as var_power for the power of the variance function v(μ) of the Tweedie distribution, see table

  • α is either

    • Negative Binomial: the ancillary parameter alpha, see table

    • Tweedie: an abbreviation for (p−2)/(p−1) of the power p of the variance function, see table

References

  • Gill, Jeff. 2000. Generalized Linear Models: A Unified Approach. SAGE QASS Series.

  • Green, PJ. 1984. “Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives.” Journal of the Royal Statistical Society, Series B, 46, 149-192.

  • Hardin, J.W. and Hilbe, J.M. 2007. “Generalized Linear Models and Extensions.” 2nd ed. Stata Press, College Station, TX.

  • McCullagh, P. and Nelder, J.A. 1989. “Generalized Linear Models.” 2nd ed. Chapman & Hall, Boca Raton.

Module Reference

Model Class

GLM(endog, exog[, family, offset, exposure, ...])

Generalized Linear Models

Results Class

GLMResults(model, params, ...[, cov_type, ...])

Class to contain GLM results.

PredictionResults(predicted_mean, var_pred_mean)


Families

The distribution families currently implemented are

Family(link, variance)

The parent class for one-parameter exponential families.

Binomial([link])

Binomial exponential family distribution.

Gamma([link])

Gamma exponential family distribution.

Gaussian([link])

Gaussian exponential family distribution.

InverseGaussian([link])

InverseGaussian exponential family.

NegativeBinomial([link, alpha])

Negative Binomial exponential family (corresponds to NB2).

Poisson([link])

Poisson exponential family.

Tweedie([link, var_power, eql])

Tweedie family.

Variance Functions

Each of the families has an associated variance function. You can access the variance functions here:

>>> sm.families.<familyname>.variance

VarianceFunction()

Relates the variance of a random variable to its mean.

constant

The call method of constant returns a constant variance, i.e., a vector of ones.

Power([power])

Power variance function

mu

Returns np.fabs(mu)

mu_squared

Returns np.fabs(mu)**2

mu_cubed

Returns np.fabs(mu)**3

Binomial([n])

Binomial variance function

binary

The binomial variance function for n = 1

NegativeBinomial([alpha])

Negative binomial variance function

nbinom

Negative Binomial variance function.