Generalized Linear Models

Generalized linear models currently supports estimation using the one-parameter exponential families.

See Module Reference for commands and arguments.

Examples

# Load modules and data
In [1]: import statsmodels.api as sm

In [2]: data = sm.datasets.scotland.load(as_pandas=False)

In [3]: data.exog = sm.add_constant(data.exog)

# Instantiate a gamma family model with the default link function.
In [4]: gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())

In [5]: gamma_results = gamma_model.fit()

In [6]: print(gamma_results.summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                            GLM   Df Residuals:                       24
Model Family:                   Gamma   Df Model:                            7
Link Function:          inverse_power   Scale:                       0.0035843
Method:                          IRLS   Log-Likelihood:                -83.017
Date:                Tue, 02 Feb 2021   Deviance:                     0.087389
Time:                        07:07:06   Pearson chi2:                   0.0860
No. Iterations:                     6                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0178      0.011     -1.548      0.122      -0.040       0.005
x1          4.962e-05   1.62e-05      3.060      0.002    1.78e-05    8.14e-05
x2             0.0020      0.001      3.824      0.000       0.001       0.003
x3         -7.181e-05   2.71e-05     -2.648      0.008      -0.000   -1.87e-05
x4             0.0001   4.06e-05      2.757      0.006    3.23e-05       0.000
x5         -1.468e-07   1.24e-07     -1.187      0.235   -3.89e-07    9.56e-08
x6            -0.0005      0.000     -2.159      0.031      -0.001   -4.78e-05
x7         -2.427e-06   7.46e-07     -3.253      0.001   -3.89e-06   -9.65e-07
==============================================================================

Detailed examples can be found here:

Technical Documentation

The statistical model for each observation i is assumed to be

YiFEDM(|θ,ϕ,wi) and μi=E[Yi|xi]=g1(xiβ).

where g is the link function and FEDM(|θ,ϕ,w) is a distribution of the family of exponential dispersion models (EDM) with natural parameter θ, scale parameter ϕ and weight w. Its density is given by

fEDM(y|θ,ϕ,w)=c(y,ϕ,w)exp(yθb(θ)ϕw).

It follows that μ=b(θ) and Var[Y|x]=ϕwb. The inverse of the first equation gives the natural parameter as a function of the expected value \theta(\mu) such that

Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i)

with v(\mu) = b''(\theta(\mu)). Therefore it is said that a GLM is determined by link function g and variance function v(\mu) alone (and x of course).

Note that while \phi is the same for every observation y_i and therefore does not influence the estimation of \beta, the weights w_i might be different for every y_i such that the estimation of \beta depends on them.

Distribution

Domain

\mu=E[Y|x]

v(\mu)

\theta(\mu)

b(\theta)

\phi

Binomial B(n,p)

0,1,\ldots,n

np

\mu-\frac{\mu^2}{n}

\log\frac{p}{1-p}

n\log(1+e^\theta)

1

Poisson P(\mu)

0,1,\ldots,\infty

\mu

\mu

\log(\mu)

e^\theta

1

Neg. Binom. NB(\mu,\alpha)

0,1,\ldots,\infty

\mu

\mu+\alpha\mu^2

\log(\frac{\alpha\mu}{1+\alpha\mu})

-\frac{1}{\alpha}\log(1-\alpha e^\theta)

1

Gaussian/Normal N(\mu,\sigma^2)

(-\infty,\infty)

\mu

1

\mu

\frac{1}{2}\theta^2

\sigma^2

Gamma N(\mu,\nu)

(0,\infty)

\mu

\mu^2

-\frac{1}{\mu}

-\log(-\theta)

\frac{1}{\nu}

Inv. Gauss. IG(\mu,\sigma^2)

(0,\infty)

\mu

\mu^3

-\frac{1}{2\mu^2}

-\sqrt{-2\theta}

\sigma^2

Tweedie p\geq 1

depends on p

\mu

\mu^p

\frac{\mu^{1-p}}{1-p}

\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}

\phi

The Tweedie distribution has special cases for p=0,1,2 not listed in the table and uses \alpha=\frac{p-2}{p-1}.

Correspondence of mathematical variables to code:

  • Y and y are coded as endog, the variable one wants to model

  • x is coded as exog, the covariates alias explanatory variables

  • \beta is coded as params, the parameters one wants to estimate

  • \mu is coded as mu, the expectation (conditional on x) of Y

  • g is coded as link argument to the class Family

  • \phi is coded as scale, the dispersion parameter of the EDM

  • w is not yet supported (i.e. w=1), in the future it might be var_weights

  • p is coded as var_power for the power of the variance function v(\mu) of the Tweedie distribution, see table

  • \alpha is either

    • Negative Binomial: the ancillary parameter alpha, see table

    • Tweedie: an abbreviation for \frac{p-2}{p-1} of the power p of the variance function, see table

References

  • Gill, Jeff. 2000. Generalized Linear Models: A Unified Approach. SAGE QASS Series.

  • Green, PJ. 1984. “Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives.” Journal of the Royal Statistical Society, Series B, 46, 149-192.

  • Hardin, J.W. and Hilbe, J.M. 2007. “Generalized Linear Models and Extensions.” 2nd ed. Stata Press, College Station, TX.

  • McCullagh, P. and Nelder, J.A. 1989. “Generalized Linear Models.” 2nd ed. Chapman & Hall, Boca Rotan.

Module Reference

Model Class

GLM(endog, exog[, family, offset, exposure, …])

Generalized Linear Models

Results Class

GLMResults(model, params, …[, cov_type, …])

Class to contain GLM results.

PredictionResults(predicted_mean, var_pred_mean)

Attributes

Families

The distribution families currently implemented are

Family(link, variance)

The parent class for one-parameter exponential families.

Binomial([link])

Binomial exponential family distribution.

Gamma([link])

Gamma exponential family distribution.

Gaussian([link])

Gaussian exponential family distribution.

InverseGaussian([link])

InverseGaussian exponential family.

NegativeBinomial([link, alpha])

Negative Binomial exponential family.

Poisson([link])

Poisson exponential family.

Tweedie([link, var_power, eql])

Tweedie family.

The link functions currently implemented are the following. Not all link functions are available for each distribution family. The list of available link functions can be obtained by

>>> sm.families.family.<familyname>.links

Link()

A generic link function for one-parameter exponential family.

CDFLink([dbn])

The use the CDF of a scipy.stats distribution

CLogLog()

The complementary log-log transform

Log()

The log transform

Logit()

The logit transform

NegativeBinomial([alpha])

The negative binomial link function

Power([power])

The power transform

cauchy()

The Cauchy (standard Cauchy CDF) transform

cloglog()

The CLogLog transform link function.

identity()

The identity transform

inverse_power()

The inverse transform

inverse_squared()

The inverse squared transform

log()

The log transform

logit()

Methods

nbinom([alpha])

The negative binomial link function.

probit([dbn])

The probit (standard normal CDF) transform

Variance Functions

Each of the families has an associated variance function. You can access the variance functions here:

>>> sm.families.<familyname>.variance

VarianceFunction()

Relates the variance of a random variable to its mean.

constant

The call method of constant returns a constant variance, i.e., a vector of ones.

Power([power])

Power variance function

mu

Returns np.fabs(mu)

mu_squared

Returns np.fabs(mu)**2

mu_cubed

Returns np.fabs(mu)**3

Binomial([n])

Binomial variance function

binary

The binomial variance function for n = 1

NegativeBinomial([alpha])

Negative binomial variance function

nbinom

Negative Binomial variance function.