statsmodels.genmod.generalized_linear_model.GLM

class statsmodels.genmod.generalized_linear_model.GLM(endog, exog, family=None, offset=None, exposure=None, freq_weights=None, missing='none', **kwargs)

Generalized Linear Models class

GLM inherits from statsmodels.base.model.LikelihoodModel

Parameters:

endog : array-like

Endogenous response variable. This array can be 1d or 2d. Binomial family models accept a 2d array with two columns; if supplied, each observation is expected to be [success, failure].

exog : array-like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user (models specified using a formula include an intercept by default). See statsmodels.tools.add_constant.

family : family class instance

The default is Gaussian. To specify the binomial distribution, use family = sm.families.Binomial(). Each family can take a link instance as an argument. See statsmodels.genmod.families.family for more information.

offset : array-like or None

An offset to be included in the model. If provided, must be an array whose length is the number of rows in exog.

exposure : array-like or None

Log(exposure) will be added to the linear prediction in the model. Exposure is only valid if the log link is used. If provided, it must be an array with the same length as endog.

freq_weights : array-like

1d array of frequency weights. The default is None. If None (or no value) is given, an array of ones with the same length as endog is used instead. WARNING: Using weights is not verified yet for all possible options and results, see Notes.

missing : str

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

Notes

Only the following combinations make sense for family and link:

             + ident log logit probit cloglog pow opow nbinom loglog logc
Gaussian     |   x    x                        x
inv Gaussian |   x    x                        x
binomial     |   x    x    x     x       x     x    x           x      x
Poisson      |   x    x                        x
neg binomial |   x    x                        x          x
gamma        |   x    x                        x

Not all of these link functions are currently available.

Endog and exog are references so that if the data they refer to are already arrays and these arrays are changed, endog and exog will change.

Using frequency weights: Frequency weights produce the same results as repeating observations by the frequencies (if those are integers). This is verified for all basic results with nonrobust or heteroscedasticity robust cov_type. Other robust covariance types have not yet been verified, and at least the small sample correction is currently not based on the correct total frequency count. It has not yet been decided whether all the different types of residuals will be based on weighted residuals. Currently, residuals are not weighted.

Attributes

df_model : float: Model degrees of freedom is equal to p - 1, where p is the number of regressors. Note that the intercept is not reported as a degree of freedom.
df_resid : float: Residual degrees of freedom is equal to the number of observations n minus the number of regressors p.
endog : array: See above. Note that endog is a reference to the data so that if data is already an array and it is changed, then endog changes as well.
exposure : array-like: Include ln(exposure) in model with coefficient constrained to 1. Can only be used if the link is the logarithm function.
exog : array: See above. Note that exog is a reference to the data so that if data is already an array and it is changed, then exog changes as well.
freq_weights : array: See above. Note that freq_weights is a reference to the data so that if data is already an array and it is changed, then freq_weights changes as well.
iteration : int: The number of iterations that fit has run. Initialized at 0.
family : family class instance: The distribution family of the model. Can be any family in statsmodels.families. Default is Gaussian.
mu : array: The mean response of the transformed variable. mu is the value of the inverse of the link function at lin_pred, where lin_pred is the linear predicted value of the WLS fit of the transformed variable. mu is only available after fit is called. See statsmodels.families.family.fitted of the distribution family for more information.
n_trials : array: See above. Note that n_trials is a reference to the data so that if data is already an array and it is changed, then n_trials changes as well. n_trials is the number of binomial trials and only available with that distribution. See statsmodels.families.Binomial for more information.
normalized_cov_params : array: The p x p normalized covariance of the design / exogenous data. This is approximately equal to (X.T X)^(-1)
offset : array-like: Include offset in model with coefficient constrained to 1.
pinv_wexog : array: The pseudoinverse of the design / exogenous data array. Note that GLM has no whiten method, so this is just the pseudo inverse of the design. The pseudoinverse is approximately equal to (X.T X)^(-1)X.T
scale : float: The estimate of the scale / dispersion of the model fit. Only available after fit is called. See GLM.fit and GLM.estimate_scale for more information.
scaletype : str: The scaling used for fitting the model. This is only available after fit is called. The default is None. See GLM.fit for more information.
weights : array: The value of the weights after the last iteration of fit. Only available after fit is called. See statsmodels.families.family for the specific distribution weighting functions.

Examples

>>> import statsmodels.api as sm
>>> data = sm.datasets.scotland.load()
>>> data.exog = sm.add_constant(data.exog)

Instantiate a gamma family model with the default link function.

>>> gamma_model = sm.GLM(data.endog, data.exog,
...                      family=sm.families.Gamma())

>>> gamma_results = gamma_model.fit()
>>> gamma_results.params
array([-0.01776527,  0.00004962,  0.00203442, -0.00007181,  0.00011185,
       -0.00000015, -0.00051868, -0.00000243])
>>> gamma_results.scale
0.0035842831734919055
>>> gamma_results.deviance
0.087388516416999198
>>> gamma_results.pearson_chi2
0.086022796163805704
>>> gamma_results.llf
-83.017202161073527

Attributes

df_model	(float) p - 1, where p is the number of regressors including the intercept.
df_resid	(float) The number of observations n minus the number of regressors p.
endog	(array) See Parameters.
exog	(array) See Parameters.
family	(family class instance) A pointer to the distribution family of the model.
freq_weights	(array) See Parameters.
mu	(array) The estimated mean response of the transformed variable.
n_trials	(array) See Parameters.
normalized_cov_params	(array) p x p normalized covariance of the design / exogenous data.
pinv_wexog	(array) For GLM this is just the pseudo inverse of the original design.
scale	(float) The estimate of the scale / dispersion. Available after fit is called.
scaletype	(str) The scaling used for fitting the model. Available after fit is called.
weights	(array) The value of the weights after the last iteration of fit.

Methods

`estimate_scale`(mu)	Estimates the dispersion/scale.
`estimate_tweedie_power`(mu[, method, low, high])	Tweedie specific function to estimate scale and the variance parameter.
`fit`([start_params, maxiter, method, tol, ...])	Fits a generalized linear model for a given family.
`fit_constrained`(constraints[, start_params])	Fit the model subject to linear equality constraints.
`fit_regularized`([method, alpha, ...])	Return a regularized fit to a generalized linear model.
`from_formula`(formula, data[, subset, drop_cols])	Create a Model from a formula and dataframe.
`get_distribution`(params[, scale, exog, ...])	Returns a random number generator for the predictive distribution.
`hessian`(params[, scale, observed])	Hessian, second derivative of loglikelihood function
`hessian_factor`(params[, scale, observed])	Weights for calculating Hessian
`information`(params[, scale])	Fisher information matrix.
`initialize`()	Initialize a generalized linear model.
`loglike`(params[, scale])	Evaluate the log-likelihood for a generalized linear model.
`loglike_mu`(mu[, scale])	Evaluate the log-likelihood for a generalized linear model.
`predict`(params[, exog, exposure, offset, linear])	Return predicted values for a design matrix
`score`(params[, scale])	Score, first derivative of the loglikelihood function.
`score_factor`(params[, scale])	Weights for the score for each observation.
`score_obs`(params[, scale])	Score, first derivative of the loglikelihood for each observation.
`score_test`(params_constrained[, ...])	Score test for restrictions or for omitted variables.

Attributes

`endog_names`	Names of endogenous variables
`exog_names`	Names of exogenous variables
