Regression with Discrete Dependent Variable

Regression models for limited and qualitative dependent variables. The module currently allows the estimation of models with binary (Logit, Probit), nominal (MNLogit), or count (Poisson, NegativeBinomial) data.

Starting with version 0.9, the module also includes new count models that are still experimental in 0.9: NegativeBinomialP, GeneralizedPoisson, and the zero-inflated models ZeroInflatedPoisson, ZeroInflatedNegativeBinomialP and ZeroInflatedGeneralizedPoisson.

See Module Reference for commands and arguments.

Examples

# Load the data from Spector and Mazzeo (1980)
In [1]: import statsmodels.api as sm

In [2]: spector_data = sm.datasets.spector.load_pandas()

In [3]: spector_data.exog = sm.add_constant(spector_data.exog)

# Logit Model
In [4]: logit_mod = sm.Logit(spector_data.endog, spector_data.exog)

In [5]: logit_res = logit_mod.fit()
Optimization terminated successfully.
         Current function value: 0.402801
         Iterations 7

In [6]: print(logit_res.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                  GRADE   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Mon, 20 Jan 2025   Pseudo R-squ.:                  0.3740
Time:                        16:29:32   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
Covariance Type:            nonrobust   LLR p-value:                  0.001502
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -13.0213      4.931     -2.641      0.008     -22.687      -3.356
GPA            2.8261      1.263      2.238      0.025       0.351       5.301
TUCE           0.0952      0.142      0.672      0.501      -0.182       0.373
PSI            2.3787      1.065      2.234      0.025       0.292       4.465
==============================================================================

Detailed examples can be found here:

Technical Documentation

Currently all models are estimated by Maximum Likelihood and assume independently and identically distributed errors.

All discrete regression models define the same methods and follow the same structure, which is similar to that of the regression results but with some methods specific to discrete models. Some of them also provide additional model-specific methods and attributes.

References

General references for this class of models are:

A.C. Cameron and P.K. Trivedi. `Regression Analysis of Count Data`.
    Cambridge, 1998.

G.S. Maddala. `Limited-Dependent and Qualitative Variables in Econometrics`.
    Cambridge, 1983.

W. Greene. `Econometric Analysis`. Prentice Hall, 5th edition, 2003.

Module Reference

The specific model classes are:

Logit(endog, exog[, offset, check_rank])

Logit Model

Probit(endog, exog[, offset, check_rank])

Probit Model

MNLogit(endog, exog[, check_rank])

Multinomial Logit Model

Poisson(endog, exog[, offset, exposure, ...])

Poisson Model

NegativeBinomial(endog, exog[, ...])

Negative Binomial Model

NegativeBinomialP(endog, exog[, p, offset, ...])

Generalized Negative Binomial (NB-P) Model

GeneralizedPoisson(endog, exog[, p, offset, ...])

Generalized Poisson Model

ZeroInflatedPoisson(endog, exog[, ...])

Poisson Zero Inflated Model

ZeroInflatedNegativeBinomialP(endog, exog[, ...])

Zero Inflated Generalized Negative Binomial Model

ZeroInflatedGeneralizedPoisson(endog, exog)

Zero Inflated Generalized Poisson Model

HurdleCountModel(endog, exog[, offset, ...])

Hurdle model for count data

TruncatedLFNegativeBinomialP(endog, exog[, ...])

Truncated Generalized Negative Binomial model for count data

TruncatedLFPoisson(endog, exog[, offset, ...])

Truncated Poisson model for count data

ConditionalLogit(endog, exog[, missing])

Fit a conditional logistic regression model to grouped data.

ConditionalMNLogit(endog, exog[, missing])

Fit a conditional multinomial logit model to grouped data.

ConditionalPoisson(endog, exog[, missing])

Fit a conditional Poisson regression model to grouped data.

The cumulative link model for an ordinal dependent variable is currently in miscmodels as it subclasses GenericLikelihoodModel. This will change in future versions.

OrderedModel(endog, exog[, offset, distr])

Ordinal Model based on logistic or normal distribution

The specific result classes are:

LogitResults(model, mlefit[, cov_type, ...])

A results class for Logit Model

ProbitResults(model, mlefit[, cov_type, ...])

A results class for Probit Model

CountResults(model, mlefit[, cov_type, ...])

A results class for count data

MultinomialResults(model, mlefit)

A results class for multinomial data

NegativeBinomialResults(model, mlefit[, ...])

A results class for NegativeBinomial 1 and 2

GeneralizedPoissonResults(model, mlefit[, ...])

A results class for Generalized Poisson

ZeroInflatedPoissonResults(model, mlefit[, ...])

A results class for Zero Inflated Poisson

ZeroInflatedNegativeBinomialResults(model, ...)

A results class for Zero Inflated Generalized Negative Binomial

ZeroInflatedGeneralizedPoissonResults(model, ...)

A results class for Zero Inflated Generalized Poisson

HurdleCountResults(model, mlefit, ...[, ...])

A results class for Hurdle model

TruncatedLFPoissonResults(model, mlefit[, ...])

A results class for Truncated Poisson

TruncatedNegativeBinomialResults(model, mlefit)

A results class for Truncated Negative Binomial

ConditionalResults(model, params, ...)

Attributes:

OrderedResults(model, mlefit)

Results class for OrderedModel

DiscreteModel is a superclass of all discrete regression models. The estimation results are returned as an instance of one of the subclasses of DiscreteResults. Each category of models (binary, count and multinomial) has its own intermediate level of model and results classes. These intermediate classes mostly serve to facilitate the implementation of the methods and attributes defined by DiscreteModel and DiscreteResults.
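The class relationships described above can be checked directly. This sketch assumes the classes are importable from `statsmodels.discrete.discrete_model` and inspects only a few of them.

```python
from statsmodels.discrete.discrete_model import (
    BinaryModel, BinaryResults, CountModel, CountResults,
    DiscreteModel, DiscreteResults, Logit, LogitResults,
    MNLogit, MultinomialModel, MultinomialResults, Poisson,
)

# Concrete model -> intermediate class for its category
model_pairs = [
    (Logit, BinaryModel),
    (Poisson, CountModel),
    (MNLogit, MultinomialModel),
]
for concrete, intermediate in model_pairs:
    print(concrete.__name__,
          issubclass(concrete, intermediate),
          issubclass(intermediate, DiscreteModel))

# The results classes are layered the same way
for results_cls in (LogitResults, BinaryResults, CountResults,
                    MultinomialResults):
    print(results_cls.__name__, issubclass(results_cls, DiscreteResults))
```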

DiscreteModel(endog, exog[, check_rank])

Abstract class for discrete choice models.

DiscreteResults(model, mlefit[, cov_type, ...])

A results class for the discrete dependent variable models.

BinaryModel(endog, exog[, offset, check_rank])

Attributes:

BinaryResults(model, mlefit[, cov_type, ...])

A results class for binary data

CountModel(endog, exog[, offset, exposure, ...])

Attributes:

MultinomialModel(endog, exog[, offset, ...])

Attributes:

GenericZeroInflated(endog, exog[, ...])

Generic Zero Inflated Model


Last update: Jan 20, 2025