Discrete Choice Models Overview¶
[1]:
import numpy as np
import statsmodels.api as sm
Data¶
Load data from Spector and Mazzeo (1980). Examples follow Greene’s Econometric Analysis Ch. 21 (5th Edition).
[2]:
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
Inspect the data:
[3]:
print(spector_data.exog.head())
print(spector_data.endog.head())
GPA TUCE PSI const
0 2.66 20.0 0.0 1.0
1 2.89 22.0 0.0 1.0
2 3.28 24.0 0.0 1.0
3 2.92 12.0 0.0 1.0
4 4.00 21.0 0.0 1.0
0 0.0
1 0.0
2 0.0
3 0.0
4 1.0
Name: GRADE, dtype: float64
Linear Probability Model (OLS)¶
[4]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print("Parameters: ", lpm_res.params[:-1])
Parameters: GPA 0.463852
TUCE 0.010495
PSI 0.378555
dtype: float64
Logit Model¶
[5]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)
print("Parameters: ", logit_res.params)
Parameters: GPA 2.826113
TUCE 0.095158
PSI 2.378688
const -13.021347
dtype: float64
Marginal Effects
[6]:
margeff = logit_res.get_margeff()
print(margeff.summary())
Logit Marginal Effects
=====================================
Dep. Variable: GRADE
Method: dydx
At: overall
==============================================================================
dy/dx std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
GPA 0.3626 0.109 3.313 0.001 0.148 0.577
TUCE 0.0122 0.018 0.686 0.493 -0.023 0.047
PSI 0.3052 0.092 3.304 0.001 0.124 0.486
==============================================================================
As in all the discrete data models presented below, we can print a nice summary of results:
[7]:
print(logit_res.summary())
Logit Regression Results
==============================================================================
Dep. Variable: GRADE No. Observations: 32
Model: Logit Df Residuals: 28
Method: MLE Df Model: 3
Date: Wed, 02 Nov 2022 Pseudo R-squ.: 0.3740
Time: 17:10:42 Log-Likelihood: -12.890
converged: True LL-Null: -20.592
Covariance Type: nonrobust LLR p-value: 0.001502
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
GPA 2.8261 1.263 2.238 0.025 0.351 5.301
TUCE 0.0952 0.142 0.672 0.501 -0.182 0.373
PSI 2.3787 1.065 2.234 0.025 0.292 4.465
const -13.0213 4.931 -2.641 0.008 -22.687 -3.356
==============================================================================
Probit Model¶
[8]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print("Parameters: ", probit_res.params)
print("Marginal effects: ")
print(probit_margeff.summary())
Optimization terminated successfully.
Current function value: 0.400588
Iterations 6
Parameters: GPA 1.625810
TUCE 0.051729
PSI 1.426332
const -7.452320
dtype: float64
Marginal effects:
Probit Marginal Effects
=====================================
Dep. Variable: GRADE
Method: dydx
At: overall
==============================================================================
dy/dx std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
GPA 0.3608 0.113 3.182 0.001 0.139 0.583
TUCE 0.0115 0.018 0.624 0.533 -0.025 0.048
PSI 0.3165 0.090 3.508 0.000 0.140 0.493
==============================================================================
Multinomial Logit¶
Load data from the American National Election Studies:
[9]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog = sm.add_constant(anes_exog)
Inspect the data:
[10]:
print(anes_data.exog.head())
print(anes_data.endog.head())
logpopul selfLR age educ income
0 -2.302585 7.0 36.0 3.0 1.0
1 5.247550 3.0 20.0 4.0 1.0
2 3.437208 2.0 24.0 6.0 1.0
3 4.420045 3.0 28.0 6.0 1.0
4 6.461624 5.0 68.0 6.0 1.0
0 6.0
1 1.0
2 1.0
3 1.0
4 0.0
Name: PID, dtype: float64
Fit MNL model:
[11]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)
Optimization terminated successfully.
Current function value: 1.548647
Iterations 7
0 1 2 3 4 5
const -0.373402 -2.250913 -3.665584 -7.613843 -7.060478 -12.105751
logpopul -0.011536 -0.088751 -0.105967 -0.091557 -0.093285 -0.140881
selfLR 0.297714 0.391669 0.573451 1.278772 1.346962 2.070080
age -0.024945 -0.022898 -0.014851 -0.008681 -0.017904 -0.009433
educ 0.082491 0.181043 -0.007152 0.199828 0.216939 0.321926
income 0.005197 0.047874 0.057575 0.084498 0.080958 0.108894
Poisson¶
Load the Rand data. Note that this example is similar to Cameron and Trivedi’s Microeconometrics
Table 20.5, but it is slightly different because of minor changes in the data.
[12]:
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog
rand_exog = sm.add_constant(rand_exog, prepend=False)
Fit Poisson model:
[13]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())
Optimization terminated successfully.
Current function value: 3.091609
Iterations 6
Poisson Regression Results
==============================================================================
Dep. Variable: mdvis No. Observations: 20190
Model: Poisson Df Residuals: 20180
Method: MLE Df Model: 9
Date: Wed, 02 Nov 2022 Pseudo R-squ.: 0.06343
Time: 17:10:43 Log-Likelihood: -62420.
converged: True LL-Null: -66647.
Covariance Type: nonrobust LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
lncoins -0.0525 0.003 -18.216 0.000 -0.058 -0.047
idp -0.2471 0.011 -23.272 0.000 -0.268 -0.226
lpi 0.0353 0.002 19.302 0.000 0.032 0.039
fmde -0.0346 0.002 -21.439 0.000 -0.038 -0.031
physlm 0.2717 0.012 22.200 0.000 0.248 0.296
disea 0.0339 0.001 60.098 0.000 0.033 0.035
hlthg -0.0126 0.009 -1.366 0.172 -0.031 0.005
hlthf 0.0541 0.015 3.531 0.000 0.024 0.084
hlthp 0.2061 0.026 7.843 0.000 0.155 0.258
const 0.7004 0.011 62.741 0.000 0.678 0.722
==============================================================================
Negative Binomial¶
The negative binomial model gives slightly different results.
[14]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())
/opt/hostedtoolcache/Python/3.10.8/x64/lib/python3.10/site-packages/statsmodels/base/model.py:604: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
NegativeBinomial Regression Results
==============================================================================
Dep. Variable: mdvis No. Observations: 20190
Model: NegativeBinomial Df Residuals: 20180
Method: MLE Df Model: 9
Date: Wed, 02 Nov 2022 Pseudo R-squ.: 0.01845
Time: 17:10:43 Log-Likelihood: -43384.
converged: False LL-Null: -44199.
Covariance Type: nonrobust LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
lncoins -0.0579 0.006 -9.515 0.000 -0.070 -0.046
idp -0.2678 0.023 -11.802 0.000 -0.312 -0.223
lpi 0.0412 0.004 9.938 0.000 0.033 0.049
fmde -0.0381 0.003 -11.216 0.000 -0.045 -0.031
physlm 0.2691 0.030 8.985 0.000 0.210 0.328
disea 0.0382 0.001 26.080 0.000 0.035 0.041
hlthg -0.0441 0.020 -2.201 0.028 -0.083 -0.005
hlthf 0.0173 0.036 0.478 0.632 -0.054 0.088
hlthp 0.1782 0.074 2.399 0.016 0.033 0.324
const 0.6635 0.025 26.786 0.000 0.615 0.712
alpha 1.2930 0.019 69.477 0.000 1.256 1.329
==============================================================================
Alternative solvers¶
The default method for fitting discrete data MLE models is Newton-Raphson. You can use other solvers by using the method
argument:
[15]:
mlogit_res = mlogit_mod.fit(method="bfgs", maxiter=250)
print(mlogit_res.summary())
Optimization terminated successfully.
Current function value: 1.548647
Iterations: 111
Function evaluations: 117
Gradient evaluations: 117
MNLogit Regression Results
==============================================================================
Dep. Variable: PID No. Observations: 944
Model: MNLogit Df Residuals: 908
Method: MLE Df Model: 30
Date: Wed, 02 Nov 2022 Pseudo R-squ.: 0.1648
Time: 17:10:44 Log-Likelihood: -1461.9
converged: True LL-Null: -1750.3
Covariance Type: nonrobust LLR p-value: 1.822e-102
==============================================================================
PID=1 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -0.3734 0.630 -0.593 0.553 -1.608 0.861
logpopul -0.0115 0.034 -0.337 0.736 -0.079 0.056
selfLR 0.2977 0.094 3.180 0.001 0.114 0.481
age -0.0249 0.007 -3.823 0.000 -0.038 -0.012
educ 0.0825 0.074 1.121 0.262 -0.062 0.227
income 0.0052 0.018 0.295 0.768 -0.029 0.040
------------------------------------------------------------------------------
PID=2 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -2.2509 0.763 -2.949 0.003 -3.747 -0.755
logpopul -0.0888 0.039 -2.266 0.023 -0.166 -0.012
selfLR 0.3917 0.108 3.619 0.000 0.180 0.604
age -0.0229 0.008 -2.893 0.004 -0.038 -0.007
educ 0.1810 0.085 2.123 0.034 0.014 0.348
income 0.0479 0.022 2.149 0.032 0.004 0.092
------------------------------------------------------------------------------
PID=3 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -3.6656 1.157 -3.169 0.002 -5.932 -1.399
logpopul -0.1060 0.057 -1.858 0.063 -0.218 0.006
selfLR 0.5734 0.159 3.617 0.000 0.263 0.884
age -0.0149 0.011 -1.311 0.190 -0.037 0.007
educ -0.0072 0.126 -0.057 0.955 -0.255 0.240
income 0.0576 0.034 1.713 0.087 -0.008 0.123
------------------------------------------------------------------------------
PID=4 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -7.6139 0.958 -7.951 0.000 -9.491 -5.737
logpopul -0.0916 0.044 -2.091 0.037 -0.177 -0.006
selfLR 1.2788 0.129 9.921 0.000 1.026 1.531
age -0.0087 0.008 -1.031 0.302 -0.025 0.008
educ 0.1998 0.094 2.123 0.034 0.015 0.384
income 0.0845 0.026 3.226 0.001 0.033 0.136
------------------------------------------------------------------------------
PID=5 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -7.0605 0.844 -8.362 0.000 -8.715 -5.406
logpopul -0.0933 0.039 -2.371 0.018 -0.170 -0.016
selfLR 1.3470 0.117 11.494 0.000 1.117 1.577
age -0.0179 0.008 -2.352 0.019 -0.033 -0.003
educ 0.2169 0.085 2.552 0.011 0.050 0.384
income 0.0810 0.023 3.524 0.000 0.036 0.126
------------------------------------------------------------------------------
PID=6 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -12.1058 1.060 -11.421 0.000 -14.183 -10.028
logpopul -0.1409 0.042 -3.343 0.001 -0.223 -0.058
selfLR 2.0701 0.143 14.435 0.000 1.789 2.351
age -0.0094 0.008 -1.160 0.246 -0.025 0.007
educ 0.3219 0.091 3.534 0.000 0.143 0.500
income 0.1089 0.025 4.304 0.000 0.059 0.158
==============================================================================