statsmodels.miscmodels.ordinal_model.OrderedModel
- class statsmodels.miscmodels.ordinal_model.OrderedModel(endog, exog, offset=None, distr='probit', **kwds)
Ordinal Model based on logistic or normal distribution
The parameterization corresponds to the proportional odds model in the logistic case. The model assumes that the endogenous variable is ordered but that the labels have no numeric interpretation besides the ordering.
The model is based on a latent linear variable, where we observe only a discretization.
y_latent = X beta + u
The observed variable is defined by the interval
- y = 0 if y_latent <= cut_0
  y = 1 if cut_0 < y_latent <= cut_1
  …
  y = K if cut_{K-1} < y_latent
The probability of observing y=k conditional on the explanatory variables X is given by
- prob(y = k | x) = Prob(cut_{k-1} < y_latent <= cut_k)
  = Prob(cut_{k-1} - x beta < u <= cut_k - x beta)
  = F(cut_k - x beta) - F(cut_{k-1} - x beta)
where the lower bound is taken as -inf for the lowest level and the upper bound as +inf for the highest level. F is the cumulative distribution function of u, which is either the normal or the logistic distribution but can be set to any other continuous distribution. We use standardized distributions to avoid identifiability problems.
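The interval probabilities above can be computed directly from a linear predictor and a set of cutpoints. The following is a minimal sketch, not the statsmodels implementation: the helper name category_probs and the numeric values are hypothetical, and the cutpoints are padded with -inf and +inf so that the lowest and highest levels are handled like every other interval.

```python
import numpy as np
from scipy import stats


def category_probs(xb, cuts, dist=stats.norm):
    """Return P(y = k | x) for every level, given x beta and increasing cutpoints."""
    xb = np.asarray(xb, dtype=float)[:, None]            # shape (n, 1)
    # Pad with -inf and +inf so each level corresponds to one interval.
    th = np.concatenate(([-np.inf], np.asarray(cuts, dtype=float), [np.inf]))
    cdf_vals = dist.cdf(th[None, :] - xb)                 # F(cut - x beta)
    return np.diff(cdf_vals, axis=1)                      # F(upper) - F(lower)


xb = np.array([-1.0, 0.0, 2.0])     # hypothetical values of x beta
cuts = np.array([-0.5, 1.0])        # hypothetical cutpoints, two cuts give three levels
probs = category_probs(xb, cuts)
print(probs.round(3))
print(probs.sum(axis=1))            # each row sums to 1
```

Passing dist=stats.logistic instead of the default gives the proportional odds (ordered logit) version of the same calculation.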
- Parameters:
- endog : array_like
Endogenous or dependent ordered categorical variable with k levels. Labels or values of endog will be internally transformed to consecutive integers 0, 1, 2, …. A pd.Series with an ordered Categorical dtype should be preferred because it gives the order relation between the levels. If endog is not a pandas Categorical, then the categories are sorted in lexicographic order (by numpy.unique).
- exog : array_like
Exogenous, explanatory variables. This should not include an intercept. A pd.DataFrame is also accepted. See Notes about the constant when using formulas.
- offset : array_like
Offset is added to the linear prediction with coefficient equal to 1.
- distr : str ‘probit’ or ‘logit’, or a distribution instance
The default is currently ‘probit’, which uses the normal distribution and corresponds to an ordered Probit model. A distribution instance is assumed to have the main methods of scipy.stats distributions, mainly cdf, pdf and ppf. The inverse cdf, ppf, is only used to calculate starting values.
Notes
Status: experimental. Core results are verified. The class still subclasses GenericLikelihoodModel, which will change in future versions.
The parameterization of OrderedModel requires that there is no constant in the model, neither explicit nor implicit. The constant is equivalent to shifting all thresholds and is therefore not separately identified.
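The identification argument can be checked numerically: adding a constant to the linear predictor and shifting every threshold by the same amount leaves all level probabilities unchanged. This is a small sketch with hypothetical numbers and a hypothetical helper level_probs, using the normal distribution.

```python
import numpy as np
from scipy import stats

xb = np.array([-1.0, 0.3, 2.0])     # hypothetical linear predictor without constant
cuts = np.array([-0.5, 1.0])        # hypothetical thresholds
const = 0.7                         # hypothetical constant added to the model


def level_probs(xb, cuts):
    """Probabilities of each level under a standard normal latent error."""
    th = np.concatenate(([-np.inf], cuts, [np.inf]))
    return np.diff(stats.norm.cdf(th[None, :] - xb[:, None]), axis=1)


p_without = level_probs(xb, cuts)
p_with = level_probs(xb + const, cuts + const)   # constant and thresholds shifted together
print(np.allclose(p_without, p_with))            # True
```

Because the likelihood depends on the data only through these probabilities, it cannot distinguish the two parameter sets.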
Patsy’s formula specification does not allow a design matrix without explicit or implicit constant if there are categorical variables (or maybe splines) among the explanatory variables. As a workaround, statsmodels removes an explicit intercept.
Consequently, there are two valid cases to get a design matrix without intercept when using formulas:
- Specify a model without explicit and implicit intercept, which is possible if there are only numerical variables in the model.
- Specify a model with an explicit intercept, which statsmodels will remove.
Models with an implicit intercept will be overparameterized: the parameter estimates will not be fully identified, cov_params will not be invertible, and standard errors might contain nans. The computed results will be dominated by numerical imprecision coming mainly from convergence tolerance and numerical derivatives.
The model will raise a ValueError if a remaining constant is detected.
- Attributes:
endog_names
Names of endogenous variables.
exog_names
Names of exogenous variables.
start_params
Start parameters for the optimization corresponding to null model.
Methods
cdf(x)
Cdf evaluated at x.
expandparams(params)
Expand to full parameter array when some parameters are fixed.
fit([start_params, method, maxiter, ...])
Fit method for likelihood based models.
from_formula(formula, data[, subset, drop_cols])
Create a Model from a formula and dataframe.
hessian(params)
Hessian of log-likelihood evaluated at params.
hessian_factor(params[, scale, observed])
Weights for calculating Hessian.
information(params)
Fisher information matrix of model.
initialize()
Initialize (possibly re-initialize) a Model instance.
loglike(params)
Log-likelihood of model at params.
loglikeobs(params)
Log-likelihood of OrderedModel for all observations.
nloglike(params)
Negative log-likelihood of model at params.
pdf(x)
Pdf evaluated at x.
predict(params[, exog, offset, which])
Predicted probabilities for each level of the ordinal endog.
prob(low, upp)
Interval probability.
reduceparams(params)
Reduce parameters.
score(params)
Gradient of log-likelihood evaluated at params.
score_obs(params, **kwds)
Jacobian/Gradient of log-likelihood evaluated at params for each observation.
score_obs_(params)
Score, first derivative of loglike for each observation.
transform_reverse_threshold_params(params)
Obtain transformed thresholds from original thresholds or cutoffs.
transform_threshold_params(params)
Transformation of the parameters in the optimization.
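The method summaries do not spell out how threshold parameters are transformed. One common way to keep thresholds strictly increasing during unconstrained optimization, assumed here for illustration, is to leave the first threshold free and add exponentiated increments for the rest. The function names below are hypothetical and are not the library methods themselves.

```python
import numpy as np


def to_increasing_thresholds(th_params):
    """Map unconstrained parameters to strictly increasing thresholds."""
    th_params = np.asarray(th_params, dtype=float)
    # The first threshold is free; later ones add strictly positive increments.
    return np.concatenate((th_params[:1], np.exp(th_params[1:]))).cumsum()


def to_unconstrained(thresholds):
    """Inverse map: recover the unconstrained parameters from thresholds."""
    thresholds = np.asarray(thresholds, dtype=float)
    return np.concatenate((thresholds[:1], np.log(np.diff(thresholds))))


raw = np.array([-0.5, 0.0, -1.2])        # hypothetical optimizer parameters
cuts = to_increasing_thresholds(raw)
print(cuts)                               # strictly increasing by construction
print(np.allclose(to_unconstrained(cuts), raw))
```

With this kind of reparameterization an optimizer can move freely over the real line without ever producing thresholds that cross.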
Properties
endog_names
Names of endogenous variables.
exog_names
Names of exogenous variables.
start_params
Start parameters for the optimization corresponding to null model.