statsmodels.genmod.generalized_estimating_equations.NominalGEE
class statsmodels.genmod.generalized_estimating_equations.NominalGEE(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, dep_data=None, constraint=None, **kwargs)
Nominal Response Marginal Regression Model using GEE.
Marginal regression model fit using Generalized Estimating Equations.
GEE can be used to fit Generalized Linear Models (GLMs) when the data have a grouped structure, and the observations are possibly correlated within groups but not between groups.
Parameters:
- endog : array_like
1d array of endogenous values (i.e. responses, outcomes, dependent variables, or ‘Y’ values).
- exog : array_like
2d array of exogenous values (i.e. covariates, predictors, independent variables, regressors, or ‘X’ values). A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not added automatically. See statsmodels.tools.add_constant for the general case, but note that nominal GEE models should not include an intercept (see Notes).
- groups : array_like
A 1d array of length nobs containing the group labels.
- time : array_like
A 2d array of time (or other index) values, used by some dependence structures to define similarity relationships among observations within a cluster.
- family : family class instance
The default value None uses a multinomial logit family specifically designed for use with GEE. Setting this argument to a non-default value is not currently supported.
- cov_struct : CovStruct class instance
The default is Independence. To specify an exchangeable structure use cov_struct = Exchangeable(). See statsmodels.genmod.cov_struct.CovStruct for more information.
- offset : array_like
An offset to be included in the fit. If provided, must be an array whose length is the number of rows in exog.
- dep_data : array_like
Additional data passed to the dependence structure.
- constraint : (ndarray, ndarray)
If provided, the constraint is a tuple (L, R) such that the model parameters are estimated under the constraint L * param = R, where L is a q x p matrix and R is a q-dimensional vector. If constraint is provided, a score test is performed to compare the constrained model to the unconstrained model.
- update_dep : bool
If true, the dependence parameters are optimized, otherwise they are held fixed at their starting values.
- weights : array_like
An array of case weights to use in the analysis.
- missing : str
Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.
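As a small illustration of the constraint argument, the sketch below checks whether a parameter vector satisfies L * param = R. The matrix, vector, and parameter values are made up for the example, and the helper name satisfies_constraint is hypothetical; it is not part of statsmodels.

```python
def satisfies_constraint(L, R, params, tol=1e-8):
    """Return True if L * params equals R within tol (hypothetical helper)."""
    for row, target in zip(L, R):
        # Dot product of one constraint row with the parameter vector.
        value = sum(coef * p for coef, p in zip(row, params))
        if abs(value - target) > tol:
            return False
    return True

# One constraint (q = 1) on three parameters (p = 3): params[1] - params[2] = 0.
L = [[0.0, 1.0, -1.0]]
R = [0.0]

print(satisfies_constraint(L, R, [0.5, 0.2, 0.2]))  # True
print(satisfies_constraint(L, R, [0.5, 0.2, 0.3]))  # False
```

In an actual fit, L and R would be passed together as the tuple (L, R), typically as NumPy arrays.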
Attributes:
- cached_means
- endog_names
Names of endogenous variables.
- exog_names
Names of exogenous variables.
- exposure_name
Name of the exposure variable if available.
- freq_weights_name
Name of the freq weights variable if available.
- offset_name
Name of the offset variable if available.
- var_weights_name
Name of var weights variable if available.
Notes
Only the following combinations make sense for family and link:
             + ident log logit probit cloglog pow opow nbinom loglog logc
Gaussian     |   x    x                        x
inv Gaussian |   x    x                        x
binomial     |   x    x    x     x       x     x   x            x     x
Poisson      |   x    x                        x
neg binomial |   x    x                        x         x
gamma        |   x    x                        x
Not all of these link functions are currently available.
Endog and exog are references so that if the data they refer to are already arrays and these arrays are changed, endog and exog will change.
The “robust” covariance type is the standard “sandwich estimator” (e.g. Liang and Zeger (1986)). It is the default here and in most other packages. The “naive” estimator gives smaller standard errors, but is only correct if the working correlation structure is correctly specified. The “bias reduced” estimator of Mancl and DeRouen (Biometrics, 2001) reduces the downward bias of the robust estimator.
The robust covariance provided here follows Liang and Zeger (1986) and agrees with R’s gee implementation. To obtain the robust standard errors reported in Stata, multiply by sqrt(N / (N - g)), where N is the total sample size, and g is the average group size.
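The Stata adjustment described above is simple arithmetic. The following sketch computes the factor sqrt(N / (N - g)) for made-up values; the function name stata_se_factor and the numbers are illustrative assumptions, not part of statsmodels.

```python
import math

def stata_se_factor(n_obs, avg_group_size):
    """Multiplier sqrt(N / (N - g)) from the note above (hypothetical helper)."""
    return math.sqrt(n_obs / (n_obs - avg_group_size))

# Made-up values: 400 observations with an average group size of 4.
factor = stata_se_factor(400, 4)

# Hypothetical robust standard error taken from a GEE fit.
robust_se = 0.10
print(factor)
print(robust_se * factor)
```

Because N - g is smaller than N, the factor is always slightly greater than 1, so the adjusted standard errors are slightly larger than the unadjusted ones.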
The nominal and ordinal GEE models should not have an intercept (either implicit or explicit). Use “0 + ” in a formula to suppress the intercept.
Examples
Fit a nominal regression model using GEE:
>>> import statsmodels.api as sm
>>> import statsmodels.formula.api as smf
>>> gor = sm.cov_struct.GlobalOddsRatio("nominal")
>>> model = sm.NominalGEE(endog, exog, groups, cov_struct=gor)
>>> result = model.fit()
>>> print(result.summary())
Using formulas:
>>> import statsmodels.api as sm
>>> model = sm.NominalGEE.from_formula("y ~ 0 + x1 + x2", groups, data, cov_struct=gor)
>>> result = model.fit()
>>> print(result.summary())
Using the formula API:
>>> import statsmodels.formula.api as smf
>>> model = smf.nominal_gee("y ~ 0 + x1 + x2", groups, data, cov_struct=gor)
>>> result = model.fit()
>>> print(result.summary())
Methods
cluster_list(array)
Returns array split into subarrays corresponding to the cluster structure.
compare_score_test(submodel)
Perform a score test for the given submodel against this model.
estimate_scale()
Estimate the dispersion/scale.
estimate_tweedie_power(mu[, method, low, high])
Tweedie specific function to estimate scale and the variance parameter.
fit([maxiter, ctol, start_params, ...])
Fits a marginal regression model using generalized estimating equations (GEE).
fit_constrained(constraints[, start_params])
Fit the model subject to linear equality constraints.
fit_regularized(pen_wt[, scad_param, ...])
Regularized estimation for GEE.
from_formula(formula, groups, data[, ...])
Create a Model from a formula and dataframe.
get_distribution(params[, scale, exog, ...])
Return an instance of the predictive distribution.
hessian(params[, scale, observed])
Hessian, second derivative of the log-likelihood function.
hessian_factor(params[, scale, observed])
Weights for calculating the Hessian.
information(params[, scale])
Fisher information matrix.
initialize()
Initialize a generalized linear model.
loglike(params[, scale])
Evaluate the log-likelihood for a generalized linear model.
loglike_mu(mu[, scale])
Evaluate the log-likelihood for a generalized linear model.
mean_deriv(exog, lin_pred)
Derivative of the expected endog with respect to the parameters.
mean_deriv_exog(exog, params[, offset_exposure])
Derivative of the expected endog with respect to exog for the multinomial model, used in analyzing marginal effects.
predict(params[, exog, exposure, offset, ...])
Return predicted values for a design matrix.
qic(params, scale, cov_params[, n_step])
Returns quasi-information criteria and quasi-likelihood values.
score(params[, scale])
Score, first derivative of the log-likelihood function.
score_factor(params[, scale])
Weights for the score for each observation.
score_obs(params[, scale])
Score, first derivative of the log-likelihood for each observation.
score_test(params_constrained[, ...])
Score test for restrictions or for omitted variables.
setup_nominal(endog, exog, groups, time, offset)
Restructure nominal data as binary indicators so that they can be analyzed using Generalized Estimating Equations.
update_cached_means(mean_params)
cached_means should always contain the most recent calculation of the group-wise mean vectors.
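To give a feel for what "restructure nominal data as binary indicators" means in setup_nominal, here is a simplified pure-Python sketch. It is an illustration under stated assumptions, not the statsmodels implementation: the function name expand_nominal is hypothetical, the greatest category is assumed to be the omitted reference level, and groups, time, and offset are ignored.

```python
def expand_nominal(endog, exog):
    """Expand nominal responses into binary indicator rows (simplified sketch)."""
    # Assumption: every category except the greatest gets its own indicator.
    cuts = sorted(set(endog))[:-1]
    endog_out, exog_out = [], []
    for y, x_row in zip(endog, exog):
        for j, cut in enumerate(cuts):
            # Indicator that this observation falls in category `cut`.
            endog_out.append(1 if y == cut else 0)
            # Copy the covariates into the block of columns for category j.
            row = [0.0] * (len(cuts) * len(x_row))
            row[j * len(x_row):(j + 1) * len(x_row)] = x_row
            exog_out.append(row)
    return endog_out, exog_out

# Three observations with categories 0, 1, 2 and two covariates each.
endog = [0, 2, 1]
exog = [[1.0, 0.5], [1.0, -0.3], [1.0, 0.8]]
y_bin, x_bin = expand_nominal(endog, exog)

print(y_bin)     # [1, 0, 0, 0, 0, 1]
print(x_bin[0])  # [1.0, 0.5, 0.0, 0.0]
print(x_bin[1])  # [0.0, 0.0, 1.0, 0.5]
```

Each original observation becomes one row per non-reference category, so the expanded data has a separate set of coefficients for each of those categories.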
Properties
endog_names
Names of endogenous variables.
exog_names
Names of exogenous variables.
exposure_name
Name of the exposure variable if available.
freq_weights_name
Name of the freq weights variable if available.
offset_name
Name of the offset variable if available.
var_weights_name
Name of var weights variable if available.