statsmodels.othermod.betareg.BetaModel

class statsmodels.othermod.betareg.BetaModel(endog, exog, exog_precision=None, link=<statsmodels.genmod.families.links.Logit object>, link_precision=<statsmodels.genmod.families.links.Log object>, **kwds)

Beta Regression.

The model is parameterized by mean and precision, both of which can depend on explanatory variables through link functions.
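
As a rough illustration of this parameterization, the sketch below assumes the usual mean-precision form of the Beta distribution, in which a mean `mean` and a precision `precision` correspond to shape parameters `a = mean * precision` and `b = (1 - mean) * precision`. The helper names and numeric values are hypothetical and not part of statsmodels; the logit and log links are the defaults named in the signature.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to a mean in (0, 1) (inverse of the logit link)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_shapes(mean, precision):
    """Convert (mean, precision) to Beta shape parameters (a, b)."""
    return mean * precision, (1.0 - mean) * precision


# Hypothetical linear predictors for a single observation.
eta_mean = 0.4         # linear predictor of the mean submodel
eta_precision = 2.5    # linear predictor of the precision submodel

mean = inverse_logit(eta_mean)         # logit link for the mean
precision = math.exp(eta_precision)    # log link for the precision
a, b = beta_shapes(mean, precision)

# Under this parameterization the variance shrinks as the precision grows.
variance = mean * (1.0 - mean) / (1.0 + precision)
print(a, b, variance)
```

A larger precision at the same mean gives a more concentrated distribution, which is why covariates that reflect measurement quality are natural candidates for the precision submodel.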

Parameters:
endog : array_like

1d array of endogenous response variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user (models specified using a formula include an intercept by default). See statsmodels.tools.add_constant.

exog_precision : array_like

2d array of variables for the precision.

link : link

Any link in sm.families.links for the mean; its inverse should map into the interval [0, 1]. Default is the logit link.

link_precision : link

Any link in sm.families.links for the precision; its inverse should map into the positive real line. Default is the log link.

**kwds : extra keywords

Keyword options that will be handled by super classes. Not all general keywords will be supported in this class.

See also

Link Functions

Notes

Status: experimental, new in 0.13. Core results are verified, but the API can change and some extra results specific to Beta regression are missing.

Examples

Beta regression with the default logit link for the mean and log link for the precision.

>>> from statsmodels.othermod.betareg import BetaModel
>>> mod = BetaModel(endog, exog)
>>> rslt = mod.fit()
>>> print(rslt.summary())

We can also specify the model with a formula, supply a separate design matrix for the precision, and use the identity link for the precision.

>>> import patsy
>>> from statsmodels.genmod.families.links import identity
>>> Z = patsy.dmatrix('~ temp', dat, return_type='dataframe')
>>> mod = BetaModel.from_formula('iyield ~ C(batch, Treatment(10)) + temp',
...                              dat, exog_precision=Z,
...                              link_precision=identity())

In the case of proportion data, we may think that the precision depends on the number of measurements. For example, for sequence data, it may depend on the number of sequence reads covering a site:

>>> Z = patsy.dmatrix('~ coverage', df)
>>> formula = 'methylation ~ disease + age + gender + coverage'
>>> mod = BetaModel.from_formula(formula, df, exog_precision=Z)
>>> rslt = mod.fit()

Attributes:
endog_names

Names of endogenous variables.

exog_names

Names of exogenous variables.

Methods

expandparams(params)

Expand to full parameter array when some parameters are fixed.

fit([start_params, maxiter, disp, method])

Fit the model by maximum likelihood.

from_formula(formula, data[, ...])

Create a Model from a formula and dataframe.

get_distribution(params[, exog, exog_precision])

Return an instance of the predictive distribution.

get_distribution_params(params[, exog, ...])

Return distribution parameters converted from model prediction.

hessian(params[, observed])

Hessian, second derivative of loglikelihood function

hessian_factor(params[, scale, observed])

Weights for calculating Hessian

information(params)

Fisher information matrix of model.

initialize()

Initialize (possibly re-initialize) a Model instance.

loglike(params)

Log-likelihood of model at params

loglikeobs(params)

Loglikelihood for observations of the Beta regression model.

nloglike(params)

Negative log-likelihood of model at params

predict(params[, exog, exog_precision, which])

Predict values for mean or precision

predict_precision(params[, exog_precision])

Predict values for precision function for given exog_precision.

predict_var(params[, exog, exog_precision])

Predict values for conditional variance V(endog | exog).

reduceparams(params)

Reduce parameters

score(params)

Returns the score vector of the log-likelihood.

score_factor(params)

Derivative of loglikelihood function w.r.t.

score_hessian_factor(params[, ...])

Derivatives of loglikelihood function w.r.t.

score_obs(params)

Score, first derivative of the loglikelihood for each observation.
