statsmodels.regression.linear_model.GLS.fit_regularized

method

GLS.fit_regularized(method='elastic_net', alpha=0.0, L1_wt=1.0, start_params=None, profile_scale=False, refit=False, **kwargs)

Return a regularized fit to a linear regression model.
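For instance, a minimal sketch with synthetic data (variable names and penalty values are illustrative):

    import numpy as np
    import statsmodels.api as sm

    # Synthetic data: 100 observations, 5 predictors plus a constant.
    rng = np.random.default_rng(0)
    exog = sm.add_constant(rng.standard_normal((100, 5)))
    beta = np.array([1.0, 2.0, 0.0, 0.0, -1.5, 0.0])
    endog = exog @ beta + rng.standard_normal(100)

    # A lasso fit (L1_wt=1.0); alpha may also be a vector giving a
    # separate penalty weight for each coefficient.
    model = sm.GLS(endog, exog)
    results = model.fit_regularized(method='elastic_net', alpha=0.1, L1_wt=1.0)
    print(results.params)  # some coefficients are shrunk exactly to zero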

Parameters
method : string

‘elastic_net’ and ‘sqrt_lasso’ are currently implemented.

alpha : scalar or array-like

The penalty weight. If a scalar, the same penalty weight applies to all variables in the model. If a vector, it must have the same length as params and contain a penalty weight for each coefficient.

L1_wt : scalar

The fraction of the penalty given to the L1 penalty term. Must be between 0 and 1 (inclusive). If 0, the fit is a ridge fit; if 1, it is a lasso fit.

start_params : array-like

Starting values for params.

profile_scale : bool

If True, the penalized fit is computed using the profile (concentrated) log-likelihood for the Gaussian model. Otherwise the fit uses the residual sum of squares.

refit : bool

If True, the model is refit using only the variables that have non-zero coefficients in the regularized fit. The refitted model is not regularized.

distributed : bool

If True, the model is fit using distributed methods. An error is raised if distributed is True and partitions is None.

generator : function

A generator used to partition the model, allowing for out-of-memory or parallel computation.

partitions : scalar

The number of partitions desired for the distributed estimation.

threshold : scalar or array-like

The threshold below which coefficients are zeroed out; used only for distributed estimation.

Returns
A RegularizedResults instance.

Notes

The elastic net uses a combination of L1 and L2 penalties. The implementation closely follows the glmnet package in R.

The function that is minimized is:

\[0.5\,\mathrm{RSS}/n + \alpha\left((1 - L1\_wt)\,\|params\|_2^2/2 + L1\_wt\,\|params\|_1\right)\]

where RSS is the residual sum of squares, n is the sample size, and \(\|\cdot\|_1\) and \(\|\cdot\|_2\) are the L1 and L2 norms.

For WLS and GLS, the RSS is calculated using the whitened endog and exog data.
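To make the criterion concrete, here is a sketch of it in NumPy (elastic_net_objective is a hypothetical helper, not part of statsmodels; for WLS and GLS, endog and exog here would be the whitened arrays):

    import numpy as np

    def elastic_net_objective(params, endog, exog, alpha, L1_wt):
        # 0.5 * RSS / n + alpha * ((1 - L1_wt) * ||params||_2^2 / 2
        #                          + L1_wt * ||params||_1)
        n = endog.shape[0]
        resid = endog - exog @ params
        rss = resid @ resid
        penalty = ((1 - L1_wt) * np.sum(params ** 2) / 2
                   + L1_wt * np.sum(np.abs(params)))
        return 0.5 * rss / n + alpha * penalty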

Post-estimation results are based on the same data used to select variables, hence may be subject to overfitting biases.

The elastic_net method uses the following keyword arguments:

maxiter : int

Maximum number of iterations.

cnvrg_tol : float

Convergence threshold for line searches.

zero_tol : float

Coefficients below this threshold are treated as zero.
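These are passed through **kwargs; for example, continuing the sketch above (the specific values are illustrative, not recommendations):

    results = model.fit_regularized(
        method='elastic_net',
        alpha=0.1,
        L1_wt=0.5,        # half ridge, half lasso
        maxiter=200,      # maximum number of iterations
        cnvrg_tol=1e-8,   # convergence threshold for line searches
        zero_tol=1e-8,    # coefficients below this are set to zero
    )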

The square root lasso approach is a variation of the Lasso that is largely self-tuning (the optimal tuning parameter does not depend on the standard deviation of the regression errors). If the errors are Gaussian, the tuning parameter can be taken to be

alpha = 1.1 * np.sqrt(n) * norm.ppf(1 - 0.05 / (2 * p))

where n is the sample size and p is the number of predictors.
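A runnable version of this rule, continuing the sketch above (here p counts the constant as a predictor):

    import numpy as np
    from scipy.stats import norm

    n, p = exog.shape  # sample size and number of predictors
    alpha = 1.1 * np.sqrt(n) * norm.ppf(1 - 0.05 / (2 * p))
    results = model.fit_regularized(method='sqrt_lasso', alpha=alpha)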

The square root lasso uses the following keyword arguments:

zero_tol : float

Coefficients below this threshold are treated as zero.

References

Friedman, J., Hastie, T., Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.

Belloni, A., Chernozhukov, V., Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791-806. https://arxiv.org/pdf/1009.5689.pdf