statsmodels.regression.linear_model.WLS.fit_regularized

WLS.fit_regularized(method='elastic_net', alpha=0.0, L1_wt=1.0, start_params=None, profile_scale=False, refit=False, **kwargs)

Return a regularized fit to a linear regression model.

Parameters
method : str

Either ‘elastic_net’ or ‘sqrt_lasso’.

alpha : scalar or array_like

The penalty weight. If a scalar, the same penalty weight applies to all variables in the model. If a vector, it must have the same length as params, and contains a penalty weight for each coefficient.

L1_wt : scalar

The fraction of the penalty given to the L1 penalty term. Must be between 0 and 1 (inclusive). If 0, the fit is a ridge fit; if 1, it is a lasso fit.

start_params : array_like

Starting values for params.

profile_scale : bool

If True, the penalized fit is computed using the profile (concentrated) log-likelihood for the Gaussian model. Otherwise, the fit uses the residual sum of squares.

refit : bool

If True, the model is refit using only the variables that have non-zero coefficients in the regularized fit. The refitted model is not regularized.

**kwargs

Additional keyword arguments that contain information used when constructing a model using the formula interface.

Returns
statsmodels.base.elastic_net.RegularizedResults

The regularized results.

Notes

The elastic net uses a combination of L1 and L2 penalties. The implementation closely follows the glmnet package in R.

The function that is minimized is:

\[0.5*RSS/n + alpha*((1-L1\_wt)*|params|_2^2/2 + L1\_wt*|params|_1)\]

where RSS is the usual residual sum of squares, n is the sample size, and \(|*|_1\) and \(|*|_2\) are the L1 and L2 norms.

For WLS and GLS, the RSS is calculated using the whitened endog and exog data.
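
The objective above can be written out directly as a sketch. The function name `elastic_net_objective` is hypothetical and is not part of statsmodels. The sketch assumes that WLS whitening multiplies each observation by the square root of its weight, and it does not attempt to reproduce any rescaling of `alpha` that the library may apply internally.

```python
import numpy as np

def elastic_net_objective(params, endog, exog, alpha, L1_wt, weights=None):
    """Evaluate 0.5*RSS/n + alpha*((1-L1_wt)*|params|_2^2/2 + L1_wt*|params|_1).

    Hypothetical helper for illustration; alpha may be a scalar or a
    vector with one penalty weight per coefficient.
    """
    params = np.asarray(params, dtype=float)
    endog = np.asarray(endog, dtype=float)
    exog = np.asarray(exog, dtype=float)

    if weights is not None:
        # Whiten: scale each row by sqrt(weight) (assumed WLS whitening).
        w = np.sqrt(np.asarray(weights, dtype=float))
        endog = w * endog
        exog = w[:, None] * exog

    n = endog.shape[0]
    resid = endog - exog @ params
    rss = resid @ resid

    # Per-coefficient penalty, so a vector alpha is handled as well.
    penalty = np.sum(
        alpha * ((1.0 - L1_wt) * params**2 / 2.0 + L1_wt * np.abs(params))
    )
    return 0.5 * rss / n + penalty

value = elastic_net_objective(
    params=[1.0, -2.0],
    endog=[1.0, 0.0],
    exog=np.eye(2),
    alpha=0.5,
    L1_wt=0.5,
)
print(value)
```

Setting `L1_wt=0` leaves only the squared L2 term (ridge), and `L1_wt=1` leaves only the L1 term (lasso).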

Post-estimation results are based on the same data used to select variables, and may therefore be subject to overfitting bias.

The elastic_net method uses the following keyword arguments:

maxiter : int

Maximum number of iterations.

cnvrg_tol : float

Convergence threshold for line searches.

zero_tol : float

Coefficients below this threshold are treated as zero.

The square root lasso approach is a variation of the Lasso that is largely self-tuning (the optimal tuning parameter does not depend on the standard deviation of the regression errors). If the errors are Gaussian, the tuning parameter can be taken to be

alpha = 1.1 * np.sqrt(n) * norm.ppf(1 - 0.05 / (2 * p))

where n is the sample size and p is the number of predictors.
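
As a sketch, the tuning parameter can be computed with the standard library alone. The expression above uses `np.sqrt` and `norm.ppf` (presumably `scipy.stats.norm`); here `math.sqrt` and `statistics.NormalDist().inv_cdf` stand in for them. The function name `sqrt_lasso_alpha` and its `level` and `c` arguments are hypothetical, with defaults set to the constants 0.05 and 1.1 from the expression.

```python
from math import sqrt
from statistics import NormalDist

def sqrt_lasso_alpha(n, p, level=0.05, c=1.1):
    """Suggested square root lasso penalty for Gaussian errors.

    n is the sample size and p is the number of predictors.
    Hypothetical helper for illustration only.
    """
    # inv_cdf is the standard normal quantile function (same role as norm.ppf).
    quantile = NormalDist().inv_cdf(1 - level / (2 * p))
    return c * sqrt(n) * quantile

alpha = sqrt_lasso_alpha(n=100, p=10)
print(alpha)
```

The resulting value would then be passed as `alpha` together with `method='sqrt_lasso'`.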

The square root lasso uses the following keyword arguments:

zero_tol : float

Coefficients below this threshold are treated as zero.

The cvxopt module is required to estimate a model using the square root lasso.

References

Friedman, Hastie, Tibshirani (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22.

A Belloni, V Chernozhukov, L Wang (2011). Square-root Lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791-806. https://arxiv.org/pdf/1009.5689.pdf