statsmodels.stats.outliers_influence.OLSInfluence
class statsmodels.stats.outliers_influence.OLSInfluence(results)
Class to calculate outlier and influence measures for an OLS result.
- Parameters
- results
RegressionResults
currently assumes the results are from an OLS regression
Notes
Some of the results can be calculated without any auxiliary regression (some of these have the _internal postfix in the name). Other statistics require leave-one-observation-out (LOOO) auxiliary regressions and will be slower (mainly results with the _external postfix in the name). From the auxiliary LOOO regressions, only the required results are stored.
Using the LOOO measures is currently only recommended if the data set is not too large. One possible approach would be to identify possible problem observations with the _internal measures, and then run the leave-one-observation-out regressions only for the observations that are possible outliers. (However, this is not yet available in an automated way.)
This should be extended to general least squares.
The leave-one-variable-out (LOVO) auxiliary regressions are currently not used.
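Since the two-stage screening mentioned in the Notes is not automated by the class, the following is a minimal NumPy-only sketch of how it might be done by hand. The helper names `internal_studentized` and `loo_refit`, the simulated data, and the cutoff of 2.0 are assumptions made for illustration; none of them are part of statsmodels.

```python
import numpy as np

def internal_studentized(X, y):
    """Internally studentized residuals from a single OLS fit (no refits)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Diagonal of the hat matrix H = X (X'X)^{-1} X'
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    sigma2 = resid @ resid / (n - k)
    return resid / np.sqrt(sigma2 * (1.0 - h))

def loo_refit(X, y, i):
    """Refit OLS without observation i; return params and error variance."""
    keep = np.arange(len(y)) != i
    Xi, yi = X[keep], y[keep]
    beta_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    resid_i = yi - Xi @ beta_i
    sigma2_i = resid_i @ resid_i / (Xi.shape[0] - Xi.shape[1])
    return beta_i, sigma2_i

# Simulated data with one planted gross outlier (illustrative only)
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[7] += 8.0

# Stage 1: cheap screening with the internal measure (assumed cutoff: 2.0)
r_int = internal_studentized(X, y)
candidates = np.flatnonzero(np.abs(r_int) > 2.0)

# Stage 2: leave-one-out refits only for the flagged observations
loo_results = {int(i): loo_refit(X, y, i) for i in candidates}
```

Only the flagged observations are refitted, so the cost of the second stage grows with the number of candidates rather than with the number of observations.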
- Attributes
- cooks_distance
Cook's distance
Uses the original results; no loop over observations is required.
- *
Eubank, R. L. (1999). Nonparametric regression and spline smoothing. CRC press.
- †
Cook’s distance. (n.d.). In Wikipedia. July 2019, from https://en.wikipedia.org/wiki/Cook%27s_distance
- cov_ratio
Covariance ratio between LOOO and original
This uses the determinant of the estimate of the parameter covariance from the leave-one-out estimates. Requires the leave-one-out loop over observations.
- det_cov_params_not_obsi
Determinant of cov_params of all LOOO regressions
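- dfbeta
Change in the parameter estimates when each observation is left out (the unscaled counterpart of dfbetas)
Uses results from the leave-one-observation-out loop.
- dfbetas
Scaled change in the parameter estimates when each observation is left out
Uses results from the leave-one-observation-out loop.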
- dffits
dffits measure for influence of an observation
Based on resid_studentized_external; uses results from the leave-one-observation-out loop.
It is recommended that observations with dffits larger than a threshold of $2 \sqrt{k / n}$, where $k$ is the number of parameters and $n$ is the number of observations, should be investigated.
Returns: dffits (float), dffits_threshold (float)
- dffits_internal
dffits measure for influence of an observation
Based on resid_studentized_internal; uses the original results, with no loop over observations.
- ess_press
Error sum of squares of PRESS residuals
- hat_diag_factor
Factor of diagonal of hat_matrix used in influence
h / (1 - h), where h is the diagonal of the hat_matrix; this might be useful for internal reuse.
- hat_matrix_diag
Diagonal of the hat_matrix for OLS
Temporarily calculated here; this should go to the model class.
- influence
Influence measure
Matches the influence measure that gretl reports: u * h / (1 - h), where u are the residuals and h is the diagonal of the hat_matrix.
- params_not_obsi
Parameter estimates for all LOOO regressions
- resid_press
PRESS residuals
- resid_std
Estimate of standard deviation of the residuals
See also: resid_var
- resid_studentized
Studentized residuals using variance from OLS
Alias for resid_studentized_internal, for compatibility with MLEInfluence. This uses sigma from the original estimate and does not require the leave-one-out loop.
- resid_studentized_external
Studentized residuals using LOOO variance
This uses sigma from the leave-one-out estimates and requires the leave-one-out loop over observations.
- resid_studentized_internal
Studentized residuals using variance from OLS
This uses sigma from the original estimate and does not require the leave-one-out loop.
- resid_var
estimate of variance of the residuals
sigma2 = sigma2_OLS * (1 - hii)
where hii is the diagonal of the hat matrix
- sigma2_not_obsi
error variance for all LOOO regressions
This is ‘mse_resid’ from each auxiliary regression.
uses results from leave-one-observation-out loop
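Several of the attributes above that need no leave-one-out loop can be written directly in terms of the residuals and the hat-matrix diagonal. The sketch below reproduces those relationships with plain NumPy on simulated data. It is an illustration, not the library's implementation. The formulas for hat_diag_factor, influence and resid_var come from the descriptions above. The PRESS residual `resid / (1 - h)`, the dffits_internal scaling by `sqrt(h / (1 - h))`, and the Cook's distance expression are the usual textbook forms and are assumed here. Applying the dffits threshold to the internal variant is likewise an assumption of this sketch.

```python
import numpy as np

# Simulated regression data (illustrative only)
rng = np.random.default_rng(1)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

# Single OLS fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2_ols = resid @ resid / (n - k)

# Diagonal of the hat matrix H = X (X'X)^{-1} X'
h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))

hat_diag_factor = h / (1.0 - h)
resid_var = sigma2_ols * (1.0 - h)          # sigma2_OLS * (1 - hii)
resid_std = np.sqrt(resid_var)
resid_studentized_internal = resid / resid_std
influence = resid * hat_diag_factor         # u * h / (1 - h)

# Textbook forms (assumed, see lead-in)
resid_press = resid / (1.0 - h)
ess_press = resid_press @ resid_press
dffits_internal = resid_studentized_internal * np.sqrt(hat_diag_factor)
cooks_d = resid_studentized_internal**2 / k * hat_diag_factor

# Threshold 2 * sqrt(k / n) from the dffits description
dffits_threshold = 2.0 * np.sqrt(k / n)
flagged = np.flatnonzero(np.abs(dffits_internal) > dffits_threshold)
```

The PRESS residual computed this way equals the prediction error for an observation from a regression that leaves that observation out, which is why ess_press needs no explicit refits in this sketch.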
Methods
- get_resid_studentized_external([sigma])
Calculate studentized residuals.
- plot_index([y_var, threshold, title, ax, idx])
Index plot for influence attributes.
- plot_influence([external, alpha, criterion, …])
Plot of influence in regression.
- summary_frame()
Creates a DataFrame with all available influence results.
- summary_table([float_fmt])
Create a summary table with all influence and outlier measures.