statsmodels.multivariate.pca.pca

statsmodels.multivariate.pca.pca(data, ncomp=None, standardize=True, demean=True, normalize=True, gls=False, weights=None, method='svd')

Perform Principal Component Analysis (PCA).

Parameters:
data : ndarray

Variables in columns, observations in rows.

ncomp : int, optional

Number of components to return. If None, returns as many components as the smaller of the number of rows and the number of columns of data.

standardize : bool, optional

Flag indicating whether to use standardized data with mean 0 and unit variance. Setting standardize to True implies demean.

demean : bool, optional

Flag indicating whether to demean data before computing principal components. demean is ignored if standardize is True.

normalize : bool, optional

Indicates whether to normalize the factors to have unit inner product. If False, the loadings will have unit inner product.

gls : bool, optional

Flag indicating whether to use a two-step GLS estimator. In the first step, principal components are used to estimate residuals. In the second step, the inverse residual variance is used as a set of weights to estimate the final principal components.

weights : ndarray, optional

Series weights to use after transforming data according to standardize or demean when computing the principal components.

method : str, optional

Determines the linear algebra routine used. ‘svd’, the default, uses a singular value decomposition. ‘eig’ uses an eigenvalue decomposition.
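As a rough illustration of the parameters above, the following sketch runs the function on synthetic data with both values of method and compares the eigenvalues. The random data, the seed, and the variable names are assumptions made for this example only.

```python
import numpy as np
from statsmodels.multivariate.pca import pca

# Synthetic data (an assumption for this sketch):
# 50 observations in rows, 4 variables in columns.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 4))

# Same decomposition computed through the two linear algebra routines.
out_svd = pca(data, ncomp=2, method="svd")
out_eig = pca(data, ncomp=2, method="eig")

# The eigenvalues are the sixth element of the returned tuple.
eigenvals_svd = out_svd[5]
eigenvals_eig = out_eig[5]
print(np.allclose(eigenvals_svd, eigenvals_eig))

# Demean only, leaving the scale of each column untouched.
out_demeaned = pca(data, ncomp=2, standardize=False, demean=True)
```

Both routines should agree up to floating-point error, so the choice of method mostly matters for speed and numerical behavior rather than for the result.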

Returns:
factors : {ndarray, DataFrame}

Array (nobs, ncomp) of principal components (also known as scores).

loadings : {ndarray, DataFrame}

Array (ncomp, nvar) of principal component loadings for constructing the factors.

projection : {ndarray, DataFrame}

Array (nobs, nvar) containing the projection of the data onto the ncomp estimated factors.

rsquare : {ndarray, Series}

Array (ncomp,) where the element in the ith position is the R-square of including the first i principal components. The values are calculated on the transformed data, not the original data.

ic : {ndarray, DataFrame}

Array (ncomp, 3) containing the Bai and Ng (2003) information criteria. Each column is a different criterion, and each row corresponds to a number of included factors.

eigenvals : {ndarray, Series}

Array of eigenvalues (nvar,).

eigenvecs : {ndarray, DataFrame}

Array of eigenvectors (nvar, nvar).
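The seven return values come back as a single tuple, in the order listed above. A minimal sketch of unpacking them, assuming synthetic NumPy input (the data, seed, and the helper name gram are illustrative only):

```python
import numpy as np
from statsmodels.multivariate.pca import pca

# Synthetic data (an assumption for this sketch): nobs=100, nvar=5.
rng = np.random.default_rng(12345)
data = rng.standard_normal((100, 5))

# Unpack the tuple in the documented order.
factors, loadings, projection, rsquare, ic, eigenvals, eigenvecs = pca(
    data, ncomp=3
)

print(factors.shape)     # (nobs, ncomp)
print(projection.shape)  # (nobs, nvar)

# With normalize=True (the default), the factors have unit inner product,
# so their Gram matrix should be close to the identity.
gram = factors.T @ factors
print(np.allclose(gram, np.eye(3)))
```

With ndarray input the results are ndarrays; per the type annotations above, pandas input yields DataFrame and Series objects instead.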

Notes

This is a simple function wrapper around the PCA class. See PCA for more information and additional methods.
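Because the function wraps the PCA class, the same quantities should be available as attributes on a PCA instance. The sketch below checks this for the factors on synthetic data; the data, the seed, and the variable names are assumptions for illustration.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA, pca

# Synthetic data (an assumption for this sketch): 60 observations, 4 variables.
rng = np.random.default_rng(2024)
data = rng.standard_normal((60, 4))

# Class interface: results are stored as attributes on the instance.
pc = PCA(data, ncomp=2)

# Function interface: results are returned as a tuple, factors first.
factors = pca(data, ncomp=2)[0]

same = np.allclose(pc.factors, factors)
print(same)
```

The class is the better choice when the additional methods mentioned above are needed, while the function is convenient for a one-off decomposition.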