Statistics (stats)

This section collects various statistical tests and tools. Some can be used independently of any models, while others are intended as extensions to the models and model results.

API Warning: The functions and objects in this category are spread out across various modules and might still be moved around. We expect that in the future the statistical tests will return class instances with more informative reporting instead of only the raw numbers.

Residual Diagnostics and Specification Tests

Autocorrelation

durbin_watson(resids[, axis])

Calculates the Durbin-Watson statistic.

acorr_breusch_godfrey(res[, nlags, store])

Breusch-Godfrey Lagrange Multiplier tests for residual autocorrelation.

acorr_ljungbox(x[, lags, boxpierce, ...])

Ljung-Box test of autocorrelation in residuals.

acorr_lm(resid[, nlags, store, period, ...])

Lagrange Multiplier tests for autocorrelation.

Heteroscedasticity

het_arch(resid[, nlags, store, ddof])

Engle's Test for Autoregressive Conditional Heteroscedasticity (ARCH).

het_breuschpagan(resid, exog_het[, robust])

Breusch-Pagan Lagrange Multiplier test for heteroscedasticity

het_goldfeldquandt(y, x[, idx, split, drop, ...])

Goldfeld-Quandt homoskedasticity test.

het_white(resid, exog)

White's Lagrange Multiplier Test for Heteroscedasticity.

Kurtosis

robust_kurtosis(y[, axis, ab, dg, excess])

Calculates the four kurtosis measures in Kim & White

expected_robust_kurtosis([ab, dg])

Calculates the expected value of the robust kurtosis measures in Kim and White assuming the data are normally distributed.

Normality

jarque_bera(resids[, axis])

The Jarque-Bera test of normality.

omni_normtest(resids[, axis])

Omnibus test for normality

Outliers and Influence

OLSInfluence(results)

class to calculate outlier and influence measures for OLS result

GLMInfluence(results[, resid, endog, exog, ...])

Influence and outlier measures (experimental)

MLEInfluence(results[, resid, endog, exog, ...])

Global Influence and outlier measures (experimental)

variance_inflation_factor(exog, exog_idx)

Variance inflation factor, VIF, for one exogenous variable

Skewness

medcouple(y[, axis])

Calculate the medcouple robust measure of skew.

robust_skewness(y[, axis])

Calculates the four skewness measures in Kim & White

Stability

breaks_cusumolsresid(resid[, ddof])

Cusum test for parameter stability based on ols residuals.

breaks_hansen(olsresults)

Test for model stability, breaks in parameters for ols, Hansen 1992

recursive_olsresiduals(res[, skip, lamda, ...])

Calculate recursive ols with residuals and Cusum test statistic

Specification test

compare_cox(results_x, results_z[, store])

Compute the Cox test for non-nested models

compare_encompassing(results_x, results_z[, ...])

Davidson-MacKinnon encompassing test for comparing non-nested models

compare_j(results_x, results_z[, store])

Compute the J-test for non-nested models

spec_white(resid, exog)

White's Two-Moment Specification Test

linear_harvey_collier(res[, order_by, skip])

Harvey Collier test for linearity

linear_lm(resid, exog[, func])

Lagrange multiplier test for linearity against functional alternative

linear_rainbow(res[, frac, order_by, ...])

Rainbow test for linearity

linear_reset(res[, power, test_type, use_f, ...])

Ramsey's RESET test for neglected nonlinearity

See also the notes on regression diagnostics.

Sandwich Robust Covariances

The following functions calculate covariance matrices and standard errors for the parameter estimates that are robust to heteroscedasticity and autocorrelation in the errors. Similar to the methods available for LinearModelResults, these functions are designed for use with OLS.
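
As a minimal sketch of typical usage (the data are simulated and the lag length is an arbitrary choice), the HAC covariance can be computed directly from an OLS results instance:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats import sandwich_covariance as sw

    # Simulated regression with mildly autocorrelated errors, for illustration only.
    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.normal(size=(200, 2)))
    e = np.convolve(rng.normal(size=210), np.ones(5) / 5, mode="valid")[:200]
    y = x @ [1.0, 0.5, -0.3] + e

    res = sm.OLS(y, x).fit()

    # Newey-West (HAC) covariance of the parameter estimates and its standard errors.
    cov = sw.cov_hac(res, nlags=4)
    se = sw.se_cov(cov)

    # HAC standard errors are also available through the regression results API.
    res_hac = res.get_robustcov_results(cov_type="HAC", maxlags=4)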

sandwich_covariance.cov_hac(results[, ...])

heteroscedasticity and autocorrelation robust covariance matrix (Newey-West)

sandwich_covariance.cov_nw_panel(results, ...)

Panel HAC robust covariance matrix

sandwich_covariance.cov_nw_groupsum(results, ...)

Driscoll and Kraay Panel robust covariance matrix

sandwich_covariance.cov_cluster(results, group)

cluster robust covariance matrix

sandwich_covariance.cov_cluster_2groups(...)

cluster robust covariance matrix for two groups/clusters

sandwich_covariance.cov_white_simple(results)

heteroscedasticity robust covariance matrix (White)

The following are standalone versions of the heteroscedasticity robust standard errors attached to LinearModelResults

sandwich_covariance.cov_hc0(results)

See statsmodels.RegressionResults

sandwich_covariance.cov_hc1(results)

See statsmodels.RegressionResults

sandwich_covariance.cov_hc2(results)

See statsmodels.RegressionResults

sandwich_covariance.cov_hc3(results)

See statsmodels.RegressionResults

sandwich_covariance.se_cov(cov)

get standard deviation from covariance matrix

Goodness of Fit Tests and Measures

Some tests for goodness of fit for univariate distributions.
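
For example, Lilliefors' test and the Anderson-Darling test for normality can be applied directly to a one-dimensional sample (simulated here):

    import numpy as np
    from statsmodels.stats.diagnostic import lilliefors, normal_ad

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)

    # Lilliefors' test (Kolmogorov-Smirnov with estimated parameters) for normality.
    ks_stat, ks_pval = lilliefors(x, dist="norm", pvalmethod="table")

    # Anderson-Darling test for normality with unknown mean and variance.
    ad_stat, ad_pval = normal_ad(x)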

powerdiscrepancy(observed, expected[, ...])

Calculates power discrepancy, a class of goodness-of-fit tests as a measure of discrepancy between observed and expected data.

gof_chisquare_discrete(distfn, arg, rvs, ...)

perform chisquare test for random sample of a discrete distribution

gof_binning_discrete(rvs, distfn, arg[, nsupp])

get bins for chisquare type gof tests for a discrete distribution

chisquare_effectsize(probs0, probs1[, ...])

effect size for a chisquare goodness-of-fit test

anderson_statistic(x[, dist, fit, params, axis])

Calculate the Anderson-Darling a2 statistic.

normal_ad(x[, axis])

Anderson-Darling test for normal distribution unknown mean and variance.

kstest_exponential(x, *[, dist, pvalmethod])

Test assumed normal or exponential distribution using Lilliefors' test.

kstest_fit(x[, dist, pvalmethod])

Test assumed normal or exponential distribution using Lilliefors' test.

kstest_normal(x[, dist, pvalmethod])

Test assumed normal or exponential distribution using Lilliefors' test.

lilliefors(x[, dist, pvalmethod])

Test assumed normal or exponential distribution using Lilliefors' test.

Non-Parametric Tests

mcnemar(x[, y, exact, correction])

McNemar test

symmetry_bowker(table)

Test for symmetry of a (k, k) square contingency table

median_test_ksample(x, groups)

chisquare test for equality of median/location

runstest_1samp(x[, cutoff, correction])

use runs test on binary discretized data above/below cutoff

runstest_2samp(x[, y, groups, correction])

Wald-Wolfowitz runstest for two samples

cochrans_q(x)

Cochran's Q test for identical effect of k treatments

Runs(x)

class for runs in a binary sequence

sign_test(samp[, mu0])

Signs test

rank_compare_2indep(x1, x2[, use_t])

Statistics and tests for the probability that x1 has larger values than x2.

rank_compare_2ordinal(count1, count2[, ...])

Stochastically larger probability for 2 independent ordinal samples.

samplesize_rank_compare_onetail(...[, ...])

Compute sample size for the non-parametric Mann-Whitney U test.

RankCompareResult([tuple_])

Results for rank comparison

cohensd2problarger(d)

Convert Cohen's d effect size to stochastically-larger-probability.

prob_larger_continuous(distr1, distr2)

Probability indicating that distr1 is stochastically larger than distr2.

rankdata_2samp(x1, x2)

Compute midranks for two samples

Descriptive Statistics

describe(data[, stats, numeric, ...])

Extended descriptive statistics for data

Description(data[, stats, numeric, ...])

Extended descriptive statistics for data

Interrater Reliability and Agreement

The main function that statsmodels currently provides for interrater agreement measures and tests is Cohen's kappa. Fleiss' kappa is currently only implemented as a measure, without associated results statistics.
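
A minimal sketch (the table and raw ratings are made up): cohens_kappa takes a square contingency table of two raters, while fleiss_kappa works on (subject, category) counts, which aggregate_raters can build from raw (subject, rater) data:

    import numpy as np
    from statsmodels.stats.inter_rater import cohens_kappa, fleiss_kappa, aggregate_raters

    # Hypothetical 3x3 table of two raters' categorical ratings.
    table = np.array([[20, 5, 1],
                      [4, 15, 3],
                      [2, 3, 12]])
    kappa_results = cohens_kappa(table)   # results object with kappa, variance and test
    print(kappa_results.kappa)

    # Fleiss' kappa from raw (subject, rater) data via aggregate_raters.
    raw = np.random.default_rng(0).integers(0, 3, size=(30, 4))
    counts, _ = aggregate_raters(raw)
    print(fleiss_kappa(counts, method="fleiss"))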

cohens_kappa(table[, weights, ...])

Compute Cohen's kappa with variance and equal-zero test

fleiss_kappa(table[, method])

Fleiss' and Randolph's kappa multi-rater agreement measure

to_table(data[, bins])

convert raw data with shape (subject, rater) to (rater1, rater2)

aggregate_raters(data[, n_cat])

convert raw data with shape (subject, rater) to (subject, cat_counts)

Multiple Tests and Multiple Comparison Procedures

multipletests is a function for p-value correction; p-value correction based on the false discovery rate (FDR) is also available in fdrcorrection. tukeyhsd performs simultaneous testing for the comparison of (independent) means. These three functions are verified. GroupsStats and MultiComparison are convenience classes for multiple comparisons similar to one-way ANOVA, but are still in development.
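
A minimal sketch with made-up p-values; the correction method is selected by name:

    import numpy as np
    from statsmodels.stats.multitest import multipletests, fdrcorrection

    pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.27, 0.60])

    # Holm step-down correction; other supported methods are chosen via `method`.
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")

    # Benjamini-Hochberg false discovery rate correction.
    reject_fdr, p_fdr = fdrcorrection(pvals, alpha=0.05, method="indep")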

multipletests(pvals[, alpha, method, ...])

Test results and p-value correction for multiple tests

fdrcorrection(pvals[, alpha, method, is_sorted])

pvalue correction for false discovery rate.

GroupsStats(x[, useranks, uni, intlab])

statistics by groups (another version)

MultiComparison(data, groups[, group_order])

Tests for multiple comparisons

TukeyHSDResults(mc_object, results_table, q_crit)

Results from Tukey HSD test, with additional plot methods

pairwise_tukeyhsd(endog, groups[, alpha])

Calculate all pairwise comparisons with TukeyHSD confidence intervals

local_fdr(zscores[, null_proportion, ...])

Calculate local FDR values for a list of Z-scores.

fdrcorrection_twostage(pvals[, alpha, ...])

(iterated) two stage linear step-up procedure with estimation of number of true hypotheses

NullDistribution(zscores[, null_lb, ...])

Estimate a Gaussian distribution for the null Z-scores.

RegressionFDR(endog, exog, regeffects[, method])

Control FDR in a regression procedure.

CorrelationEffects()

Marginal correlation effect sizes for FDR control.

OLSEffects()

OLS regression for knockoff analysis.

ForwardEffects(pursuit)

Forward selection effect sizes for FDR control.

RegModelEffects(model_cls[, regularized, ...])

Use any regression model for Regression FDR analysis.

The following functions are not (yet) public

varcorrection_pairs_unbalanced(nobs_all[, ...])

correction factor for variance with unequal sample sizes for all pairs

varcorrection_pairs_unequal(var_all, ...)

return joint variance from samples with unequal variances and unequal sample sizes for all pairs

varcorrection_unbalanced(nobs_all[, srange])

correction factor for variance with unequal sample sizes

varcorrection_unequal(var_all, nobs_all, df_all)

return joint variance from samples with unequal variances and unequal sample sizes

StepDown(vals, nobs_all, var_all[, df])

a class for step down methods

catstack(args)

ccols

compare_ordered(vals, alpha)

simple ordered sequential comparison of means

distance_st_range(mean_all, nobs_all, var_all)

pairwise distance matrix, outsourced from tukeyhsd

ecdf(x)

no frills empirical cdf used in fdrcorrection

get_tukeyQcrit(k, df[, alpha])

return critical values for Tukey's HSD (Q)

homogeneous_subsets(vals, dcrit)

recursively check all pairs of vals for minimum distance

maxzero(x)

find all up zero crossings and return the index of the highest

maxzerodown(x)

find all down zero crossings and return the index of the highest

mcfdr([nrepl, nobs, ntests, ntrue, mu, ...])

MonteCarlo to test fdrcorrection

qcrit

randmvn(rho[, size, standardize])

create random draws from equi-correlated multivariate normal distribution

rankdata(x)

rankdata, equivalent to scipy.stats.rankdata

rejectionline(n[, alpha])

reference line for rejection in multiple tests

set_partition(ssli)

extract a partition from a list of tuples

set_remove_subs(ssli)

remove sets that are subsets of another set from a list of tuples

tiecorrect(xranks)

should be equivalent to scipy.stats.tiecorrect

Basic Statistics and t-Tests with frequency weights

Besides basic statistics, like mean, variance, covariance and correlation for data with case weights, the classes here provide one- and two-sample tests for means. The t-tests have more options than those in scipy.stats, but are more restrictive about the shape of the arrays. Confidence intervals for means are provided based on the same assumptions as the t-tests.

Additionally, tests for equivalence of means are available for one sample and for two samples, either paired or independent. These tests are based on TOST (two one-sided tests), whose null hypothesis is that the means are not “close” to each other.
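
A minimal sketch with simulated data and arbitrary frequency weights:

    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW, CompareMeans, ttest_ind

    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(0.2, 1, size=50), rng.normal(0.0, 1, size=60)
    w1 = rng.integers(1, 4, size=50)          # case (frequency) weights

    d1 = DescrStatsW(x1, weights=w1)
    print(d1.mean, d1.std)                    # weighted descriptive statistics
    print(d1.ttest_mean(0))                   # one-sample t-test against mean 0
    print(d1.tconfint_mean(alpha=0.05))       # confidence interval for the mean

    # Two independent samples: t-test and TOST equivalence test.
    print(ttest_ind(x1, x2, usevar="unequal"))
    cm = CompareMeans(DescrStatsW(x1), DescrStatsW(x2))
    print(cm.ttost_ind(-0.5, 0.5, usevar="unequal"))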

DescrStatsW(data[, weights, ddof])

Descriptive statistics and tests for data with case weights

CompareMeans(d1, d2)

class for two sample comparison

ttest_ind(x1, x2[, alternative, usevar, ...])

ttest independent sample

ttost_ind(x1, x2, low, upp[, usevar, ...])

test of (non-)equivalence for two independent samples

ttost_paired(x1, x2, low, upp[, transform, ...])

test of (non-)equivalence for two dependent, paired samples

ztest(x1[, x2, value, alternative, usevar, ddof])

test for mean based on normal distribution, one or two samples

ztost(x1, low, upp[, x2, usevar, ddof])

Equivalence test based on normal distribution

zconfint(x1[, x2, value, alpha, ...])

confidence interval based on normal distribution z-test

The weightstats module also contains tests and confidence intervals based on summary data.

_tconfint_generic(mean, std_mean, dof, ...)

generic t-confint based on summary statistic

_tstat_generic(value1, value2, std_diff, ...)

generic ttest based on summary statistic

_zconfint_generic(mean, std_mean, alpha, ...)

generic normal-confint based on summary statistic

_zstat_generic(value1, value2, std_diff, ...)

generic (normal) z-test based on summary statistic

_zstat_generic2(value, std, alternative)

generic (normal) z-test based on summary statistic

Power and Sample Size Calculations

The power module currently implements power and sample size calculations for t-tests, normal-based tests, F-tests and the chi-square goodness-of-fit test. The implementation is class based, but the module also provides three shortcut functions, tt_solve_power, tt_ind_solve_power and zt_ind_solve_power, to solve for any one of the parameters of the power equations.
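
A minimal sketch; the effect size, significance level and power below are arbitrary choices. Leaving one parameter unspecified tells solve_power which one to solve for:

    from statsmodels.stats.power import TTestIndPower, tt_ind_solve_power

    analysis = TTestIndPower()

    # Sample size per group for a two-sample t-test (nobs1 is omitted, so it is solved for).
    n1 = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8, ratio=1.0)

    # Equivalent shortcut function.
    n1_alt = tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.8)

    # Achieved power for a given sample size.
    achieved = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05, ratio=1.0)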

TTestIndPower(**kwds)

Statistical Power calculations for t-test for two independent samples

TTestPower(**kwds)

Statistical Power calculations for one sample or paired sample t-test

GofChisquarePower(**kwds)

Statistical Power calculations for one sample chisquare test

NormalIndPower([ddof])

Statistical Power calculations for z-test for two independent samples.

FTestAnovaPower(**kwds)

Statistical Power calculations for F-test for one factor balanced ANOVA

FTestPower(**kwds)

Statistical Power calculations for generic F-test of a constraint

normal_power_het(diff, nobs, alpha[, ...])

Calculate power of a normally distributed test statistic

normal_sample_size_one_tail(diff, power, alpha)

explicit sample size computation if only one tail is relevant

tt_solve_power([effect_size, nobs, alpha, ...])

solve for any one parameter of the power of a one sample t-test

tt_ind_solve_power([effect_size, nobs1, ...])

solve for any one parameter of the power of a two sample t-test

zt_ind_solve_power([effect_size, nobs1, ...])

solve for any one parameter of the power of a two sample z-test

Proportion

Also available are hypothesis tests, confidence intervals and effect sizes for proportions; the effect sizes can be used with NormalIndPower.
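
A minimal sketch with made-up counts, combining a confidence interval, a one-sample z-test, and an effect size fed into NormalIndPower:

    from statsmodels.stats.proportion import (
        proportion_confint, proportion_effectsize, proportions_ztest)
    from statsmodels.stats.power import NormalIndPower

    count, nobs = 62, 100

    # Wilson confidence interval for a single proportion.
    ci_low, ci_upp = proportion_confint(count, nobs, alpha=0.05, method="wilson")

    # z-test that the true proportion equals 0.5.
    zstat, pval = proportions_ztest(count, nobs, value=0.5)

    # Effect size for comparing proportions 0.6 and 0.5, used for sample size planning.
    es = proportion_effectsize(0.6, 0.5)
    n_per_group = NormalIndPower().solve_power(effect_size=es, alpha=0.05, power=0.8)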

proportion_confint(count, nobs[, alpha, ...])

Confidence interval for a binomial proportion

proportion_effectsize(prop1, prop2[, method])

Effect size for a test comparing two proportions

binom_test(count, nobs[, prop, alternative])

Perform a test that the probability of success is p.

binom_test_reject_interval(value, nobs[, ...])

Rejection region for binomial test for one sample proportion

binom_tost(count, nobs, low, upp)

Exact TOST test for one proportion using binomial distribution

binom_tost_reject_interval(low, upp, nobs[, ...])

Rejection region for binomial TOST

multinomial_proportions_confint(counts[, ...])

Confidence intervals for multinomial proportions.

proportions_ztest(count, nobs[, value, ...])

Test for proportions based on normal (z) test

proportions_ztost(count, nobs, low, upp[, ...])

Equivalence test based on normal distribution

proportions_chisquare(count, nobs[, value])

Test for proportions based on chisquare test

proportions_chisquare_allpairs(count, nobs)

Chisquare test of proportions for all pairs of k samples

proportions_chisquare_pairscontrol(count, nobs)

Chisquare test of proportions for pairs of k samples compared to control

power_binom_tost(low, upp, nobs[, p_alt, alpha])

Power of the TOST for one proportion using the binomial distribution

power_ztost_prop(low, upp, nobs, p_alt[, ...])

Power of proportions equivalence test based on normal distribution

samplesize_confint_proportion(proportion, ...)

Find sample size to get desired confidence interval length

Statistics for two independent samples

Status: experimental, API might change, added in 0.12
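
A minimal sketch with made-up counts (the default method choices are left to the functions):

    from statsmodels.stats.proportion import (
        test_proportions_2indep, confint_proportions_2indep)

    count1, nobs1 = 45, 100
    count2, nobs2 = 30, 100

    # Test and confidence interval for the difference of the two proportions.
    res = test_proportions_2indep(count1, nobs1, count2, nobs2, compare="diff")
    print(res.statistic, res.pvalue)
    low, upp = confint_proportions_2indep(count1, nobs1, count2, nobs2, compare="diff")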

test_proportions_2indep(count1, nobs1, ...)

Hypothesis test for comparing two independent proportions

confint_proportions_2indep(count1, nobs1, ...)

Confidence intervals for comparing two independent proportions.

power_proportions_2indep(diff, prop2, nobs1)

Power for ztest that two independent proportions are equal

tost_proportions_2indep(count1, nobs1, ...)

Equivalence test based on two one-sided test_proportions_2indep

samplesize_proportions_2indep_onetail(diff, ...)

Required sample size assuming normal distribution based on one tail

score_test_proportions_2indep(count1, nobs1, ...)

Score test for two independent proportions

_score_confint_inversion(count1, nobs1, ...)

Compute score confidence interval by inverting score test

Rates

Statistical functions for rates. This currently includes hypothesis tests for one sample and for two independent samples. See also the example notebook Poisson Rates for an overview.

Status: experimental, API might change, added in 0.12, refactored and enhanced in 0.14
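
A minimal sketch, previewing the one-sample and two-sample functions listed below, with made-up counts and exposures; method="score" is an explicit choice among several available methods:

    from statsmodels.stats.rates import test_poisson, confint_poisson, test_poisson_2indep

    # One-sample test that the event rate equals 4 per unit of exposure.
    res_one = test_poisson(count=38, nobs=10, value=4, method="score")

    # Confidence interval for a Poisson rate.
    low, upp = confint_poisson(count=38, exposure=10, method="score")

    # Compare the rates of two independent samples given counts and exposures.
    res_two = test_poisson_2indep(count1=60, exposure1=514, count2=30, exposure2=543,
                                  method="score")
    print(res_two.pvalue)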

Statistical functions for one sample

test_poisson(count, nobs, value[, method, ...])

Test for one sample poisson mean or rate

confint_poisson(count, exposure[, method, ...])

Confidence interval for a Poisson mean or rate

confint_quantile_poisson(count, exposure, prob)

confidence interval for quantile of poisson random variable

tolerance_int_poisson(count, exposure[, ...])

tolerance interval for a poisson observation

Statistical functions for two independent samples

test_poisson_2indep(count1, exposure1, ...)

Test for comparing two sample Poisson intensity rates.

etest_poisson_2indep(count1, exposure1, ...)

E-test for ratio of two sample Poisson rates.

confint_poisson_2indep(count1, exposure1, ...)

Confidence interval for ratio or difference of 2 indep poisson rates.

tost_poisson_2indep(count1, exposure1, ...)

Equivalence test based on two one-sided test_poisson_2indep

nonequivalence_poisson_2indep(count1, ...[, ...])

Test for non-equivalence, minimum effect for poisson.

Functions for statistical power

power_poisson_ratio_2indep(rate1, rate2, nobs1)

Power of test of ratio of 2 independent poisson rates.

power_equivalence_poisson_2indep(rate1, ...)

Power of equivalence test of ratio of 2 independent poisson rates.

power_poisson_diff_2indep(rate1, rate2, nobs1)

Power of ztest for the difference between two independent poisson rates.

power_negbin_ratio_2indep(rate1, rate2, nobs1)

Power of test of ratio of 2 independent negative binomial rates.

power_equivalence_neginb_2indep(rate1, ...)

Power of equivalence test of ratio of 2 independent negative binomial rates.

Multivariate

Statistical functions for multivariate samples.

This includes hypothesis test and confidence intervals for mean of sample of multivariate observations and hypothesis tests for the structure of a covariance matrix.

Status: experimental, API might change, added in 0.12
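
A minimal sketch with simulated data; the null mean vector and the diagonal-covariance hypothesis are illustrative:

    import numpy as np
    from statsmodels.stats.multivariate import test_mvmean, test_cov_diagonal

    rng = np.random.default_rng(0)
    data = rng.normal(size=(50, 3)) + [1.0, 0.5, 0.0]

    # Hotelling's T^2 test that the multivariate mean equals the hypothesized vector.
    res_mean = test_mvmean(data, mean_null=[1.0, 0.5, 0.0], return_results=True)
    print(res_mean.statistic, res_mean.pvalue)

    # Test that the covariance matrix is diagonal, from the sample covariance and nobs.
    cov = np.cov(data, rowvar=False)
    res_cov = test_cov_diagonal(cov, nobs=data.shape[0])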

test_mvmean(data[, mean_null, return_results])

Hotelling's test for multivariate mean in one sample

confint_mvmean(data[, lin_transf, alpha, simult])

Confidence interval for linear transformation of a multivariate mean

confint_mvmean_fromstats(mean, cov, nobs[, ...])

Confidence interval for linear transformation of a multivariate mean

test_mvmean_2indep(data1, data2)

Hotelling's test for multivariate mean in two independent samples

test_cov(cov, nobs, cov_null)

One sample hypothesis test for covariance equal to null covariance

test_cov_blockdiagonal(cov, nobs, block_len)

One sample hypothesis test that covariance is block diagonal.

test_cov_diagonal(cov, nobs)

One sample hypothesis test that covariance matrix is diagonal matrix.

test_cov_oneway(cov_list, nobs_list)

Multiple sample hypothesis test that covariance matrices are equal.

test_cov_spherical(cov, nobs)

One sample hypothesis test that covariance matrix is spherical

Oneway Anova

Hypothesis test, confidence intervals and effect size for oneway analysis of k samples.

Status: experimental, API might change, added in 0.12
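
A minimal sketch with three simulated groups; use_var="unequal" selects the Welch-type Anova:

    import numpy as np
    from statsmodels.stats.oneway import anova_oneway, effectsize_oneway

    rng = np.random.default_rng(0)
    samples = [rng.normal(loc, 1.2, size=30) for loc in (0.0, 0.3, 0.8)]

    # Welch Anova on a list of samples (no groups argument needed in this form).
    res = anova_oneway(samples, use_var="unequal")
    print(res.statistic, res.pvalue)

    # Effect size from group means, variances and sample sizes.
    means = [s.mean() for s in samples]
    variances = [s.var(ddof=1) for s in samples]
    f2 = effectsize_oneway(means, variances, nobs=[30, 30, 30], use_var="unequal")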

anova_oneway(data[, groups, use_var, ...])

Oneway Anova

anova_generic(means, variances, nobs[, ...])

Oneway Anova based on summary statistics

equivalence_oneway(data, equiv_margin[, ...])

equivalence test for oneway anova (Wellek's Anova)

equivalence_oneway_generic(f_stat, n_groups, ...)

Equivalence test for oneway anova (Wellek and extensions)

power_equivalence_oneway(f2_alt, ...[, ...])

Power of oneway equivalence test

_power_equivalence_oneway_emp(f_stat, ...[, ...])

Empirical power of oneway equivalence test

test_scale_oneway(data[, method, center, ...])

Oneway Anova test for equal scale, variance or dispersion

equivalence_scale_oneway(data, equiv_margin)

Oneway Anova test for equivalence of scale, variance or dispersion

confint_effectsize_oneway(f_stat, df[, ...])

Confidence interval for effect size in oneway anova for F distribution

confint_noncentrality(f_stat, df[, alpha, ...])

Confidence interval for noncentrality parameter in F-test

convert_effectsize_fsqu([f2, eta2])

Convert squared effect sizes in f family

effectsize_oneway(means, vars_, nobs[, ...])

Effect size corresponding to Cohen's f = nc / nobs for oneway anova

f2_to_wellek(f2, n_groups)

Convert Cohen's f-squared to Wellek's effect size (sqrt)

fstat_to_wellek(f_stat, n_groups, nobs_mean)

Convert F statistic to wellek's effect size eps squared

wellek_to_f2(eps, n_groups)

Convert Wellek's effect size (sqrt) to Cohen's f-squared

_fstat2effectsize(f_stat, df)

Compute anova effect size from F-statistic

scale_transform(data[, center, transform, ...])

Transform data for variance comparison for Levene type tests

simulate_power_equivalence_oneway(means, ...)

Simulate Power for oneway equivalence test (Wellek's Anova)

Robust, Trimmed Statistics

Statistics for samples that are trimmed at a fixed fraction. This includes the TrimmedMean class for one-sample statistics. It is used in stats.oneway for the trimmed (“Yuen”) Anova.

Status: experimental, API might change, added in 0.12
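
A minimal sketch with simulated, outlier-contaminated data; the 10% trimming fraction is an arbitrary choice, and the import path statsmodels.stats.robust_compare is assumed here:

    import numpy as np
    from statsmodels.stats.robust_compare import TrimmedMean, trim_mean, trimboth

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(size=95), [15.0, -12.0, 20.0, 18.0, -9.0]])

    # Mean after trimming 10% of the observations from each tail.
    print(trim_mean(x, 0.1))

    # The trimmed sample itself (sorted first so the tails are the extreme values).
    x_trimmed = trimboth(np.sort(x), 0.1)

    # One-sample trimmed and winsorized statistics are provided by the TrimmedMean class.
    tm = TrimmedMean(x, 0.1)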

TrimmedMean(data, fraction[, is_sorted, axis])

class for trimmed and winsorized one sample statistics

scale_transform(data[, center, transform, ...])

Transform data for variance comparison for Levene type tests

trim_mean(a, proportiontocut[, axis])

Return mean of array after trimming observations from both tails.

trimboth(a, proportiontocut[, axis])

Slices off a proportion of items from both ends of an array.

Moment Helpers

When there are missing values, it is possible that a correlation or covariance matrix is not positive semi-definite. The following functions can be used to find a correlation or covariance matrix that is positive definite and close to the original matrix. Additional functions estimate a spatial covariance matrix and a regularized inverse covariance (precision) matrix.
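
A minimal sketch with a small symmetric matrix that is not positive semi-definite, constructed for illustration:

    import numpy as np
    from statsmodels.stats.correlation_tools import corr_nearest, cov_nearest

    # Symmetric matrix with ones on the diagonal that is not positive semi-definite,
    # as can happen when pairwise correlations are computed with missing data.
    corr = np.array([[1.0, 0.9, 0.7],
                     [0.9, 1.0, -0.9],
                     [0.7, -0.9, 1.0]])

    # Nearest correlation matrix that is positive semi-definite.
    corr_psd = corr_nearest(corr, threshold=1e-15)

    # Analogous adjustment for a covariance matrix, using the clipped eigenvalue method.
    cov = 2.0 * corr * 2.0
    cov_psd = cov_nearest(cov, method="clipped")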

corr_clipped(corr[, threshold])

Find a near correlation matrix that is positive semi-definite

corr_nearest(corr[, threshold, n_fact])

Find the nearest correlation matrix that is positive semi-definite.

corr_nearest_factor(corr, rank[, ctol, ...])

Find the nearest correlation matrix with factor structure to a given square matrix.

corr_thresholded(data[, minabs, max_elt])

Construct a sparse matrix containing the thresholded row-wise correlation matrix from a data array.

cov_nearest(cov[, method, threshold, ...])

Find the nearest covariance matrix that is positive (semi-) definite

cov_nearest_factor_homog(cov, rank)

Approximate an arbitrary square matrix with a factor-structured matrix of the form k*I + XX'.

FactoredPSDMatrix(diag, root)

Representation of a positive semidefinite matrix in factored form.

kernel_covariance(exog, loc, groups[, ...])

Use kernel averaging to estimate a multivariate covariance function.

RegularizedInvCovariance(exog)

Class for estimating regularized inverse covariance with nodewise regression

These are utility functions to convert between central and non-central moments, skew, kurtosis and cumulants.
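
A minimal round-trip sketch (the numbers are arbitrary):

    import numpy as np
    from statsmodels.stats.moment_helpers import mvsk2mc, mc2mvsk, cov2corr, corr2cov

    # (mean, variance, skew, kurtosis) -> central moments -> back again.
    mvsk = (1.0, 4.0, 0.5, 3.2)
    mc = mvsk2mc(mvsk)
    print(mc2mvsk(mc))            # recovers the original tuple

    # Covariance <-> correlation conversions.
    cov = np.array([[4.0, 1.0], [1.0, 9.0]])
    corr, std = cov2corr(cov, return_std=True)
    print(corr2cov(corr, std))    # recovers the original covariance matrix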

cum2mc(kappa)

convert cumulants to central moments; recursive formula produces as many central moments as cumulants

mc2mnc(mc)

convert central to non-central moments; uses recursive formula, optionally adjusts first moment to return mean

mc2mvsk(args)

convert central moments to mean, variance, skew, kurtosis

mnc2cum(mnc)

convert non-central moments to cumulants; recursive formula produces as many cumulants as moments

mnc2mc(mnc[, wmean])

convert non-central to central moments; uses recursive formula, optionally adjusts first moment to return mean

mnc2mvsk(args)

convert non-central moments to mean, variance, skew, kurtosis

mvsk2mc(args)

convert mean, variance, skew, kurtosis to central moments

mvsk2mnc(args)

convert mean, variance, skew, kurtosis to non-central moments

cov2corr(cov[, return_std])

convert covariance matrix to correlation matrix

corr2cov(corr, std)

convert correlation matrix to covariance matrix given standard deviation

se_cov(cov)

get standard deviation from covariance matrix

Mediation Analysis

Mediation analysis focuses on the relationships among three key variables: an ‘outcome’, a ‘treatment’, and a ‘mediator’. Since mediation analysis is a form of causal inference, there are several assumptions involved that are difficult or impossible to verify. Ideally, mediation analysis is conducted in the context of an experiment such as this one in which the treatment is randomly assigned. It is also common for people to conduct mediation analyses using observational data in which the treatment may be thought of as an ‘exposure’. The assumptions behind mediation analysis are even more difficult to verify in an observational setting.
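
A minimal parametric sketch with simulated data; the column names, coefficients and number of replications are all illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.mediation import Mediation

    # Simulated treatment -> mediator -> outcome data.
    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({"treat": rng.integers(0, 2, size=n).astype(float)})
    df["med"] = 0.5 * df["treat"] + rng.normal(size=n)
    df["outcome"] = 0.7 * df["med"] + 0.2 * df["treat"] + rng.normal(size=n)

    outcome_model = sm.OLS.from_formula("outcome ~ treat + med", data=df)
    mediator_model = sm.OLS.from_formula("med ~ treat", data=df)

    med = Mediation(outcome_model, mediator_model, "treat", mediator="med")
    res = med.fit(n_rep=100)
    print(res.summary())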

Mediation(outcome_model, mediator_model, ...)

Conduct a mediation analysis.

MediationResults(indirect_effects, ...)

A class for holding the results of a mediation analysis.

Oaxaca-Blinder Decomposition

The Oaxaca-Blinder decomposition, or Blinder-Oaxaca as some call it, attempts to explain gaps in the means of groups. It uses the linear models of two given regression equations to show what is explained by regression coefficients and known data, and what is unexplained using the same data. There are two types of Oaxaca-Blinder decompositions, the two-fold and the three-fold, both of which are used in the economics literature to discuss differences between groups. This method helps classify discrimination or unobserved effects. This function attempts to port the functionality of the oaxaca command in Stata to Python.
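
A minimal sketch with simulated wage-style data (all values are made up); the exog is a DataFrame so that bifurcate can name the group column, and hasconst=False indicates that exog does not already contain a constant:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.oaxaca import OaxacaBlinder

    # Simulated data with a 0/1 group indicator.
    rng = np.random.default_rng(0)
    n = 400
    exog = pd.DataFrame({
        "educ": rng.normal(12, 2, size=n),
        "group": rng.integers(0, 2, size=n).astype(float),
    })
    wage = 5 + 0.8 * exog["educ"] + 1.5 * exog["group"] + rng.normal(size=n)

    # bifurcate names the column that splits the sample into the two groups.
    model = OaxacaBlinder(wage, exog, "group", hasconst=False)

    two_fold = model.two_fold()      # explained vs. unexplained components
    three_fold = model.three_fold()  # endowments, coefficients, interaction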

OaxacaBlinder(endog, exog, bifurcate[, ...])

Class to perform Oaxaca-Blinder Decomposition.

OaxacaResults(results, model_type[, std_val])

This class summarizes the fit of the OaxacaBlinder model.

Distance Dependence Measures

Distance dependence measures and the Distance Covariance (dCov) test.
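
A minimal sketch with simulated data exhibiting a nonlinear (hence non-Pearson) dependence:

    import numpy as np
    from statsmodels.stats.dist_dependence_measures import (
        distance_covariance_test, distance_correlation)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))
    y = x[:, :1] ** 2 + 0.1 * rng.normal(size=(200, 1))   # nonlinear dependence on x

    # dCov test of independence between x and y.
    stat, pval, chosen_method = distance_covariance_test(x, y)

    # Distance correlation lies in [0, 1] and is 0 only under independence.
    dcor = distance_correlation(x, y)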

distance_covariance_test(x, y[, B, method])

The Distance Covariance (dCov) test

distance_statistics(x, y[, x_dist, y_dist])

Calculate various distance dependence statistics.

distance_correlation(x, y)

Distance correlation.

distance_covariance(x, y)

Distance covariance.

distance_variance(x)

Distance variance.

Meta-Analysis

Functions for basic meta-analysis of a collection of sample statistics.

Examples can be found in the notebook

Status: experimental, API might change, added in 0.12
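
A minimal sketch with made-up summary statistics from four studies; the choice of method_re is illustrative:

    import numpy as np
    from statsmodels.stats.meta_analysis import effectsize_smd, combine_effects

    # Hypothetical per-arm summary statistics (mean, sd, nobs) for four studies.
    mean1 = np.array([5.1, 4.9, 5.4, 5.0])
    sd1 = np.array([1.1, 1.0, 1.3, 0.9])
    nobs1 = np.array([40, 55, 30, 60])
    mean2 = np.array([4.5, 4.6, 4.8, 4.7])
    sd2 = np.array([1.2, 1.1, 1.2, 1.0])
    nobs2 = np.array([42, 50, 35, 58])

    # Bias-corrected standardized mean differences and their variances.
    eff, var_eff = effectsize_smd(mean1, sd1, nobs1, mean2, sd2, nobs2)

    # Combine the study effect sizes; method_re selects the between-study
    # variance estimator for the random-effects part.
    res = combine_effects(eff, var_eff, method_re="chi2")
    print(res.summary_frame())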

combine_effects(effect, variance[, ...])

combine effect sizes using meta-analysis

effectsize_2proportions(count1, nobs1, ...)

Effect sizes for two sample binomial proportions

effectsize_smd(mean1, sd1, nobs1, mean2, ...)

effect sizes for mean difference for use in meta-analysis

CombineResults(**kwds)

Results from combined estimate of means or effect sizes

The module also includes internal functions to compute random effects variance.

_fit_tau_iter_mm(eff, var_eff[, tau2_start, ...])

iterated method of moment estimate of between random effect variance

_fit_tau_iterative(eff, var_eff[, ...])

Paule-Mandel iterative estimate of between random effect variance

_fit_tau_mm(eff, var_eff, weights)

one-step method of moment estimate of between random effect variance

