statsmodels.stats.inter_rater.fleiss_kappa
- statsmodels.stats.inter_rater.fleiss_kappa(table, method='fleiss')
Fleiss’ and Randolph’s kappa multi-rater agreement measure
- Parameters:
- table : array_like, 2-D
  Assumes subjects in rows and categories in columns.
- method : str
  Method ‘fleiss’ returns Fleiss’ kappa, which uses the sample margin to define the chance outcome. Method ‘randolph’ or ‘uniform’ (only the first 4 letters are needed) returns Randolph’s (2005) multirater kappa, which assumes a uniform distribution of the categories to define the chance outcome. See the usage sketch below.
- Returns:
- kappa : float
  Fleiss’ or Randolph’s kappa statistic for inter-rater agreement.
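A minimal usage sketch, assuming the table layout described above (subjects in rows, categories in columns). The counts reproduce the worked example from the Wikipedia article cited below, in which 14 raters assign each of 10 subjects to one of 5 categories.

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# Each row is one subject; each entry counts how many of the 14 raters
# placed that subject in the corresponding category (rows sum to 14).
table = np.array([
    [0, 0, 0, 0, 14],
    [0, 2, 6, 4, 2],
    [0, 0, 3, 5, 6],
    [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],
    [7, 7, 0, 0, 0],
    [3, 2, 6, 3, 0],
    [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],
    [0, 2, 2, 3, 7],
])

kappa = fleiss_kappa(table, method='fleiss')
print(kappa)  # approximately 0.21 for this table
```

If the raw data are individual ratings of shape (n_subjects, n_raters), statsmodels.stats.inter_rater.aggregate_raters can be used to convert them into this count-table format first.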
Notes
No variance or hypothesis tests are implemented yet.
Interrater agreement measures like Fleiss’ kappa measure agreement relative to chance agreement. Different authors have proposed ways of defining these chance agreements. Fleiss’ kappa is based on the marginal sample distribution of categories, while Randolph’s uses a uniform distribution of categories as the benchmark. Warrens (2010) showed that Randolph’s kappa is always larger than or equal to Fleiss’ kappa. Under some commonly observed conditions, Fleiss’ and Randolph’s kappa provide lower and upper bounds for two similar kappa-like measures by Light (1971) and Hubert (1977).
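To make the difference between the two chance benchmarks concrete, here is a short sketch on a small hypothetical count table (4 subjects, 3 categories, 5 raters per subject); per Warrens (2010), the ‘randolph’ result should never fall below the ‘fleiss’ result.

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# Hypothetical example: 4 subjects, 3 categories, 5 raters per subject.
table = np.array([
    [5, 0, 0],
    [3, 2, 0],
    [0, 4, 1],
    [1, 1, 3],
])

k_fleiss = fleiss_kappa(table, method='fleiss')      # chance from the sample category margins
k_randolph = fleiss_kappa(table, method='randolph')  # chance from a uniform category distribution

# Warrens (2010): Randolph's kappa is always >= Fleiss' kappa.
assert k_randolph >= k_fleiss
print(k_fleiss, k_randolph)
```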
References
Wikipedia https://en.wikipedia.org/wiki/Fleiss%27_kappa
Fleiss, Joseph L. 1971. “Measuring Nominal Scale Agreement among Many Raters.” Psychological Bulletin 76 (5): 378-82. https://doi.org/10.1037/h0031619.
Randolph, Justus J. 2005. “Free-Marginal Multirater Kappa (multirater K [free]): An Alternative to Fleiss’ Fixed-Marginal Multirater Kappa.” Presented at the Joensuu Learning and Instruction Symposium, vol. 2005. https://eric.ed.gov/?id=ED490661
Warrens, Matthijs J. 2010. “Inequalities between Multi-Rater Kappas.” Advances in Data Analysis and Classification 4 (4): 271-86. https://doi.org/10.1007/s11634-010-0073-4.