Abstract: The present article reviews the procedures that have been developed for measuring the reliability of human observers' judgments when making direct observations of behavior. Measures such as the percentage of agreement, Cohen's kappa, and phi have been used to assess observer agreement; however, these coefficients have serious limitations. In addition to specifying the deficiencies of these widely used reliability measures, the article discusses recently developed univariate and multivariate agreement measures based on quasi-equiprobability and quasi-independence models and estimates. Such models and estimates improve precision because they (1) yield a probability-based coefficient of agreement with a directly interpretable meaning, (2) correct for the proportion of "chance" agreement, and (3) allow the agreement and disagreement estimates to be partitioned within the models.
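As background to the chance-correction idea the abstract mentions, Cohen's kappa adjusts the observed proportion of agreement by the agreement expected from each observer's marginal category frequencies. A minimal Python sketch (the function name and the example codes are illustrative, not taken from the article):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two observers' categorical codes."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected "chance" agreement from each observer's marginal distribution
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two observers coding the same six behavior intervals
a = ["on", "on", "off", "on", "off", "off"]
b = ["on", "off", "off", "on", "off", "on"]
# Raw agreement is 4/6, but kappa discounts the 0.5 expected by chance
print(round(cohens_kappa(a, b), 4))  # → 0.3333
```

This illustrates the abstract's point (2): raw percentage agreement overstates reliability whenever chance agreement is substantial.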