Abstract: | This article proposes that sampling design effects may have large, unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects go unrecognized and unaccounted for, the sampling errors of item and test statistics are underestimated. Underestimated sampling errors, in turn, produce unanticipated instability in the testing program and inflate Type I error rates in significance tests. The consequences are especially serious when the standard error of equating is underestimated. The problem arises from a common district and state practice: drawing nonprobability cluster samples, for example by convenience, purposeful, or quota sampling, and then calculating statistics and standard errors as if the samples were simple random samples. |
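The mechanism described above can be illustrated with a small sketch. It assumes Kish's textbook approximation for equal-sized clusters drawn by probability sampling, deff = 1 + (m - 1)ρ, where m is the cluster size (e.g., students per school) and ρ is the intraclass correlation. Nonprobability samples of the kind the abstract describes need not follow this formula, so the sketch only shows the direction and rough scale of the effect. The function names and the example values (40 schools of 50 students, ρ = 0.2) are hypothetical.

```python
import math

# Two-sided 5% critical value of the standard normal distribution.
Z_CRIT_05 = 1.959963984540054


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kish_design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    if mean_cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("need cluster size >= 1 and 0 <= icc <= 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of simple-random-sample cases carrying the same information."""
    return n / kish_design_effect(mean_cluster_size, icc)


def design_adjusted_se(naive_se: float, deff: float) -> float:
    """Inflate a standard error computed as if the sample were an SRS."""
    return naive_se * math.sqrt(deff)


def actual_type_i_error(deff: float, z_crit: float = Z_CRIT_05) -> float:
    """True rejection rate of a nominal test that ignores the design effect.

    Under H0 the naive z statistic (estimate / SRS standard error) has
    variance deff rather than 1, so |naive z| > z_crit happens when a
    standard normal exceeds z_crit / sqrt(deff) in absolute value.
    """
    return 2.0 * (1.0 - normal_cdf(z_crit / math.sqrt(deff)))


if __name__ == "__main__":
    # Hypothetical assessment sample: 40 schools x 50 students, rho = 0.2.
    n, m, rho = 2000, 50, 0.2
    deff = kish_design_effect(m, rho)
    print(f"design effect:          {deff:.2f}")
    print(f"effective sample size:  {effective_sample_size(n, m, rho):.1f}")
    print(f"SE inflation factor:    {design_adjusted_se(1.0, deff):.2f}")
    print(f"actual Type I error:    {actual_type_i_error(deff):.3f} (nominal 0.05)")
```

Under these illustrative assumptions, a test run at a nominal 5% level rejects a true null hypothesis far more often than 5% of the time, which is the inflation of Type I errors the abstract describes.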