Educational Measurement: Issues and Practice xxxx 2017, Vol. 00, No. 0, pp. 1–11

Comparability in Balanced Assessment Systems for State Accountability

Carla M. Evans, Education Department, University of New Hampshire, Durham, and Susan Lyons∗, National Center for the Improvement of Educational Assessment, Dover

The purpose of this study was to test methods that strengthen the comparability claims about annual determinations of student proficiency in English language arts, math, and science (Grades 3–12) in the New Hampshire Performance Assessment of Competency Education (NH PACE) pilot project. First, we examined the literature in order to define comparability outside the bounds of strict score interchangeability and explored methods for estimating comparability that support a balanced assessment system for state accountability such as the NH PACE pilot. Second, we applied two strategies (consensus scoring and a rank-ordering method) to estimate comparability in Year 1 of the NH PACE pilot based upon the expert judgment of 85 teachers using 396 student work samples. We found the methods were effective for providing evidence of comparability and also detecting when threats to comparability were present. The evidence did not indicate meaningful differences in district average scoring and therefore did not support adjustments to district-level cut scores used to create annual determinations. The article concludes with a discussion of the technical challenges and opportunities associated with innovative, balanced assessment systems in an accountability context.

Keywords: accountability, assessment system design, comparability, competency-based education, performance-based assessments

Accountability has influenced the use and design of assessments for the past two decades and pervades the current context (Hamilton, Stecher, & Klein, 2002; Hargreaves & Braun, 2013).
Some have argued that the negative effects of standardized accountability tests on curriculum and instruction occur because of a fundamental misalignment between the purpose of assessment and the role assessment has played in schools (Resnick & Resnick, 1992; Shepard, 2000). This disconnect can lead to an incoherent system of assessments that does not provide instructional feedback to teachers, narrows the curriculum to focus on only those standards and subjects tested on state assessments, and drives the teaching and learning of fragmented bits of knowledge rather than deeper learning (Darling-Hammond, Wilhoit, & Pittenger, 2014; Pellegrino, Chudowsky, & Glaser, 2001; Smith & O’Day, 1991). There has been an increasing call for multiple assessments to be designed and used as a “balanced,” “comprehensive,” or “next generation” assessment system (Council of Chief State School Officers, 2015; Darling-Hammond et al., 2014; Heritage, 2010; Pellegrino et al., 2001; Stiggins, 2006). The challenge lies in designing assessment and accountability systems that can support instructional uses while serving accountability purposes (Baker & Gordon, 2014; Gong, 2010; Marion & Leather, 2015).

∗ The order of the authors was determined by flipping a coin. Both authors contributed equally to this article. Carla M. Evans, Education Department, University of New Hampshire, 62 College Road, Morrill Hall 308, Durham, NH 03824; carla.m.evans@gmail.com. Susan Lyons is an associate with the National Center for the Improvement of Educational Assessment (Center for Assessment), 31 Mount Vernon Street, Dover, NH 03820; slyons@nciea.org.

One example of using an assessment system to provide information from the classroom to the statehouse, while fulfilling federal accountability purposes, is currently taking place in New Hampshire (NH). In March 2015, the U.S.
Department of Education officially approved New Hampshire’s Performance Assessment of Competency Education (NH PACE) pilot project for a two-year waiver (2014–2015 and 2015–2016 school years) from federal statutory requirements related to annual state-level achievement testing (NHDOE, 2015). The NH PACE pilot was granted an additional one-year waiver for the 2016–2017 school year.

In the NH PACE system, local assessments administered throughout the school year contribute to students’ overall competency scores, which are used to make annual determinations for state and federal accountability. Therefore, one key technical challenge of the NH PACE system, and likely of any balanced assessment system that does not rely solely on standardized achievement tests, is using the information from multiple, local assessment sources to support comparable accountability determinations.

© 2017 by the National Council on Measurement in Education