Journal of Educational Measurement
Spring 2019, Vol. 56, No. 1, pp. 76–100

The Effects of Incomplete Rating Designs in Combination With Rater Effects

Stefanie A. Wind, University of Alabama
Eli Jones, Columbus State University

Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of three rater effects (leniency, central tendency, and severity) in combination with different types of incomplete rating designs (systematic links, anchor performances, and spiral). We used the rating scale model and the partial credit model to calculate rater location estimates, standard errors of rater estimates, model–data fit statistics, and the standard deviation of rating scale category thresholds as indicators of rater effects, and we explored the sensitivity of these indicators to rater effects under different conditions. Our results suggest that it is possible to detect rater effects when each of the three types of rating designs is used. However, there are differences in the sensitivity of each indicator related to the type of rater effect, the type of rating design, and the overall proportion of raters exhibiting an effect. We discuss implications for research and practice related to rater-mediated assessments.

In light of concerns related to the quality of rater judgments in performance assessments, many researchers have discussed and examined rater effects, such as severity/leniency, restriction to subsets of rating scale categories (e.g., central tendency/extremism), and systematic biases (i.e., differential rater functioning).
In previous studies, researchers have evaluated rating quality using a variety of methodological approaches, including generalizability theory (Baird, Hayes, Johnson, Johnson, & Lamprianou, 2013; Brennan, 2000; Hill, Charalambous, & Kraft, 2012) and latent trait models, such as Rasch models (Eckes, 2015; Engelhard, 2002; Myford & Wolfe, 2003; Wolfe & McVay, 2012). In general, the goal of this research is to identify raters whose judgments may not accurately reflect the quality of examinees' performances. Such information can inform the interpretation and use of ratings and help leaders of scoring centers identify raters who may need additional training.

In addition to research on the quality of rater judgments, several researchers have explored the implications of different data collection designs for rater-mediated performance assessments. In these studies, researchers have discussed the basic requirements for data collection systems that allow researchers to obtain estimates of examinee achievement and rater severity in the presence of incomplete data (Engelhard, 1997; Schumacker, 1999), as well as the impacts of different rating designs on

Areas of Specialization: Rater-mediated assessment; Item response theory; Nonparametric IRT

© 2019 by the National Council on Measurement in Education
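To make the simulation design described above concrete, the sketch below generates ratings from Andrich's rating scale model, in which the probability of each category depends on the examinee's location, the rater's severity, and a common set of category thresholds. This is an illustrative sketch, not the authors' simulation code: the sample size, threshold values, and severity shifts are arbitrary choices for demonstration. A severe rater (positive severity) should assign lower ratings on average than a lenient rater (negative severity), which is the signal the rater location estimates are meant to recover.

```python
import numpy as np

rng = np.random.default_rng(7)

def rsm_probs(theta, severity, taus):
    """Category probabilities under the rating scale model.

    Adjacent-category logits: P(X = k) is proportional to
    exp(sum_{h <= k} (theta - severity - tau_h)), for k = 0..K.
    """
    logits = np.concatenate(([0.0], np.cumsum(theta - severity - taus)))
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    return p / p.sum()

def simulate_ratings(thetas, severity, taus):
    """Draw one ordinal rating per examinee from a single rater."""
    k = len(taus) + 1  # number of rating scale categories
    return np.array(
        [rng.choice(k, p=rsm_probs(t, severity, taus)) for t in thetas]
    )

thetas = rng.normal(0.0, 1.0, 1000)   # examinee locations (logits)
taus = np.array([-1.0, 0.0, 1.0])     # thresholds for a 4-category scale

baseline = simulate_ratings(thetas, severity=0.0, taus=taus)
severe = simulate_ratings(thetas, severity=1.0, taus=taus)    # severe rater
lenient = simulate_ratings(thetas, severity=-1.0, taus=taus)  # lenient rater

print("mean ratings:", baseline.mean(), severe.mean(), lenient.mean())
```

In a full study, ratings like these would be assigned to raters according to an incomplete design (systematic links, anchor performances, or spiral) and then calibrated with the rating scale or partial credit model to see whether the severity parameters are recovered.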