Comparing Methods to Denote Treatment Outcome in Clinical Research and Benchmarking Mental Health Care

Edwin de Beurs,1* Marko Barendregt,1 Arco de Heer,2 Erik van Duijn,3 Bob Goeree,4 Margot Kloos,5 Kees Kooiman,6 Helen Lionarons7 and Andre Merks8

1 SBG, Bilthoven, the Netherlands
2 Clinical Psychology, Leiden University, Leiden, the Netherlands
3 GGZ-Delfland, Delft, the Netherlands
4 Synaeda, Leeuwarden, the Netherlands
5 Propersona, Renkum, the Netherlands
6 Riagg Rijnmond, Vlaardingen, the Netherlands
7 Lionarons-GGZ, Heerlen, the Netherlands
8 Emergis, Goes, the Netherlands

Approaches based on continuous indicators (the size of the pre-to-post-test change; effect size or ΔT) and on categorical indicators (Percentage Improvement and the Jacobson–Truax approach to Clinical Significance) are evaluated to determine which has the best methodological and statistical characteristics, and optimal performance, in comparing outcomes of treatment providers. Performance is compared in two datasets from providers using the Brief Symptom Inventory or the Outcome Questionnaire. Concordance of methods and their suitability to rank providers are assessed. Outcome indicators tend to converge and lead to a similar ranking of institutes within each dataset. Statistically and conceptually, continuous outcome indicators are superior to categorical outcomes, as change scores have more statistical power and allow for a ranking of providers at first glance. However, the Jacobson–Truax approach can complement the change score approach, as it presents outcome information in a clinically meaningful manner.

Copyright © 2015 John Wiley & Sons, Ltd.

Key Practitioner Messages:
• When comparing various indicators of treatment outcome, statistical considerations designate continuous outcomes, such as the size of the pre–post change (effect size or ΔT), as the optimal choice.
• Expressing outcome in proportions of recovered, changed, unchanged or deteriorated patients has supplementary value, as it is more easily interpreted and appreciated by clinicians, managerial staff and, last but not least, patients.
• If categorical outcomes are used with small datasets, true differences in institutional performance may be obscured by diminished power to detect differences.
• With sufficient data, outcomes according to continuous and categorical indicators converge and lead to similar rankings of institutes’ performance.

Keywords: Treatment outcome, Effect Size, Percentage Improvement (PI), Reliable Change Index (RCI), Benchmarking

Since 2010, the mental healthcare field in the Netherlands has embarked on a nationwide effort to collect outcome data to support patient care and enable benchmarking of treatment providers. The systematic collection of patient-based data on the outcome of individual treatment is called Routine Outcome Monitoring (ROM). Ideally, ROM comprises a baseline assessment with a standardized diagnostic interview, administration of rating scales and self-report measures in combination with repeated assessments of patients’ mental health and functioning (de Beurs et al., 2011). The primary reason for collecting outcome data routinely is that it can support individual therapy, as these data provide feedback to the professional and the patient about progress, or lack thereof (Lambert, 2007). When certain conditions are met, aggregated ROM data may provide transparency regarding the effectiveness of mental health care in everyday clinical practice and allow for comparison of mental healthcare institutes.

*Correspondence to: Edwin de Beurs, SBG, Postbus 281, Bilthoven 3720 AG, the Netherlands. E-mail: edwin.debeurs@sbggz.nl

Clinical Psychology and Psychotherapy
Clin. Psychol. Psychother. (2015)
Published online in Wiley Online Library (wileyonlinelibrary.com).
DOI: 10.1002/cpp.1954