2005 Royal Statistical Society 0964–1998/05/168753

J. R. Statist. Soc. A (2005) 168, Part 4, pp. 753–762

Resolving paradoxes involving surrogate end points

Stuart G. Baker, Grant Izmirlian and Victor Kipnis
National Cancer Institute, Bethesda, USA

[Received February 2004. Revised December 2004]

Summary. We define a surrogate end point as a measure or indicator of a biological process that is obtained sooner, at less cost or less invasively than a true end point of health outcome and is used to make conclusions about the effect of an intervention on the true end point. Prentice presented criteria for valid hypothesis testing of a surrogate end point that replaces a true end point. For using the surrogate end point to estimate the predicted effect of intervention on the true end point, Day and Duffy assumed the Prentice criterion and arrived at two paradoxical results. First, the surrogate-based estimate of the predicted intervention effect can be more precise than the usual estimate of the intervention effect that uses the true end point. Second, its variance is greatest when the surrogate end point perfectly predicts the true end point. Begg and Leung formulated similar paradoxes and concluded that they indicate a flawed conceptual strategy arising from the Prentice criterion. We resolve the paradoxes as follows. Day and Duffy compared a surrogate-based estimate of the effect of intervention on the true end point with an estimate of the same effect that uses the true end point. Their paradox arose because the former estimate assumes the Prentice criterion whereas the latter does not. If both or neither of these estimates assume the Prentice criterion, there is no paradox. The paradoxes of Begg and Leung, although similar to those of Day and Duffy, arise from ignoring the variability of the parameter estimates irrespective of the Prentice criterion and disappear when the variability is included.
Our resolution of the paradoxes provides a firm foundation for future meta-analytic extensions of the approach of Day and Duffy.

Keyword: Prentice criterion

Address for correspondence: Stuart G. Baker, Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, EPN 3131, 6130 Executive Boulevard MSC 7354, Bethesda, MD 20892-7354, USA.
E-mail: sb16i@nih.gov

1. Introduction

We define a surrogate end point as an end point that is obtained sooner, at less cost or less invasively than a true end point and is used to make conclusions about the effect of intervention on the true end point. Examples include a stage of cancer as a surrogate end point for death from cancer, or diastolic blood pressure as a surrogate end point for strokes. In the context of randomized trials (which is the focus here), the objective is to use the surrogate end point to make inference about the effect of an intervention on the true end point in an application trial, in which only the surrogate end point is observed. Before the surrogate end point is used in an application trial, it must be validated by using data from a trial with both surrogate and true end points, which we call a validation trial.

Much early statistical work on surrogate end points focused on using the surrogate end point to replace the true end point. Because the surrogate and true end points are on different scales, a direct comparison of the intervention effects on the two end points is meaningless. Therefore inference was confined to hypothesis testing. In this setting, validation consists of showing, in a validation trial, that rejection of the null hypothesis under the surrogate end point implies rejection of the null hypothesis under the true end point. In a landmark paper, Prentice (1989) gave criteria when the null