Journal of Statistical Planning and Inference 139 (2009) 696--710

Contents lists available at ScienceDirect
Journal of Statistical Planning and Inference
journal homepage: www.elsevier.com/locate/jspi

Likelihood-based confidence sets for partially identified parameters

Zhiwei Zhang
Division of Biostatistics, Center for Devices and Radiological Health, U.S. Food and Drug Administration, HFZ-550, Rockville, MD 20850, USA 1

ARTICLE INFO

Article history:
Received 4 December 2006
Accepted 7 August 2007
Available online 29 May 2008

Keywords: Censoring; Confidence set; Likelihood; Missing data; Partial identification; Verification bias

ABSTRACT

There has been growing interest in partial identification of probability distributions and parameters. This paper considers statistical inference on parameters that are partially identified because data are incompletely observed, due to nonresponse or censoring, for instance. A method based on likelihood ratios is proposed for constructing confidence sets for partially identified parameters. The method can be used to estimate a proportion or a mean in the presence of missing data, without assuming missing-at-random or modeling the missing-data mechanism. It can also be used to estimate a survival probability with censored data without assuming independent censoring or modeling the censoring mechanism. A version of the verification bias problem is studied as well.

Published by Elsevier B.V.

1. Introduction

Identifiability of a parameter means, roughly, that the parameter relates to the distribution of the observed data in a one-to-one fashion. Without identifiability, a consistent point estimator does not exist, and many desirable properties of point estimators become unattainable. Consequently, most of the modern theory of statistical inference requires that the inferential target be identifiable.
When the parameter of interest is not fully identifiable from the observed data, it is common practice to impose additional assumptions or constraints that reduce the probability model and help identify the parameter. Such assumptions rely on external information such as prior knowledge and cannot be validated with the data alone. In reality, reliable external information is often unavailable, and identifying assumptions are frequently driven by practical rather than scientific considerations. Because different assumptions may lead to different conclusions, it makes sense to compare results obtained under different assumptions if no single identifying assumption is strongly preferred. This practical approach, called sensitivity analysis, nonetheless lacks scientific rigor. It is usually impossible to enumerate all possible identifying assumptions, and often difficult to conduct and interpret a sensitivity analysis in a systematic and objective manner.

Example (Proportion). To fix ideas, consider the problem of estimating a proportion with missing data. Suppose that X is a Bernoulli variable, and that the parameter of interest is θ = P(X = 1), the probability of success. If X is always observed, then θ is completely identified from a random sample of X. Suppose, however, that X is potentially missing, which may happen because of nonresponse in surveys, for instance. Let R be the observation indicator, so R = 1 if X is observed and 0 otherwise. Without additional assumptions, θ is not identifiable. One common identifying assumption is missing completely at random (MCAR) in the sense of Rubin (1976), namely that R and X are independent. Alternatively, a selection model could be specified for the conditional distribution of R given X, or a pattern mixture model for the conditional distribution of X given R; see, for example,

E-mail address: zhiwei.zhang@fda.hhs.gov.
1 The views expressed in this article do not necessarily represent those of the U.S. Food and Drug Administration.
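To make the non-identifiability in this example concrete, the following sketch simulates the setting and computes the worst-case (Manski-type) bounds that the observed data alone place on θ: since X is seen only when R = 1, everything compatible with the data lies between P(X = 1, R = 1) and P(X = 1, R = 1) + P(R = 0). The simulation parameters and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Illustrative data-generating process: Bernoulli X with success
# probability 0.6, and an observation indicator R (here MCAR, so the
# truth happens to sit strictly inside the bounds).
x = rng.binomial(1, 0.6, size=n)
r = rng.binomial(1, 0.7, size=n)  # R = 1 means X is observed

# Quantities identifiable from the observed data alone:
p_x1_r1 = np.mean((x == 1) & (r == 1))  # P(X = 1, R = 1)
p_r0 = np.mean(r == 0)                  # P(R = 0)

# Without assumptions on the missing X's, all we know is that they
# contribute between 0 and P(R = 0) successes:
lower = p_x1_r1          # every missing X is a failure
upper = p_x1_r1 + p_r0   # every missing X is a success

print(f"identification bounds for theta: [{lower:.3f}, {upper:.3f}]")
```

The width of the interval equals P(R = 0), so θ is point-identified only when nothing is missing; identifying assumptions such as MCAR shrink this interval to a point, which is exactly what the paper's likelihood-based confidence sets avoid having to assume.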
0378-3758/$ - see front matter. Published by Elsevier B.V. doi:10.1016/j.jspi.2007.08.009