Replication Variance Estimation for Two-Phase Stratified Sampling

Jae Kwang KIM, Alfredo NAVARRO, and Wayne A. FULLER

In two-phase sampling, the second-phase sample is often a stratified sample based on the information observed in the first-phase sample. For the total of a population characteristic, either the double-expansion estimator or the reweighted expansion estimator can be used. Given a consistent first-phase replication variance estimator, we propose a consistent variance estimator that is applicable to both the double-expansion estimator and the reweighted expansion estimator. The proposed method can be extended to multiphase sampling.

KEY WORDS: Double-expansion estimator; Double sampling; Multiphase sampling; Reweighted expansion estimator.

1. INTRODUCTION

Two-phase sampling, also known as double sampling, can be a cost-effective technique in large-scale surveys. By selecting a large sample, observing cheap auxiliary variables, and properly incorporating the auxiliary variables into the second-phase sampling design, we can produce estimators with smaller variances than those based on a single-phase sampling design for the same cost. In one of the common procedures of two-phase sampling, the second-phase sample is selected using stratified sampling, where the strata are created on the basis of the first-phase observations.

Rao (1973) and Cochran (1977) gave formulas for variance estimation when the first phase is a simple random sample and the second phase is a stratified simple random sample. Kott (1990) derived a formula for variance estimation when the first phase is a stratified random sample and the second phase is a restratified simple random sample based on first-phase information. Rao and Shao (1992) proposed a jackknife variance estimation method in the context of hot-deck imputation where the response corresponds to a second phase with Poisson sampling in imputation cells.
Yung and Rao (2000) extended the result of Rao and Shao to poststratification. Binder (1996) illustrated a “cookbook” approach for the two-phase ratio estimator. Binder, Babyak, Brodeur, Hidiroglou, and Jocelyn (2000) derived formulas for variance estimation for various estimators for two-phase restratified sampling. Fuller (1998) proposed a replicate variance estimation method for the two-phase regression estimator.

Among the methods cited, only the methods of Rao and Shao (1992) and Fuller (1998) are replication methods. One advantage of the replication method for variance estimation is its convenience for a multipurpose survey. That is, after we create the replication weights, we can directly apply the replication weights to estimate the variance for any variable.

[Author footnote] Jae Kwang Kim is Assistant Professor, Department of Applied Statistics, Yonsei University, Seoul 120-749, Korea (E-mail: kimj@yonsei.ac.kr). Alfredo Navarro is the ACS Branch Chief, Decennial Statistical Studies Division, Bureau of the Census, Washington, DC 20233 (E-mail: alfredo.navarro@census.gov). Wayne A. Fuller is Distinguished Professor Emeritus, Department of Statistics, Iowa State University, Ames, IA 50011 (E-mail: waf@iastate.edu). This research was supported in part by cooperative agreement 13-3AEU-0-80064 between Iowa State University, the U.S. National Agricultural Statistics Service, and the U.S. Bureau of the Census. Much of the research was conducted while the first author was a mathematical statistician at the U.S. Bureau of the Census. The authors thank the referees for comments and suggestions that improved the manuscript.

Let the finite population be of size $N$, indexed from 1 to $N$, and let the finite population be partitioned into $G$ groups, which we call the second-phase strata. The information about which group a unit belongs to is not obtained until the first-phase sample has been observed.
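The multipurpose convenience of replication weights mentioned in this section can be illustrated with a minimal sketch. It assumes a single-phase simple random sample and a delete-one jackknife with no finite population correction; it is not the two-phase method proposed in this article. The function names `jackknife_replicate_weights` and `replicate_variance` are hypothetical and chosen only for illustration.

```python
import numpy as np

def jackknife_replicate_weights(w):
    """Delete-one jackknife replicate weights (assumes simple random sampling).

    Row k holds the weights with unit k dropped and the remaining
    weights rescaled by n / (n - 1).
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    reps = np.tile(w * n / (n - 1), (n, 1))  # every row starts as the rescaled weights
    np.fill_diagonal(reps, 0.0)              # replicate k gives unit k zero weight
    return reps

def replicate_variance(w, reps, y):
    """Jackknife variance estimate for the weighted total of y."""
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    n = reps.shape[0]
    full_total = np.sum(w * y)   # full-sample estimate of the total
    rep_totals = reps @ y        # one estimated total per replicate
    return (n - 1) / n * np.sum((rep_totals - full_total) ** 2)

# The replicate weights are built once and reused for any study variable.
w = np.full(4, 10.0)                      # toy case: N = 40, n = 4
reps = jackknife_replicate_weights(w)
var_income = replicate_variance(w, reps, [1.0, 2.0, 3.0, 4.0])
var_acres = replicate_variance(w, reps, [5.0, 5.0, 7.0, 9.0])
```

Under these assumptions the result matches the textbook estimator $N^2 s^2 / n$ for each variable, and only the last two lines change from one variable to the next.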
We consider the two-phase estimator in which the first-phase sample is used to define strata to be used for the second-phase sample. Let the parameter of interest be the population total $Y = \sum_{i=1}^{N} y_i$, where $y_i$ is the study variable and $N$ is assumed known. Suppose that we have a first-phase sample of size $n$. If we observe $y_i$ on every element of the sample, then an unbiased estimator of $Y$ is

$$\hat{Y}_1 = \sum_{i \in A_1} w_i y_i, \tag{1}$$

where $w_i = [\Pr(i \in A_1)]^{-1}$ and $A_1$ is the set of indices in the sample.

Now, assume that instead of directly observing $y_i$ for $i \in A_1$, we observe

$$\mathbf{x}_i = (x_{i1}, \ldots, x_{iG}) \tag{2}$$

for all $i \in A_1$, where $x_{ig}$ takes the value 1 if unit $i$ belongs to the $g$th group and 0 otherwise. Assume that $\sum_{g=1}^{G} x_{ig} = 1$. Let a subsample of total size $r$ be selected from the first-phase sample and let $A_2$ be the set of indices for the second-phase sample. Let

$$w_i^* = [\Pr(i \in A_2 \mid i \in A_1)]^{-1}. \tag{3}$$

Let $n_{1g} = \sum_{i \in A_1} x_{ig}$ be the number of first-phase sample elements in group $g$ and let $r_g = \sum_{i \in A_2} x_{ig}$ be the number of second-phase sample elements in group $g$. If the second-phase sample is selected by stratified simple random sampling with the groups as strata, then $w_i^* = r_g^{-1} n_{1g}$ for unit $i$ with $x_{ig} = 1$.

Given the described two-phase sample, an unbiased estimator for the total of $Y$ is

$$\hat{Y}_d = \sum_{i \in A_2} \alpha_{d,i} y_i, \tag{4}$$

where $\alpha_{d,i} = w_i w_i^*$. Kott and Stukel (1997) called the estimator in (4) the double-expansion estimator (DEE). Another important estimator for the total of $Y$ is

$$\hat{Y}_r = \sum_{g=1}^{G} \left( \sum_{i \in A_1} w_i x_{ig} \right) \frac{\sum_{i \in A_2} w_i x_{ig} y_i}{\sum_{i \in A_2} w_i x_{ig}} = \sum_{i \in A_2} \alpha_{r,i} y_i, \tag{5}$$

© 2006 American Statistical Association. Journal of the American Statistical Association, March 2006, Vol. 101, No. 473, Theory and Methods. DOI 10.1198/016214505000000763
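To make the two estimators in (4) and (5) concrete, the following sketch computes both from a toy two-phase sample. It assumes the second phase is a stratified simple random sample within groups, so that the second-phase weight is $n_{1g}/r_g$, and that every group contains at least one second-phase unit. The function name `two_phase_totals` and its arguments are hypothetical and not taken from the article.

```python
import numpy as np

def two_phase_totals(w1, group, in_phase2, y):
    """Return (DEE, REE) estimates of the population total.

    w1        : first-phase weights w_i for all i in A_1
    group     : group label of each first-phase unit (x_ig = 1 iff group[i] == g)
    in_phase2 : boolean mask, True for units in A_2
    y         : study variable; only entries with in_phase2 True are read
    """
    w1 = np.asarray(w1, dtype=float)
    group = np.asarray(group)
    in_phase2 = np.asarray(in_phase2, dtype=bool)
    y = np.asarray(y, dtype=float)

    dee = 0.0
    ree = 0.0
    for g in np.unique(group):
        in_g = group == g          # first-phase units in group g
        sub = in_g & in_phase2     # second-phase units in group g
        n1g, rg = in_g.sum(), sub.sum()
        if rg == 0:
            raise ValueError(f"group {g!r} has no second-phase units")
        w_star = n1g / rg          # assumed stratified SRS second phase
        # Equation (4): double expansion with alpha_d = w_i * w_i^*
        dee += np.sum(w1[sub] * w_star * y[sub])
        # Equation (5): first-phase group total times second-phase weighted mean
        ree += w1[in_g].sum() * np.sum(w1[sub] * y[sub]) / w1[sub].sum()
    return dee, ree

# Toy data: 10 first-phase units in two groups; y is observed only in A_2.
w1 = [5, 5, 10, 10, 15, 15, 10, 10, 10, 10]
group = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
in_phase2 = [True, True, True, False, False, False, True, True, False, False]
y = [1, 2, 3, np.nan, np.nan, np.nan, 10, 20, np.nan, np.nan]
dee, ree = two_phase_totals(w1, group, in_phase2, y)
```

With unequal first-phase weights inside a group, as in group 0 of the toy data, the two estimates differ. When the first-phase weights are constant within each group, they coincide.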