Proceedings of Statistics Canada Symposium 2003
Challenges in Survey Taking for the Next Decade

VARIANCE ESTIMATION IN TWO-PHASE SAMPLING

M.A. Hidiroglou and J.N.K. Rao¹

ABSTRACT

Two-phase sampling is often used for estimating a population total or mean when the cost per unit of collecting auxiliary variables $\mathbf{x}$ is much smaller than the cost per unit of measuring the characteristic of interest $y$. In the first phase, a large sample $s_1$ is drawn according to a specified sampling design $p(s_1)$, and $\mathbf{x}$ is observed for the units $i \in s_1$. Given the first-phase sample $s_1$, a second-phase sample $s_2$ is selected from $s_1$ according to a specified sampling design $\{p(s_2 \mid s_1)\}$, and $(y, \mathbf{x})$ is observed for the units $i \in s_2$. In some cases, the population totals of some components of $\mathbf{x}$ may also be known. Two-phase sampling is used for stratification at the second phase (Neyman, 1938; Rao, 1973) or at both phases (Binder et al., 2000), and for regression estimation (Särndal et al., 1992, chapter 9; Hidiroglou and Särndal, 1998). Horvitz-Thompson (HT) type variance estimators are commonly used for variance estimation. However, the HT variance estimator in uni-phase sampling is known to be highly unstable and may take negative values when the units are selected with unequal probabilities. On the other hand, the Sen-Yates-Grundy (SYG) variance estimator is relatively stable and nonnegative for several unequal probability sampling designs with fixed sample sizes. In this paper, we extend the SYG variance estimators to two-phase sampling, assuming a fixed first-phase sample size and a fixed second-phase sample size given the first-phase sample. We apply the new SYG variance estimators to two-phase sampling designs with stratification at the second phase or at both phases. We also develop SYG-type variance estimators of the two-phase regression estimators that make use of the first-phase auxiliary data.

KEYWORDS: Double-Expansion Estimator; Ratio Estimator; Regression Estimator; Stratification.
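The uni-phase contrast between the HT and SYG variance estimators can be checked by brute force on a small design. The sketch below is illustrative only and is not the two-phase estimator developed in this paper: it assumes a hypothetical population of N = 4 units, a hypothetical fixed-size (n = 2) design given by explicit selection probabilities p(s), and hypothetical values y_i. It uses the standard textbook forms $v_{HT} = \sum_{i \in s} (1-\pi_i) y_i^2/\pi_i^2 + \sum\sum_{i \ne j \in s} (\pi_{ij}-\pi_i\pi_j) y_i y_j / (\pi_{ij}\pi_i\pi_j)$ and $v_{SYG} = \sum\sum_{i<j \in s} (\pi_i\pi_j-\pi_{ij})/\pi_{ij} \, (y_i/\pi_i - y_j/\pi_j)^2$, with the inclusion probabilities obtained by enumerating all samples. The names `DESIGN`, `Y`, `inclusion_probs`, `v_ht`, `v_syg` and `summarize` are hypothetical helpers introduced for this example.

```python
from itertools import combinations

# Hypothetical toy design (assumption): N = 4 units, fixed sample size n = 2,
# unequal selection probabilities p(s) over all six possible samples.
DESIGN = {
    (0, 1): 0.30, (0, 2): 0.20, (0, 3): 0.10,
    (1, 2): 0.15, (1, 3): 0.15, (2, 3): 0.10,
}
Y = [12.0, 10.0, 9.0, 5.0]  # hypothetical values of the study variable y


def inclusion_probs(design):
    """First-order pi_i and second-order pi_ij (keyed with i < j) by enumeration."""
    pi, pij = {}, {}
    for s, p in design.items():
        for i in s:
            pi[i] = pi.get(i, 0.0) + p
        for i, j in combinations(sorted(s), 2):
            pij[(i, j)] = pij.get((i, j), 0.0) + p
    return pi, pij


def ht_total(s, y, pi):
    """Horvitz-Thompson estimator of the population total of y."""
    return sum(y[i] / pi[i] for i in s)


def v_ht(s, y, pi, pij):
    """HT variance estimator; may be negative under unequal-probability designs."""
    v = sum((1.0 - pi[i]) * (y[i] / pi[i]) ** 2 for i in s)
    for i, j in combinations(sorted(s), 2):
        p = pij[(i, j)]
        # Factor 2: the double sum over i != j visits each unordered pair twice.
        v += 2.0 * (p - pi[i] * pi[j]) / p * (y[i] / pi[i]) * (y[j] / pi[j])
    return v


def v_syg(s, y, pi, pij):
    """SYG variance estimator (fixed-size designs); >= 0 whenever pi_i*pi_j >= pi_ij."""
    v = 0.0
    for i, j in combinations(sorted(s), 2):
        p = pij[(i, j)]
        v += (pi[i] * pi[j] - p) / p * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v


def summarize(design, y):
    """Exact design variance of the HT total, plus the design expectation
    and the smallest sample value of each variance estimator."""
    pi, pij = inclusion_probs(design)
    total = sum(y)
    var = sum(p * (ht_total(s, y, pi) - total) ** 2 for s, p in design.items())
    e_ht = sum(p * v_ht(s, y, pi, pij) for s, p in design.items())
    e_syg = sum(p * v_syg(s, y, pi, pij) for s, p in design.items())
    min_ht = min(v_ht(s, y, pi, pij) for s in design)
    min_syg = min(v_syg(s, y, pi, pij) for s in design)
    return var, e_ht, e_syg, min_ht, min_syg


if __name__ == "__main__":
    var, e_ht, e_syg, min_ht, min_syg = summarize(DESIGN, Y)
    print(f"V = {var:.4f}, E[v_HT] = {e_ht:.4f}, E[v_SYG] = {e_syg:.4f}")
    print(f"min v_HT = {min_ht:.4f}, min v_SYG = {min_syg:.4f}")
```

Averaged over all six samples, both estimators reproduce the exact variance, as they must when every $\pi_{ij} > 0$. Sample by sample they differ: with these hypothetical values, $v_{HT}$ is negative for the sample {0, 3}, while $v_{SYG}$ stays nonnegative everywhere because $\pi_i\pi_j \ge \pi_{ij}$ holds for every pair in this toy design.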
1. INTRODUCTION

Two-phase sampling is often used for estimating a population total or a mean when the cost per unit of collecting auxiliary data $\mathbf{x}$ is much smaller than the cost per unit of measuring the characteristics of interest $y$. The sampling scheme consists of two phases. In the first phase, a large sample $s_1$ of size $n_1$ is drawn from the universe $U$ according to a specified sampling design with probabilities $\{p(s_1)\}$, and $\mathbf{x}$ is observed for the sample units $i \in s_1$. Given the first-phase sample $s_1$, the second-phase sample $s_2$ is selected from $s_1$ according to a specified sampling design with conditional probabilities $\{p(s_2 \mid s_1)\}$, and $(y, \mathbf{x})$ is observed for the units $i \in s_2$. In some cases, the population totals of some components $\mathbf{x}_1$ of $\mathbf{x}$ may also be known.

Neyman (1938) first proposed two-phase sampling for stratification. The first-phase sample $s_1$ of size $n_1$, selected by simple random sampling, is stratified on the basis of a scalar auxiliary variable $x$ observed on the units $i \in s_1$: $s_1 = \bigcup_g s_{1g}$, where $s_{1g}$ is the first-phase sample, of random size $n_{1g}$, in stratum $g$, and $\sum_g n_{1g} = n_1$. In the second phase, simple random samples $s_{2g}$ of fixed sizes $n_{2g}$ are drawn independently from the first-phase samples $s_{1g}$. The assumption of fixed sizes $n_{2g}$, however, is inconsistent with the sampling procedure, because $n_{2g}$ is bounded above by the random variable $n_{1g}$, which varies from 0 to $\min(n_1, N_g)$, where $N_g$ is the number of population units in stratum $g$. Rao (1973) proposed an alternative

¹ Mike Hidiroglou, Director of Survey Methods Division, Room D141, Methodology and Statistical Development Directorate, Cardiff Road, Newport, NP9-1XG, United Kingdom; J.N.K. Rao, School of Mathematics and Statistics, Ottawa, Ontario, K1S 5B6, Canada.

Statistics Canada International Symposium Series - Proceedings, 2003
Statistics Canada - Catalogue no. 11-522-XIE