Replication Variance Estimation for
Two-Phase Stratified Sampling
Jae Kwang KIM, Alfredo NAVARRO, and Wayne A. FULLER
In two-phase sampling, the second-phase sample is often a stratified sample based on the information observed in the first-phase sample.
For the total of a population characteristic, either the double-expansion estimator or the reweighted expansion estimator can be used. Given
a consistent first-phase replication variance estimator, we propose a consistent variance estimator that is applicable to both the double-
expansion estimator and the reweighted expansion estimator. The proposed method can be extended to multiphase sampling.
KEY WORDS: Double-expansion estimator; Double sampling; Multiphase sampling; Reweighted expansion estimator.
1. INTRODUCTION
Two-phase sampling, also known as double sampling, can be
a cost-effective technique in large-scale surveys. By selecting
a large sample, observing cheap auxiliary variables, and properly
incorporating the auxiliary variables into the second-phase
sampling design, we can produce estimators with smaller variances
than those based on a single-phase sampling design for
the same cost. In one of the common procedures of two-phase
sampling, the second-phase sample is selected using stratified
sampling, where the strata are created on the basis of the
first-phase observations.
Rao (1973) and Cochran (1977) gave formulas for variance
estimation when the first phase is a simple random sample and
the second phase is a stratified simple random sample. Kott
(1990) derived a formula for variance estimation when the first
phase is a stratified random sample and the second phase is a
restratified simple random sample based on first-phase information.
Rao and Shao (1992) proposed a jackknife variance estimation
method in the context of hot-deck imputation where the
response corresponds to a second phase with Poisson sampling
in imputation cells. Yung and Rao (2000) extended the result of
Rao and Shao to poststratification. Binder (1996) illustrated a
“cookbook” approach for the two-phase ratio estimator. Binder,
Babyak, Brodeur, Hidiroglou, and Jocelyn (2000) derived formulas
for variance estimation for various estimators for two-phase
restratified sampling. Fuller (1998) proposed a replicate
variance estimation method for the two-phase regression estimator.
Among the methods cited, only the methods of Rao and Shao
(1992) and Fuller (1998) are replication methods. One advantage
of the replication method for variance estimation is its
convenience for a multipurpose survey. That is, after we create
the replication weights, we can directly apply the replication
weights to estimate the variance for any variable.
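To illustrate why replication weights are convenient for a multipurpose survey, here is a minimal sketch of a delete-one jackknife for a single-phase sample. It is not the two-phase method proposed in this article. It assumes a simple random sample and ignores the finite population correction, and the function names and the variables `income` and `hours` are hypothetical.

```python
# Delete-one jackknife replicate weights for a single-phase sample.
# Assumption: simple random sample, finite population correction ignored.

def jackknife_replicate_weights(weights):
    """Return one replicate weight vector per deleted unit."""
    n = len(weights)
    replicates = []
    for k in range(n):
        # Drop unit k and inflate the remaining weights by n / (n - 1).
        replicates.append(
            [0.0 if i == k else w * n / (n - 1) for i, w in enumerate(weights)]
        )
    return replicates


def weighted_total(weights, values):
    """Weighted total: sum of w_i * y_i."""
    return sum(w * v for w, v in zip(weights, values))


def replication_variance(weights, replicates, values):
    """Jackknife variance estimate for the weighted total of `values`."""
    n = len(weights)
    full = weighted_total(weights, values)
    return (n - 1) / n * sum(
        (weighted_total(rep, values) - full) ** 2 for rep in replicates
    )


# Hypothetical sample of n = 5 units from a population of N = 50.
N, n = 50, 5
weights = [N / n] * n
replicates = jackknife_replicate_weights(weights)  # created once

# The same replicate weights are reused for every study variable.
income = [12.0, 15.0, 9.0, 20.0, 14.0]
hours = [40.0, 35.0, 42.0, 38.0, 45.0]
var_income = replication_variance(weights, replicates, income)
var_hours = replication_variance(weights, replicates, hours)
```

Under these assumptions the jackknife value for a total coincides with the textbook formula N^2 s^2 / n, which gives a quick check on the replicate weights.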
Let the finite population be of size $N$, indexed from 1 to $N$,
and let the finite population be partitioned into $G$ groups, which
we call the second-phase strata. The information about which
group a unit belongs to is not obtained until the first-phase
sample has been observed.
Jae Kwang Kim is Assistant Professor, Department of Applied Statistics,
Yonsei University, Seoul 120-749, Korea (E-mail: kimj@yonsei.ac.kr). Alfredo
Navarro is the ACS Branch Chief, Decennial Statistical Studies Division,
Bureau of the Census, Washington, DC 20233 (E-mail: alfredo.navarro@census.gov).
Wayne A. Fuller is Distinguished Professor Emeritus, Department of Statistics,
Iowa State University, Ames, IA 50011 (E-mail: waf@iastate.edu). This
research was supported in part by cooperative agreement 13-3AEU-0-80064
between Iowa State University, the U.S. National Agricultural Statistics Service,
and the U.S. Bureau of the Census. Much of the research was conducted while
the first author was a mathematical statistician at the U.S. Bureau of the Census.
The authors thank the referees for comments and suggestions that improved the
manuscript.
We consider the two-phase estimator in which the first-phase
sample is used to define strata to be used for the second-phase
sample. Let the parameter of interest be the population total
$Y = \sum_{i=1}^{N} y_i$, where $y_i$ is the study variable and $N$
is assumed known. Suppose that we have a first-phase sample of
size $n$. If we observe $y_i$ on every element of the sample, then
an unbiased estimator of $Y$ is
$$\hat{Y}_1 = \sum_{i \in A_1} w_i y_i, \tag{1}$$
where $w_i = [\Pr(i \in A_1)]^{-1}$ and $A_1$ is the set of indices in the
sample. Now, assume that instead of directly observing $y_i$ for
$i \in A_1$, we observe
$$x_i = (x_{i1}, \ldots, x_{iG}) \tag{2}$$
for all $i \in A_1$, where $x_{ig}$ takes the value 1 if unit $i$ belongs to the
$g$th group and 0 otherwise. Assume that $\sum_{g=1}^{G} x_{ig} = 1$.
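The first-phase quantities in (1) and (2) can be sketched numerically as follows. This is only an illustration under stated assumptions: the first-phase design is taken to be simple random sampling, so that $w_i = N/n$, and the population size, `y` values, and `group` labels are made up.

```python
# First-phase expansion estimator (1) and stratum indicators (2).
# Assumption: simple random first-phase sample, so w_i = N / n.

N = 1000                                        # population size (hypothetical)
y = [4.0, 7.0, 1.0, 9.0, 5.0, 6.0, 2.0, 8.0]    # y_i for i in A_1 (hypothetical)
group = [0, 1, 0, 2, 1, 0, 2, 1]                # group of each unit in A_1
G = 3
n = len(y)

w = [N / n] * n                                 # w_i = 1 / Pr(i in A_1)

# Estimator (1): available only if y_i were observed on all of A_1.
Y1_hat = sum(wi * yi for wi, yi in zip(w, y))

# Indicator vectors x_i = (x_i1, ..., x_iG); exactly one entry equals 1.
x = [[1 if group[i] == g else 0 for g in range(G)] for i in range(n)]
```

Each row of `x` sums to 1, which is the constraint $\sum_{g=1}^{G} x_{ig} = 1$ stated above.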
Let a subsample of total size $r$ be selected from the first-phase
sample and let $A_2$ be the set of indices for the second-phase
sample. Let
$$w_i^* = [\Pr(i \in A_2 \mid i \in A_1)]^{-1}. \tag{3}$$
Let $n_{1g} = \sum_{i \in A_1} x_{ig}$ be the number of first-phase sample
elements in group $g$ and let $r_g = \sum_{i \in A_2} x_{ig}$ be the number of
second-phase sample elements in group $g$. If the second-phase
sample is selected by stratified simple random sampling with
the groups as strata, then $w_i^* = r_g^{-1} n_{1g}$ for unit $i$ with $x_{ig} = 1$.
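The conditional weights under stratified simple random sampling can be sketched as follows. The group labels, the second-phase sizes in `r_target`, and the seed are hypothetical choices made only for illustration.

```python
import random

# Stratified simple random second-phase sample and its conditional weights.
group = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # group of each unit in A_1 (hypothetical)
r_target = {0: 4, 1: 2}                  # second-phase sizes r_g (hypothetical)

rng = random.Random(2006)
A2 = []
for g, r_g in r_target.items():
    members = [i for i, gi in enumerate(group) if gi == g]
    A2.extend(rng.sample(members, r_g))  # SRS without replacement within group g
A2.sort()

n1 = {g: group.count(g) for g in r_target}                       # n_1g
r = {g: sum(1 for i in A2 if group[i] == g) for g in r_target}   # r_g

# w*_i = n_1g / r_g for every selected unit i in group g.
w_star = {i: n1[group[i]] / r[group[i]] for i in A2}
```

Within each group the conditional weights of the selected units add up to $n_{1g}$, so the second-phase sample reproduces the first-phase group counts.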
Given the described two-phase sample, an unbiased estimator
of the total $Y$ is
$$\hat{Y}_d = \sum_{i \in A_2} \alpha_{d,i} y_i, \tag{4}$$
where $\alpha_{d,i} = w_i w_i^*$. Kott and Stukel (1997) called the estimator
in (4) the double-expansion estimator (DEE).
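A small worked instance of the DEE in (4) may help fix the notation. The numbers below are invented, and the second phase is assumed to be stratified simple random sampling so that $w_i^* = n_{1g}/r_g$.

```python
# Double-expansion estimator (4) on a small hypothetical two-phase sample.
# First phase: n = 10 units with unequal weights; group sizes n_1g = (6, 4).
n1 = {0: 6, 1: 4}
# Second phase: r_g = (2, 2) units drawn by SRS within each group.
r = {0: 2, 1: 2}

# Records for i in A_2: (group g, first-phase weight w_i, observed y_i).
A2 = [(0, 10.0, 1.0), (0, 12.0, 2.0), (1, 8.0, 3.0), (1, 10.0, 4.0)]


def dee_weight(g, w):
    """alpha_{d,i} = w_i * w*_i, with w*_i = n_1g / r_g."""
    return w * n1[g] / r[g]


alpha_d = [dee_weight(g, w) for g, w, _ in A2]
Y_d_hat = sum(a * y for a, (_, _, y) in zip(alpha_d, A2))
```

Here the four DEE weights are 30, 36, 16, and 20, so the estimate is 30(1) + 36(2) + 16(3) + 20(4) = 230.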
Another important estimator of the total $Y$ is
$$\hat{Y}_r = \sum_{g=1}^{G} \left( \sum_{i \in A_1} w_i x_{ig} \right) \frac{\sum_{i \in A_2} w_i x_{ig} y_i}{\sum_{i \in A_2} w_i x_{ig}} = \sum_{i \in A_2} \alpha_{r,i} y_i, \tag{5}$$
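The estimator in (5), presumably the reweighted expansion estimator named in the abstract, can be sketched as follows. All numbers are hypothetical. The expression used for `alpha_r` is not quoted from the text; it is the weight obtained by matching the two sides of (5) term by term.

```python
# Estimator (5): estimated group totals from A_1 times weighted group means from A_2.
# First-phase sample A_1 as (group g, weight w_i); y_i is observed only on A_2.
A1 = [(0, 10.0), (0, 12.0), (0, 8.0), (0, 8.0), (0, 8.0), (0, 9.0),
      (1, 8.0), (1, 10.0), (1, 8.0), (1, 10.0)]
A2_index = [0, 1, 6, 7]                    # positions in A_1 that are in A_2
y = {0: 1.0, 1: 2.0, 6: 3.0, 7: 4.0}       # observed y_i for i in A_2

groups = sorted({g for g, _ in A1})

Y_r_hat = 0.0
alpha_r = {}
for g in groups:
    N_g_hat = sum(w for gi, w in A1 if gi == g)      # sum over A_1 of w_i x_ig
    in_g = [i for i in A2_index if A1[i][0] == g]
    denom = sum(A1[i][1] for i in in_g)              # sum over A_2 of w_i x_ig
    numer = sum(A1[i][1] * y[i] for i in in_g)       # sum over A_2 of w_i x_ig y_i
    Y_r_hat += N_g_hat * numer / denom
    for i in in_g:
        # Weight implied by equating the two sides of (5).
        alpha_r[i] = A1[i][1] * N_g_hat / denom

# Same estimate written as a weighted sum over A_2.
Y_r_check = sum(alpha_r[i] * y[i] for i in A2_index)
```

With these numbers the estimate is 213, whereas the double-expansion weights $w_i n_{1g}/r_g$ applied to the same four observations give 230. The two estimators coincide when the first-phase weights are equal within each group, because the ratio of weighted sums then reduces to $n_{1g}/r_g$.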
© 2006 American Statistical Association
Journal of the American Statistical Association
March 2006, Vol. 101, No. 473, Theory and Methods
DOI 10.1198/016214505000000763