Citation: Conti, P.L.; Mecatti, F. Resampling under Complex Sampling Designs: Roots, Development and the Way Forward. Stats 2022, 5, 258–269. https://doi.org/10.3390/stats5010016
Academic Editor: Wei Zhu
Received: 27 January 2022
Accepted: 1 March 2022
Published: 8 March 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Review
Resampling under Complex Sampling Designs: Roots, Development and the Way Forward
Pier Luigi Conti 1,* and Fulvia Mecatti 2
1 Dipartimento di Scienze Statistiche, Sapienza Università di Roma, P.le Aldo Moro, 5, 00185 Roma, Italy
2 Dipartimento di Sociologia e Ricerca Sociale, Università di Milano-Bicocca, Via Bicocca Degli Arcimboldi, 8, 20126 Milano, Italy; fulvia.mecatti@unimib.it
* Correspondence: pierluigi.conti@uniroma1.it
Abstract: In the present paper, resampling for finite populations under complex sampling designs
is reviewed. Our attention is mainly focused on pseudo-population-based resampling due to its
properties. A principled appraisal of the main theoretical foundations and results is given and
discussed, together with important computational aspects. Finally, a discussion on open problems
and research perspectives is provided.
Keywords: resampling; bootstrap; pseudo-population; asymptotics; empirical processes
1. Introduction
1.1. Generalities
Resampling methods have a long and honorable history, going back at least to the
seminal paper by [1]. Survey data are an ideal context to use resampling methods to
approximate the sampling distribution of statistics, due to both (i) a generally large sample
size and (ii) data of typically good quality.
The present paper does not aim at providing a complete review of resampling methods
in sampling statistics; the interested reader is referred, for instance, to [2]. We mainly focus
on a special class of resampling methods—namely those based on pseudo-populations.
There are several reasons to support this restriction. First of all, they may be viewed, in
many respects, as the “natural” extension of Efron’s classical bootstrap to sampling from finite
populations, in both descriptive and analytic inference (i.e., inference on finite population
and superpopulation parameters, respectively).
Second, to our knowledge, they are the only methods with a rigorous
asymptotic justification in terms of weak convergence of empirical processes, allowing
results not only for linear estimators but also for non-linear ones (under suitable
differentiability conditions).
In brief, virtually all resampling methodologies used in sampling from
finite populations are based on the idea of accounting for the effect of the sampling design.
As will be seen in the sequel, the main effect of the sampling design is that data cannot
generally be assumed independent and identically distributed (i.i.d.). A large portion of
the literature on resampling from finite populations focuses on estimating the variance of
estimators. The main approaches are essentially the ad hoc approach and the plug-in approach.
The basic idea of the ad hoc approach consists of retaining Efron’s bootstrap as the
resampling procedure while properly rescaling the data in order to account for the dependence
among units. This approach is used, among others, in [3,4], where the re-sampled data
produced by the “usual” i.i.d. bootstrap are properly rescaled, as well as in [5,6]; cf. also
the review in [7]. In [8], a “rescaled bootstrap process” based on asymptotic arguments
is proposed. Among the ad hoc approaches, we also classify [9] (based on a rescaling of
weights) and the “direct bootstrap” of [10].
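
The rescaling idea can be illustrated with a minimal sketch. It assumes the simplest possible design, simple random sampling without replacement from a population of known size N, and uses the scaling factor sqrt(m(1 − f)/(n − 1)), with f = n/N the sampling fraction and m the bootstrap sample size, in the spirit of the classical rescaling bootstrap. The function names, the default m = n − 1 and the simulated data are hypothetical choices made here for illustration; they are not taken from [3,4] or from any other reference cited above.

```python
import math
import random
import statistics


def rescaled_bootstrap_means(sample, pop_size, n_boot=2000, m=None, seed=0):
    """Bootstrap replicates of the sample mean, with rescaled resampled values.

    Illustrative assumption: the sample comes from simple random sampling
    without replacement out of a population of `pop_size` units.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled units")
    if m is None:
        m = n - 1  # assumed default bootstrap sample size
    f = n / pop_size  # sampling fraction
    # Assumed rescaling factor: shrinks deviations so that the bootstrap
    # variance reflects the finite population correction (1 - f).
    scale = math.sqrt(m * (1.0 - f) / (n - 1))
    y_bar = statistics.fmean(sample)
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Step 1: ordinary i.i.d. bootstrap draw (with replacement).
        draw = rng.choices(sample, k=m)
        # Step 2: rescale each resampled value around the sample mean.
        rescaled = [y_bar + scale * (y - y_bar) for y in draw]
        replicates.append(statistics.fmean(rescaled))
    return replicates


def srswor_variance_of_mean(sample, pop_size):
    """Textbook variance estimator (1 - f) * s^2 / n of the sample mean."""
    n = len(sample)
    return (1.0 - n / pop_size) * statistics.variance(sample) / n


if __name__ == "__main__":
    rng = random.Random(1)
    population = [rng.gauss(50.0, 10.0) for _ in range(500)]  # simulated data
    sample = rng.sample(population, 100)
    reps = rescaled_bootstrap_means(sample, pop_size=len(population))
    print("bootstrap variance:", statistics.variance(reps))
    print("closed-form estimate:", srswor_variance_of_mean(sample, len(population)))
```

Under these assumptions, the variance of the replicates should be close to the textbook estimator (1 − f)s²/n, which is what the rescaling is designed to reproduce. Without the rescaling step, the i.i.d. bootstrap would ignore the finite population correction induced by sampling without replacement.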