Ten years of balanced sampling with the cube method: An appraisal

Yves Tillé

Abstract

This paper presents a review and assessment of the use of balanced sampling by means of the cube method. After defining the notion of balanced sample and balanced sampling, a short history of the concept of balancing is presented. The theory of the cube method is briefly presented. Emphasis is placed on the practical problems posed by balanced sampling: the interest of the method with respect to other sampling methods and calibration, the field of application, the accuracy of balancing, the choice of auxiliary variables and ways to implement the method.

Key Words: Sampling; Balancing; Horvitz-Thompson estimator.

1. Introduction

While the idea of balanced sampling has been around since the early days of survey statistic development, applying the concept has been difficult because almost all the proposed methods have either been enumerative or rejective and required considerable computation time. The algorithm of the cube method was proposed in 1998 by Deville and Tillé, and a first implementation was written by three students of the École Nationale de la Statistique et de l’Analyse de l’Information of Rennes in France (see Bousabaa, Lieber and Sirolli 1999). Finally, the method was published in Tillé (2001) and Deville and Tillé (2004). Since this time, several implementations have been proposed and several survey managers have used the cube method to select samples, the most important applications being the New French Census and the French Master Sample.

Our aim is to assess 10 years of development and use of balanced sampling in order to better ascertain when and how the cube method can be used to select samples of householders or establishments. After discussing the concept of balanced sample and balanced sampling in Section 2, we give a list of particular cases in Section 3.
In Section 4, we briefly trace the history of this concept for both the model-based and design-based frameworks. Next, in Section 5, we provide a brief overview of the cube method, which is a class of algorithms that allows us to select randomly balanced samples with given inclusion probabilities (see Deville and Tillé 2004; Tillé 2001, 2006b). We try to present the main principles of this algorithm without giving a detailed description of the technicalities of the method. Section 6 is devoted to the principles of variance estimation in balanced sampling. Finally, in Section 7, we discuss the interest of balanced sampling in practice and compare balanced sampling with other sampling methods and calibration. We also give a list of recent applications. This section also deals with the accuracy of balancing, the choice of auxiliary variables and ways to implement balanced sampling. The paper ends with an exhaustive bibliographical list of references on balanced sampling and their applications.

2. Balanced sampling

2.1 Definition of a balanced sample

Consider a sample s of size n that is a subset of a finite population U of size N. A sample is said to be balanced if, for a vector of auxiliary variables $\mathbf{x}_k = (x_{k1}, \ldots, x_{kj}, \ldots, x_{kp})$,

$$\frac{1}{n} \sum_{k \in S} \mathbf{x}_k = \frac{1}{N} \sum_{k \in U} \mathbf{x}_k, \tag{1}$$

which means that the sample means of the x-variables match their population means. Brewer (1999) drew a distinction between a balanced selection of samples and a random selection of samples. However, a balanced sample may be selected randomly. If a sample S is selected randomly, then each unit k of the population has an inclusion probability $\pi_k$ of being selected. In this case, a random sample is balanced if it satisfies the following balancing equations:

$$\sum_{k \in S} \frac{\mathbf{x}_k}{\pi_k} = \sum_{k \in U} \mathbf{x}_k. \tag{2}$$

In other words, in a balanced sample, the totals of the x-variables are estimated without error.
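As an illustration of the balancing equations (2), the following minimal sketch checks whether a given sample reproduces the population totals of the auxiliary variables exactly. The function names `ht_total` and `is_balanced`, the tolerance, and the toy population of six units are assumptions made here for illustration; they do not come from the paper, and the sketch only verifies balance for a sample that is already given. It does not select a balanced sample, which is the task of the cube method.

```python
import numpy as np


def ht_total(x, pi, sample):
    """Horvitz-Thompson estimate of the column totals of x.

    x      : (N, p) array of auxiliary variables for the whole population
    pi     : (N,) array of inclusion probabilities
    sample : indices of the selected units
    """
    x = np.asarray(x, dtype=float)
    pi = np.asarray(pi, dtype=float)
    idx = np.asarray(sample, dtype=int)
    # Each sampled row is weighted by the inverse of its inclusion probability.
    return (x[idx] / pi[idx, None]).sum(axis=0)


def is_balanced(x, pi, sample, tol=1e-9):
    """True if the sample satisfies the balancing equations up to tol."""
    x = np.asarray(x, dtype=float)
    return bool(np.allclose(ht_total(x, pi, sample), x.sum(axis=0),
                            rtol=0.0, atol=tol))


# Hypothetical toy population: N = 6 units, two auxiliary variables
# (the values 1..6 and a constant 1), equal inclusion probabilities n/N.
N, n = 6, 2
x = np.column_stack([np.arange(1, N + 1), np.ones(N)])
pi = np.full(N, n / N)

print(is_balanced(x, pi, [0, 5]))  # units with x-values 1 and 6
print(is_balanced(x, pi, [0, 1]))  # units with x-values 1 and 2
```

In this toy case the sample made of the units with values 1 and 6 has the same mean (3.5) as the population, so both the mean form (1) and the weighted form (2) hold, whereas the sample made of the units with values 1 and 2 underestimates the total. Balancing on the constant variable forces the sum of the weights $1/\pi_k$ to equal N.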
Several authors, such as Cumberland and Royall (1981) and Kott (1986), would call a sample that satisfies Equation (2) a ‘π-balanced sample’, as opposed to a ‘mean-balanced sample’ defined by Equation (1). Nevertheless, in this paper, we will consider that (1) is only a particular case of (2) that occurs when $\pi_k = n/N$ or when the sample is not selected randomly. We refer to both cases as a balanced sample.

Published in Survey Methodology 37, issue 2, 215-226, 2011, which should be used for any reference to this work.

1. Yves Tillé, University of Neuchâtel, Pierre à Mazel 7, 2000 Neuchâtel, Switzerland. E-mail: yves.tille@unine.ch.