Statistical methods for some simple disclosure limitation rules

J. Pannekoek
Statistics Netherlands, Department of Statistical Methods, P.O. Box 959, 2270 AZ VOORBURG, The Netherlands

To guard the confidentiality of information provided by respondents, statistical offices apply disclosure limitation techniques. An often applied technique is to ensure that there are no categories for which the population frequency is presumed to be small (`rare' categories). This is attained by recoding, top-coding or setting values to `unknown'. Since population frequencies are usually not available, the decision that a category is rare is often based on intuitive considerations. This is a time-consuming process, involving many decisions by the disclosure limitation practitioners. This paper explores to what extent the sample frequencies can be used to make such decisions. This leads to a procedure that makes it possible to scan a data set automatically for rare category combinations, whereby `rare' is defined by the disclosure limitation policy of the statistical office.

Key Words and Phrases: confidentiality, recoding, top-coding.

1 Introduction

A common concern of statistical offices that release microdata for use by external researchers is to diminish the risk of disclosure of information on individuals. It is generally accepted that it is insufficient to discard only the directly identifying variables such as names and addresses, because individuals may also be recognised on the basis of their values on other (indirectly) identifying variables such as a geographical indicator, profession, age and sex. If certain combinations of values of identifying variables occur only once in the population, the associated individuals are unique with respect to these variables. If a researcher knows the values of the identifying variables for certain unique individuals, this researcher can establish a link between the record and the individual it belongs to.
Such a link leads to disclosure of the remaining information in the record, which was not known beforehand. For more detailed discussions of the disclosure problem, see DUNCAN and LAMBERT (1989), BETHLEHEM et al. (1990), MOKKEN et al. (1992) and SKINNER et al. (1994).

© VVS, 1999. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.
E-mail: jpnk@cbs.nl
The views expressed in this paper are those of the author and do not necessarily reflect the policies of Statistics Netherlands. The author thanks Leon Willenborg for valuable suggestions and comments.
Statistica Neerlandica (1999) Vol. 53, nr. 1, pp. 55–67
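
As a small illustration of the uniqueness notion used in the Introduction, the following Python sketch counts how often each combination of values on a set of identifying variables occurs in a sample and flags combinations at or below a threshold. It is not the procedure developed in this paper: it looks only at raw sample frequencies, whereas the rules of interest concern population frequencies, and a combination that is unique in the sample need not be unique in the population. The function names, the toy records and the `threshold` parameter are assumptions made for the illustration.

```python
from collections import Counter


def combination_counts(records, keys):
    """Count how often each combination of values on `keys` occurs."""
    return Counter(tuple(rec[k] for k in keys) for rec in records)


def flag_rare(records, keys, threshold=1):
    """Return the combinations whose sample frequency is at most `threshold`.

    `threshold` is a hypothetical stand-in for a disclosure limitation
    policy; it is applied to sample counts, not population counts.
    """
    counts = combination_counts(records, keys)
    return {combo: n for combo, n in counts.items() if n <= threshold}


# Hypothetical toy sample with three identifying variables.
sample = [
    {"region": "A", "profession": "baker", "sex": "F"},
    {"region": "A", "profession": "baker", "sex": "F"},
    {"region": "A", "profession": "notary", "sex": "M"},
    {"region": "B", "profession": "baker", "sex": "M"},
    {"region": "B", "profession": "baker", "sex": "M"},
    {"region": "B", "profession": "mayor", "sex": "F"},
]

keys = ("region", "profession", "sex")
rare = flag_rare(sample, keys, threshold=1)
for combo, n in sorted(rare.items()):
    print(combo, n)
```

In this toy sample the combinations that occur once are candidates for recoding or for setting a value to `unknown'; whether they are rare in the population is the statistical question that sample counts alone do not settle.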