AN APPLICATION OF THE FELLEGI-SUNTER MODEL OF RECORD LINKAGE TO THE 1990 U.S. DECENNIAL CENSUS

William E. Winkler and Yves Thibaudeau
U.S. Bureau of the Census

ABSTRACT

This paper describes a methodology for computer matching the Post Enumeration Survey with the Census. Computer matching is the first stage of a process for producing adjusted Census counts. All crucial matching parameters are computed solely from characteristics of the files being matched. No a priori knowledge of the truth of matches is assumed, and no previously created lookup tables are needed. The methods are illustrated with numerical results using files from the 1988 Dress Rehearsal Census, for which the truth of matches is known.

Key words and phrases. EM Algorithm; String Comparator Metric; LP Algorithm; Decision Rule; Error Rate.

1. INTRODUCTION

This paper describes a particular application of the Fellegi-Sunter (1969) model of record linkage. New computational methods are used for computer matching the Post Enumeration Survey (PES) with the Census. The PES is used to produce adjusted Census counts, and computer matching is the first stage of PES processing. All crucial matching parameters associated with comparisons of individual fields are computed automatically. The parameters are generally based on characteristics of the files being matched; no a priori knowledge of the truth of matches is assumed. Lookup tables that account for the relative frequency of occurrence of different strings are also computed from the files being matched.

The paper is divided into a number of sections. The second section consists of five parts. The first part gives background on the Fellegi-Sunter model. The second part describes the PES and Census files from the 1988 Dress Rehearsal Census and the overall matching procedures. The truth or falsehood of each match is known for the Dress Rehearsal files.
The third part provides details of a modified Expectation-Maximization (EM) algorithm for estimating the probability distributions used in a crucial likelihood ratio (see, e.g., Winkler 1988, 1989a; Thibaudeau 1989). The fourth part gives new computational methods for automatically creating frequency tables that account for the relative distinguishing power of strings such as 'Smith' and 'Zabrinsky'; these methods are a special case of Winkler (1989b). The fifth part describes new string comparator metrics that allow comparison of strings that do not agree on a character-by-character basis. The metrics generalize the Damerau-Levenshtein and Jaro metrics (see, e.g., Winkler 1985, 1989c, 1990b). Methods for modeling how the metric adjusts matching weights between pure agreement and pure disagreement are also covered. A new linear sum assignment algorithm that forces one-to-one assignments is described in the