International Journal of Epidemiology
© International Epidemiological Association 1994
Vol. 23, No. 6
Printed in Great Britain

Letter to the Editor

Limitations to the Universal Use of Capture-Recapture Methods

From JEAN-CLAUDE DESENCLOS AND BRUNO HUBERT

Réseau National de Santé Publique, 14 rue du Val d'Osne, 94415 Saint-Maurice Cedex, France.

Sir—McCarty DJ et al.¹ discuss the potential use of capture-recapture methodology in epidemiology. They propose the method as a universal approach to correcting incidence and prevalence rates from registries or surveillance systems for underascertainment. Underascertainment is indeed a common deficiency of public health surveillance. However, it is not a major problem for monitoring trends if it does not vary over time or by case characteristics (that is, if fair representativeness can be assumed).² The difficulty lies in evaluating representativeness.

Capture-recapture is a powerful method. From two or more independent surveillance sources, it estimates the actual number of cases in the community and quantifies the sensitivity (i.e. completeness of ascertainment) of each surveillance system. Capture-recapture techniques can also assess representativeness. If both systems record case characteristics (date of onset, place of residence, age, sex, ethnicity, risk factors, etc.), a stratified analysis by any of these variables gives the level of underascertainment (i.e. sensitivity) in each stratum for each surveillance system. If sensitivity varies across strata for either or both data sources, their representativeness cannot be assumed.³

To use capture-recapture methods appropriately, several conditions must be met: (i) the two systems should be independent; (ii) all true matches, and only true matches, should be identified; (iii) all cases identified by the two or more surveillance systems should be true cases that occurred in the population under investigation and within the appropriate time period.

As McCarty et al. reported, independence can be evaluated with the Bernoulli census method or with log-linear models when at least three systems are available. However, at least for surveillance systems, it is uncommon to have three or more systems that share a common identifier for each case. For two data sources there is an alternative, proposed by Sekar and Deming.³ It stratifies the capture-recapture data by a third variable (e.g. district, county, age).⁴ For each stratum, the capture-recapture formulas give the estimated actual number of cases and the completeness of both systems. A correlation coefficient between the two systems' completeness of reporting is then calculated, weighted by the number of cases estimated in each stratum. Independence is assumed if the correlation coefficient does not differ significantly from zero. If it does, the two systems are likely to be dependent.

Another approach, which we have proposed,⁴ compares the sum of the stratum estimates with the crude, unstratified estimate. If these estimates differ (i.e. the stratified estimate falls outside the 95% confidence interval of the crude estimate), the two systems are likely to be dependent. This approach gives results similar to those of the weighted correlation coefficient.

Other issues also matter for applying capture-recapture methodology appropriately and need more discussion.
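The stratified checks described above can be sketched in Python: stratum-specific sensitivity, the weighted correlation of completeness, and the comparison of summed stratum estimates with the crude estimate. This is a minimal sketch under stated assumptions. We use Chapman's modification of the two-source Lincoln-Petersen estimator and its usual variance, with a normal-approximation 95% interval; the authors may have used other formulas. The function names and stratum counts are hypothetical, and no significance test for the correlation is included.

```python
import math

# Hypothetical stratum counts: (n1 reported by source A, n2 by source B, m by both).
STRATA = [(90, 80, 72), (10, 20, 2), (50, 40, 20)]


def chapman(n1, n2, m):
    """Chapman two-source estimate of the total number of cases and its variance."""
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = (n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m) / ((m + 1) ** 2 * (m + 2))
    return n_hat, var


def stratum_sensitivities(strata):
    """Per stratum: estimated total and completeness (sensitivity) of each source."""
    rows = []
    for n1, n2, m in strata:
        n_hat, _ = chapman(n1, n2, m)
        rows.append((n_hat, n1 / n_hat, n2 / n_hat))
    return rows


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def completeness_correlation(strata):
    """Correlation of the two sources' completeness, weighted by estimated cases."""
    rows = stratum_sensitivities(strata)
    weights = [r[0] for r in rows]
    return weighted_corr([r[1] for r in rows], [r[2] for r in rows], weights)


def stratified_vs_crude(strata, z=1.96):
    """Sum of stratum estimates vs. the crude estimate and its approximate 95% CI."""
    n1 = sum(s[0] for s in strata)
    n2 = sum(s[1] for s in strata)
    m = sum(s[2] for s in strata)
    crude, var = chapman(n1, n2, m)
    half = z * math.sqrt(var)
    stratified = sum(chapman(*s)[0] for s in strata)
    inside = crude - half <= stratified <= crude + half
    return crude, (crude - half, crude + half), stratified, inside


if __name__ == "__main__":
    for i, (n_hat, sens_a, sens_b) in enumerate(stratum_sensitivities(STRATA), 1):
        print(f"stratum {i}: N = {n_hat:.1f}, sensitivity A = {sens_a:.2f}, B = {sens_b:.2f}")
    crude, (lo, hi), stratified, inside = stratified_vs_crude(STRATA)
    print(f"crude N = {crude:.1f} (95% CI {lo:.1f} to {hi:.1f}); sum of strata = {stratified:.1f}")
    print("no evidence of dependence from this check" if inside else "dependence likely")
    print(f"weighted correlation of completeness = {completeness_correlation(STRATA):.2f}")
```

In these hypothetical counts, both sources are most complete in the same stratum. As a result, the summed stratum estimate (about 275) lies well above the crude interval (about 207 to 239), and the weighted correlation is strongly positive. The letter associates this pattern with dependence between the two sources.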
First, one needs at least two sources of information with individual case reporting and a unique personal identifier for each case. Many national surveillance systems do not meet these conditions. In some, data are aggregated (in developing countries and, to some extent, also in Europe). In others, individual cases are often reported without a unique personal identifier.⁵ When two or more systems with a unique personal identifier exist in one country, matching them may be difficult, and sometimes impossible for ethical or legal reasons. If no personal identifier is available, matches between two or more systems can be defined from a set of covariates collected in both systems (date of birth, place of residence, date of onset, sex, etc.). The definition of a match is then a key issue. A very specific definition will yield few matches and overestimate the true number of cases. Conversely, a very sensitive but poorly specific definition will lead to underestimation.⁴
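A small Python sketch can illustrate how the definition of a match drives the estimate. Everything in it is hypothetical and illustrative, not a procedure taken from the letter: the records, the two matching rules, the greedy one-to-one linkage and the use of Chapman's two-source estimator.

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    birth: str     # date of birth, "YYYY-MM-DD"
    sex: str
    district: str  # place of residence
    onset: str     # date of onset, "YYYY-MM-DD"


# Hypothetical records; no unique personal identifier is available.
SOURCE_A = [
    Case("1950-03-02", "F", "75", "1993-05-01"),
    Case("1962-11-20", "M", "94", "1993-06-10"),
    Case("1971-07-15", "F", "93", "1993-07-22"),
    Case("1948-01-30", "M", "75", "1993-08-03"),
]
SOURCE_B = [
    Case("1950-03-02", "F", "75", "1993-05-01"),  # same person as SOURCE_A[0]
    Case("1962-11-02", "M", "94", "1993-06-10"),  # same person as SOURCE_A[1], day of birth miscoded
    Case("1971-03-01", "F", "93", "1993-09-30"),  # a different person from SOURCE_A[2]
    Case("1985-05-05", "M", "92", "1993-10-12"),
]


def strict_match(a: Case, b: Case) -> bool:
    """Very specific definition: every covariate must agree exactly."""
    return a == b


def loose_match(a: Case, b: Case) -> bool:
    """Very sensitive definition: sex, district and year of birth only."""
    return a.sex == b.sex and a.district == b.district and a.birth[:4] == b.birth[:4]


def count_matches(src_a: List[Case], src_b: List[Case],
                  same_case: Callable[[Case, Case], bool]) -> int:
    """Greedy one-to-one linkage: each record of source B is used at most once."""
    used = set()
    m = 0
    for a in src_a:
        for j, b in enumerate(src_b):
            if j not in used and same_case(a, b):
                used.add(j)
                m += 1
                break
    return m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman two-source estimate of the total number of cases."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


if __name__ == "__main__":
    n1, n2 = len(SOURCE_A), len(SOURCE_B)
    for name, rule in [("strict", strict_match), ("loose", loose_match)]:
        m = count_matches(SOURCE_A, SOURCE_B, rule)
        print(f"{name}: {m} matches, estimated total = {chapman(n1, n2, m):.2f}")
```

The strict rule misses the miscoded pair (1 match, estimate 11.5). The loose rule accepts a false pair (3 matches, estimate 5.25). With only the two true matches, the Chapman estimate would be about 7.3, so the two definitions err in the directions the letter describes.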