ORIGINAL INVESTIGATION

The advantages of dense marker sets for linkage analysis with very large families

Russell Thomson · Stephen Quinn · James McKay · Jeremy Silver · Melanie Bahlo · Liesel FitzGerald · Simon Foote · Jo Dickinson · Jim Stankovich

Received: 8 November 2006 / Accepted: 2 January 2007 / Published online: 25 January 2007
© Springer-Verlag 2007

Abstract

Dense sets of hundreds of thousands of markers have been developed for genome-wide association studies. These marker sets are also beneficial for linkage analysis of large, deep pedigrees containing distantly related cases. It is impossible to analyse jointly all genotypes in large pedigrees using the Lander–Green algorithm; however, as marker density increases, it becomes less crucial to analyse all individuals' genotypes simultaneously. In this report, an approximate multipoint non-parametric technique is described, in which large pedigrees are split into many small pedigrees, each containing just two cases. This technique is demonstrated using phased data from the International HapMap Project to simulate sets of 10,000, 50,000 and 250,000 markers, showing that it becomes increasingly accurate as more markers are genotyped. This method allows routine linkage analysis of large families with dense marker sets and represents a more easily applied alternative to Monte Carlo Markov Chain methods.

Introduction

Association studies are more efficient than linkage studies for detecting common alleles that confer susceptibility to complex diseases (Risch and Merikangas 1996), requiring fewer case samples than linkage-based designs such as sib-pair studies. However, to detect rare dominant susceptibility variants, linkage analysis with large pedigrees remains the method of choice (Risch 2001). In fact, it has been argued that “well-powered linkage studies ... should be conducted in advance of, or in conjunction with, a genome-wide association study” (Hirschhorn and Daly 2005).
Hundreds of thousands of markers are required for genome-wide association studies, because linkage disequilibrium (LD) typically extends for only 0.01–0.02 cM in large human populations (Altshuler et al. 2005). Conversely, linkage signals within families generally extend over much longer genetic distances, and it is generally believed that there is little benefit in using a SNP set of more than 10,000 markers (Chen and Abecasis 2006; Evans and Cardon 2004).

Linkage analysis with large families represents an intermediate situation. For dominant genes of incomplete penetrance there are often many meioses separating cases. Furthermore, for diseases of late onset, it is difficult to recruit siblings and parents of cases to facilitate haplotype construction. To gain maximum inheritance information from such cases it can be beneficial to use higher density marker sets (Vierimaa et al. 2006). To map rare recessive variants, pedigrees are also used (explicitly or implicitly) in homozygosity mapping studies, where dense marker sets have proved similarly beneficial (Chiang et al. 2006).

R. Thomson (✉) · S. Quinn · J. McKay · L. FitzGerald · S. Foote · J. Dickinson · J. Stankovich
Menzies Research Institute, University of Tasmania, Private Bag 23, Hobart, TAS 7001, Australia
e-mail: Russell.Thomson@utas.edu.au

J. Silver · M. Bahlo · J. Stankovich
The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

J. McKay
International Agency for Research on Cancer, Lyon, France

Hum Genet (2007) 121:459–468
DOI 10.1007/s00439-007-0323-5

In this paper we show that marker sets of more than 10,000 markers