Evaluation of an algorithm of tagging SNPs selection by
linkage disequilibrium
Nelson L.S. Tang a,⁎, Paul D.P. Pharoah b, Suk Ling Ma a, Douglas F. Easton c
a Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong
b Department of Oncology, Strangeways Research Laboratories, Cambridge, UK
c Cancer Research U.K. Genetic Epidemiology Unit, Strangeways Research Laboratories, Cambridge, UK
Received 27 July 2005; received in revised form 30 October 2005; accepted 25 November 2005
Available online 19 January 2006
Abstract
Background: Single nucleotide polymorphisms (SNPs) are the most abundant kind of genetic polymorphism in the human genome. They are
important in both genetic research and genetic testing in a clinical setting, such as in the area of pharmacogenetics. In order to improve efficiency,
tagging SNPs (tagSNPs) are selected in genes of interest to represent other correlated SNPs that are in linkage disequilibrium (LD) with them.
Various algorithms have been proposed to identify a subset of SNPs as tagSNPs. Most tagSNP selection algorithms
are haplotype-based, in which the spatial relationship between SNPs is considered. A more efficient cluster-based algorithm has recently been proposed,
which clusters SNPs solely by an LD parameter, such as r². Here, we evaluated the sample distribution of r² and its effect on cluster-based
tagSNP selection.
Design and methods: The genotype data of 198 individuals within a 500-kb region on 5q31 were used to evaluate the sample distribution of r²
and its effect on cluster-based tagSNP selection.
Results: It was found that the degree of variation of LD depends on the LD structure of genes.
Conclusion: As a cluster-based tagSNPs selection algorithm does not take into account the spatial position of SNPs, a more stringent r²
threshold is required to achieve more reliable tagSNPs selection.
© 2005 The Canadian Society of Clinical Chemists. All rights reserved.
Keywords: Tagging; SNPs; Genetic; LD metric; Haplotype; Linkage disequilibrium
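For readers unfamiliar with the r² statistic and cluster-based selection mentioned in the abstract, the following is a minimal sketch, not the algorithm evaluated in this paper. It assumes phased haplotypes coded 0/1 for biallelic SNPs, and a greedy binning rule with an illustrative threshold of 0.8. The function names `r_squared` and `greedy_tag_snps`, the threshold, and the toy data are all hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """r^2 between two biallelic SNPs given phased haplotypes coded 0/1."""
    n = len(a)
    p_a = sum(a) / n                                   # frequency of allele 1 at SNP a
    p_b = sum(b) / n                                   # frequency of allele 1 at SNP b
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy binning by pairwise r^2 only; SNP positions are ignored.

    haplotypes: dict mapping SNP name -> list of 0/1 alleles.
    Returns a dict mapping each chosen tagSNP -> sorted list of SNPs it tags.
    The default threshold is an assumption made for illustration.
    """
    names = list(haplotypes)
    r2 = {}
    for s, t in combinations(names, 2):
        r2[(s, t)] = r2[(t, s)] = r_squared(haplotypes[s], haplotypes[t])

    untagged = set(names)
    tags = {}
    while untagged:
        def covered(s):
            # SNPs still untagged that s would represent at this threshold
            return {t for t in untagged if t == s or r2[(s, t)] >= threshold}

        # choose the SNP that represents the most untagged SNPs (ties: by name)
        best = max(sorted(untagged), key=lambda s: len(covered(s)))
        bin_members = covered(best)
        tags[best] = sorted(bin_members)
        untagged -= bin_members
    return tags


# Toy data (hypothetical): snp1 and snp2 are identical, snp3 is weakly correlated.
example = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],
}
example_tags = greedy_tag_snps(example, threshold=0.8)
```

Because the binning uses only pairwise r² values estimated from a sample, sampling variation in r² can move a SNP across the threshold and change which bin it falls in, which is the sensitivity the abstract refers to.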
Introduction
Genetic factors play a strong role in susceptibility to
common diseases. Under the common variants–common
diseases hypothesis, genetic predisposition to common diseases
like diabetes and schizophrenia is due to genetic variants that
are prevalent in the general population [1,2]. Disease phenotypes
are the end results of the action of multiple disease-predisposing
genetic variants, each of which confers a
moderate increase in risk. In the past, familial linkage study was
the only feasible genetic mapping approach because only sparse
genetic markers were available across the genome. However,
it is very difficult, if not impossible, to dissect the genetics of
common diseases by linkage analysis. After the completion of
the Human Genome Project in 2003 [3,4] and recently Phase 1
of the HapMap project [5], abundant genetic polymorphisms,
mostly single nucleotide polymorphisms (SNPs), are now
readily available for analysis. This abundance also enables their clinical
application in the diagnosis and risk prediction of disease. In
addition, genetic study of inter-individual differences in drug response is
also made possible using a similar approach in the field of
pharmacogenetics [6,7].
A genetic association study is used to identify the genetic
loci responsible for susceptibility to common diseases.
Determination of a representative subset of SNPs will enhance
the efficiency of genotyping in a genetic association study.
Furthermore, after the disease causative genes are identified,
such a set of representative SNPs is also needed for clinical
application. The set of SNPs that could be used to represent the
Clinical Biochemistry 39 (2006) 240 – 243
⁎ Corresponding author. Fax: +852 26322320/+852 26365090.
E-mail address: nelsontang@cuhk.edu.hk (N.L.S. Tang).
0009-9120/$ - see front matter © 2005 The Canadian Society of Clinical Chemists. All rights reserved.
doi:10.1016/j.clinbiochem.2005.11.014