M. Chetty, A. Ngom, and S. Ahmad (Eds.): PRIB 2008, LNBI 5265, pp. 412–423, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Discovery of Biomarkers for Hexachlorobenzene Toxicity Using Population Based Methods on Gene Expression Data
Cem Meydan¹, Alper Küçükural², Deniz Yörükoğlu³, and O. Uğur Sezerman⁴
Biological Sciences and Bioengineering, Sabanci University
Sabancı Üniversitesi, Orhanlı-Tuzla, 34956 İstanbul, Türkiye
Tel.: + (90)2164839000; Fax: + (90)2164839550
{cemmeydan,kucukural,denizy}@su.sabanciuniv.edu,
ugur@sabanciuniv.edu
Abstract. Discovering toxicity biomarkers is important in drug discovery for
safely evaluating the possible toxic effects of a substance in early phases. We
applied evolutionary classification methods to select the most informative
classifier genes for hexachlorobenzene (HCB) toxicity from microarray data.
Using modified genetic algorithms to select a minimal number of features for
classifying gene expression data, we discovered a number of gene sets of size 4
that were able to discriminate between the control group and the HCB-exposed
group of Brown-Norway rats with >99% accuracy in 5-fold cross-validation tests,
whereas classification using all of the genes with SVM and other methods yielded
accuracies ranging from 48.48% to 81.81%. Using this small number of genes as
biomarkers may allow us to detect, quickly and cost-efficiently, the toxicity of
substances whose mechanisms of toxicity are similar to that of HCB, even before
any symptoms emerge.
Keywords: Feature selection, toxicogenomics, genetic algorithms, biomarker
discovery.
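The abstract describes a wrapper approach: a modified genetic algorithm searches for very small gene subsets, and each subset is judged by its 5-fold cross-validation accuracy. The paper's specific modifications, classifier and fitness function are not given here, so the following is only a minimal sketch under stated assumptions, not the authors' method. A nearest-centroid classifier stands in for the real classifier; the fitness is cross-validated accuracy minus a hypothetical 0.01-per-gene size penalty; and truncation selection, union-based crossover and single-gene swap mutation are illustrative choices. The names `cv_accuracy`, `ga_select` and `make_toy_data` are hypothetical, and the data are synthetic.

```python
import random

# Toy GA wrapper for gene-subset selection. ASSUMPTIONS: nearest-centroid
# classifier, truncation selection, union-based crossover, single-gene swap
# mutation and a 0.01-per-gene size penalty are illustrative choices only.


def cv_accuracy(X, y, features, k=5, seed=0):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    that only looks at the given feature (gene) indices."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # fixed shuffle -> same folds on every call
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        centroids = {}
        for label in sorted(set(y[i] for i in train)):
            members = [i for i in train if y[i] == label]
            centroids[label] = [sum(X[i][f] for i in members) / len(members)
                                for f in features]
        for i in test:
            point = [X[i][f] for f in features]
            pred = min(centroids, key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(point, centroids[c])))
            correct += int(pred == y[i])
    return correct / len(y)


def fitness(X, y, subset, penalty=0.01):
    """Reward cross-validated accuracy, lightly penalise larger gene sets."""
    return cv_accuracy(X, y, sorted(subset)) - penalty * len(subset)


def ga_select(X, y, n_features, pop_size=20, generations=30, max_size=4, seed=1):
    """Evolve gene subsets of size 1..max_size; return the best one and its CV accuracy."""
    rng = random.Random(seed)
    pop = [frozenset(rng.sample(range(n_features), rng.randint(1, max_size)))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: fitness(X, y, s), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            pool = sorted(a | b)  # crossover: draw genes from both parents
            child = set(rng.sample(pool, min(len(pool), rng.randint(1, max_size))))
            if rng.random() < 0.3:  # mutation: swap one gene for a random one
                child.discard(rng.choice(sorted(child)))
                child.add(rng.randrange(n_features))
            children.append(frozenset(child))
        pop = parents + children
    best = max(pop, key=lambda s: fitness(X, y, s))
    return sorted(best), cv_accuracy(X, y, sorted(best))


def make_toy_data(n_per_class=20, n_genes=50, informative=(3, 7), seed=0):
    """Synthetic 'control' (0) vs 'exposed' (1) samples; only the
    informative genes are shifted upward in the exposed group."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    genes, acc = ga_select(X, y, n_features=50)
    print("selected genes:", genes, "5-fold CV accuracy:", acc)
```

Because the fitness is evaluated on cross-validated accuracy rather than on training accuracy, the search favours small subsets that generalise across folds; the size penalty only breaks ties in favour of fewer genes.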
1 Introduction
Finding reliable toxicity biomarkers is important in toxicogenomics for safely
evaluating the possible toxic effects of a substance in the early phases of drug
discovery. Discovering the important mechanisms of toxicity of known toxic
substances, and developing biomarkers that detect them, can enable the
cost-efficient classification of new substances with respect to their toxicity.
By using microarray technology to evaluate the changes in gene expression between
the control and experimental (exposed) data sets, one may obtain a significant set
of genes that indicates the presence of toxicity. Discovering the genes that are
correlated with the substance class may point to the mechanisms of toxicity and the
affected pathways. These gene sets may also be used to develop diagnostic kits that
detect possible toxicity of a substance in test subjects, or for the early
diagnosis of toxin exposure.
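As a concrete illustration of finding genes whose expression differs between control and exposed groups, the sketch below ranks genes by the absolute value of Welch's t statistic. This is a generic univariate filter shown for illustration, not the selection method used in this paper. The gene names and expression values are made-up toy data, and the helper names `welch_t` and `rank_genes` are hypothetical.

```python
import math


def welch_t(control, exposed):
    """Welch's t statistic for mean(exposed) - mean(control)."""
    m_c = sum(control) / len(control)
    m_e = sum(exposed) / len(exposed)
    v_c = sum((v - m_c) ** 2 for v in control) / (len(control) - 1)  # sample variances
    v_e = sum((v - m_e) ** 2 for v in exposed) / (len(exposed) - 1)
    se = math.sqrt(v_c / len(control) + v_e / len(exposed))
    return 0.0 if se == 0 else (m_e - m_c) / se  # guard against zero variance


def rank_genes(expr, groups, top=10):
    """Rank genes by |t| between 'control' and 'exposed' samples.

    expr:   dict mapping gene name -> expression values, one per sample
    groups: list of 'control' / 'exposed' labels aligned with those values
    """
    scores = {}
    for gene, values in expr.items():
        control = [v for v, g in zip(values, groups) if g == "control"]
        exposed = [v for v, g in zip(values, groups) if g == "exposed"]
        scores[gene] = welch_t(control, exposed)
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)[:top]


# Made-up toy data: three control and three exposed samples per gene.
groups = ["control"] * 3 + ["exposed"] * 3
expr = {
    "geneA": [1.0, 1.1, 0.9, 3.0, 3.2, 2.8],  # higher in exposed
    "geneB": [2.0, 2.1, 1.9, 2.0, 1.9, 2.1],  # unchanged
    "geneC": [5.0, 5.2, 4.8, 2.0, 2.2, 1.8],  # lower in exposed
}
print(rank_genes(expr, groups))  # → ['geneC', 'geneA', 'geneB']
```

A univariate filter like this scores each gene independently, so it can miss gene combinations that discriminate well only jointly; multivariate wrapper searches such as genetic algorithms evaluate whole subsets instead.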