M. Chetty, A. Ngom, and S. Ahmad (Eds.): PRIB 2008, LNBI 5265, pp. 412–423, 2008.
© Springer-Verlag Berlin Heidelberg 2008

Discovery of Biomarkers for Hexachlorobenzene Toxicity Using Population Based Methods on Gene Expression Data

Cem Meydan, Alper Küçükural, Deniz Yörükoğlu, and O. Uğur Sezerman

Biological Sciences and Bioengineering, Sabanci University
Sabancı Üniversitesi, Orhanlı-Tuzla, 34956 İstanbul, Türkiye
Tel.: +(90)2164839000; Fax: +(90)2164839550
{cemmeydan,kucukural,denizy}@su.sabanciuniv.edu, ugur@sabanciuniv.edu

Abstract. Discovering toxicity biomarkers is important in drug discovery to safely evaluate possible toxic effects of a substance in early phases. We applied evolutionary classification methods to select the important classifier genes in hexachlorobenzene toxicity using microarray data. Using modified genetic algorithms to select a minimum number of features for classification of gene expression data, we discovered a number of gene sets of size 4 that were able to discriminate between the control group and the hexachlorobenzene (HCB) exposed group of Brown-Norway rats with >99% accuracy in 5-fold cross-validation tests, whereas classification using all of the genes with SVM and other methods yielded accuracies ranging from 48.48% to 81.81%. Using this small number of genes as biomarkers may allow us to detect the toxicity of substances whose mechanisms of toxicity are similar to those of HCB in a fast and cost-efficient manner, even when no symptoms have yet emerged.

Keywords: Feature selection, toxicogenomics, genetic algorithms, biomarker discovery.

1 Introduction

Finding reliable toxicity biomarkers is important in toxicogenomics to safely evaluate possible toxic effects of a substance in early phases of drug discovery.
Discovering the important mechanisms of toxicity for known toxic substances and developing biomarkers that detect these can lead to classification of new substances with respect to their toxicity in a cost-efficient manner.

By using microarray technology to evaluate the changes in gene expression between control and experimental data sets, one can obtain the set of significant genes that indicate the existence of toxicity. Identifying genes that are correlated with the substance class may point to the mechanisms of toxicity and the affected pathways. These gene sets may also be used to develop diagnostic kits that detect possible toxicity of a substance in test subjects, or that support early diagnosis of toxin exposure.
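The general idea described in the abstract, searching with a genetic algorithm for a small fixed-size gene set that maximizes 5-fold cross-validated accuracy, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic (`make_toy_data`), a simple nearest-centroid classifier stands in for whatever classifier the study used, and the crossover and mutation operators, population size, and mutation rate are hypothetical choices.

```python
import random

def make_toy_data(n_samples=40, n_genes=30, informative=(3, 7, 11, 19), seed=1):
    """Synthetic expression matrix; label 1 shifts the 'informative' genes (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate control (0) and exposed (1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def nearest_centroid_cv(X, y, genes, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier using only `genes`."""
    n = len(X)
    correct = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        centroids = {}
        for label in sorted(set(y[i] for i in train)):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        for i in test:
            # Predict the class whose centroid is closest in squared Euclidean distance.
            pred = min(
                centroids,
                key=lambda lab: sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroids[lab])),
            )
            correct += int(pred == y[i])
    return correct / n

def crossover(a, b, size, rng):
    """Child draws `size` distinct genes from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, size)))

def mutate(subset, n_genes, rng):
    """Replace one gene of the subset with a gene not currently in it."""
    child = list(subset)
    candidates = [g for g in range(n_genes) if g not in child]
    child[rng.randrange(len(child))] = rng.choice(candidates)
    return tuple(sorted(child))

def select_genes(X, y, size=4, pop_size=20, generations=15, seed=0):
    """Elitist genetic algorithm over gene subsets of fixed size."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    fitness = lambda s: nearest_centroid_cv(X, y, s)
    pop = [tuple(sorted(rng.sample(range(n_genes), size))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, size, rng)
            if rng.random() < 0.3:  # hypothetical mutation rate
                child = mutate(child, n_genes, rng)
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)

X, y = make_toy_data()
best_genes, accuracy = select_genes(X, y)
print("selected gene indices:", best_genes, "CV accuracy:", accuracy)
```

Fixing the subset size at 4 mirrors the gene-set size reported in the abstract. In practice, the cross-validated accuracy of a subset chosen this way is optimistically biased unless it is confirmed on samples that were held out from the search.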