T. Washio et al. (Eds.): PAKDD 2008, LNAI 5012, pp. 767–776, 2008. © Springer-Verlag Berlin Heidelberg 2008

Concept Lattice–Based Mutation Control for Reactive Motifs Discovery

Kitsana Waiyamai, Peera Liewlom, Thanapat Kangkachit, and Thanawin Rakthanmanon

Data Analysis and Knowledge Discovery Laboratory (DAKDL), Computer Engineering Department, Engineering Faculty, Kasetsart University, Bangkok, Thailand
{kitsana.w,oprll,fengtpk,fengtwr}@ku.ac.th

Abstract. We propose a method for automatically discovering reactive motifs, which are motifs discovered from binding and catalytic sites and which combine information at those sites with bio-chemical knowledge. We introduce the concept of mutation control, which uses amino acid substitution groups and conserved regions to generate complete amino acid substitution groups. Mutation control operations are described and formalized using a concept lattice representation. We show that a concept lattice is efficient both for representing bio-chemical knowledge and for providing computational support for mutation control operations. Experiments using a C4.5 learning algorithm with reactive motifs as features predict enzyme function with 72% accuracy, compared with 67% accuracy using expert-constructed motifs. This suggests that automatically generating reactive motifs is a viable alternative to the time-consuming process of expert-based motif construction for enzyme function prediction.

Keywords: mutation control, concept lattice, sequence motif, reactive motif, enzyme function prediction, binding site, catalytic site.

1 Introduction

There are many statistic-based motif methods for enzyme function prediction capable of high accuracy; however, most of these methods [2,3,4,5] avoid the direct use of motifs generated from binding and catalytic sites to predict enzyme function. These methods use other resources from surrounding sites that contain very few sequences of binding and catalytic sites.
In certain applications, it is necessary to understand how motifs of binding and catalytic sites are combined in order to perform enzyme function prediction. This is one reason why statistic-based motifs cannot completely replace expert-identified motifs. In this paper, we develop a method to predict enzyme functions based on the direct use of binding and catalytic sites. Motifs discovered from binding and catalytic sites are called reactive motifs. The principal motivation is that different enzymes with the same reaction mechanism at their binding and catalytic sites frequently perform the same enzyme function. In previous work [16], we introduced a unique process to discover reactive motifs using block scan filtering, Mutation Control, and Reactive Site-Group Definition. The main step in reactive