T. Washio et al. (Eds.): PAKDD 2008, LNAI 5012, pp. 767–776, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Concept Lattice–Based Mutation Control
for Reactive Motifs Discovery
Kitsana Waiyamai, Peera Liewlom, Thanapat Kangkachit,
and Thanawin Rakthanmanon
Data Analysis and Knowledge Discovery Laboratory (DAKDL), Computer Engineering
Department, Engineering Faculty, Kasetsart University, Bangkok, Thailand
{kitsana.w,oprll,fengtpk,fengtwr}@ku.ac.th
Abstract. We propose a method for automatically discovering reactive motifs,
that is, motifs discovered from binding and catalytic sites that combine
information at those sites with biochemical knowledge. We introduce the
concept of mutation control, which uses amino acid substitution groups
and conserved regions to generate complete amino acid substitution groups.
Mutation control operations are described and formalized using a concept
lattice representation. We show that a concept lattice is efficient both for
representing biochemical knowledge and for providing computational support
for mutation control operations. In experiments, a C4.5 learning algorithm with
reactive motifs as features predicts enzyme function with 72% accuracy, compared
with 67% accuracy using expert-constructed motifs. This suggests that automatically
generating reactive motifs is a viable alternative to the time-consuming process
of expert-based motif construction for enzyme function prediction.
Keywords: mutation control, concept lattice, sequence motif, reactive motif,
enzyme function prediction, binding site, catalytic site.
1 Introduction
Many statistics-based motif methods for enzyme function prediction achieve
high accuracy; however, most of these methods [2,3,4,5] avoid the direct use of
motifs generated from binding and catalytic sites to predict enzyme function.
Instead, they use other resources from surrounding sites that contain very few
sequences of binding and catalytic sites. In certain applications, it is necessary to
understand how motifs of binding and catalytic sites are combined in order to perform
enzyme function prediction. This is one reason why statistics-based motifs cannot
completely replace expert-identified motifs. In this paper, we develop a method to
predict enzyme functions based on direct usage of binding and catalytic sites. Motifs
discovered from binding and catalytic sites are called reactive motifs. The principal
motivation is that different enzymes with the same reaction mechanism at binding and
catalytic sites frequently perform the same enzyme function. In previous work [16],
we introduced a unique process to discover reactive motifs using block scan filtering,
Mutation Control, and Reactive Site-Group Definition. The main step in reactive