RESEARCH ARTICLE Open Access
Is EC class predictable from reaction mechanism?
Neetika Nath and John BO Mitchell*
Abstract
Background: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical
reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random
Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in
descriptors, and three approaches that encode only the overall chemical reaction. Both cross-validation and
an external test set are used to evaluate the models.
Results: The three descriptor sets encoding overall chemical transformation perform better than the two
descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases
and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall
reaction descriptors but not by mechanistic ones.
Conclusions: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms.
Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making
them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably
mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in
various unrelated ways.
The performance of the different machine learning algorithms is in line with many cheminformatics applications,
with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information
plays in successful classification. We note also that, despite a lack of clarity in the literature, EC number prediction
is not a single problem; the challenge of predicting protein function from available sequence data is quite different
from assigning an EC classification from a cheminformatics representation of a reaction.
Background
Encoding enzyme reactions and mechanisms
Almost all biological processes proceed at a significant
rate only because of enzymes, proteins that catalyse the
chemical reactions found in nature. For half a century,
enzymes have been annotated using Enzyme Commission
(EC) numbers [1]. The scheme is a hierarchical
organization of enzyme reactions into six main classes
(oxidoreductases, transferases, hydrolases, lyases,
isomerases and ligases), which are then split at a further
three hierarchical levels. In general, these successive
levels describe the reaction at increasingly fine levels of
granularity. The six top level classes are very broad
reaction types. The second level subclass and third level
sub-subclass usually describe the specific bonds or
functional groups involved in the reaction. The fourth
level serial number defines the actual substrate and
therefore the specific chemical reaction catalysed. The
EC classification can be conveniently browsed and
searched via the ExplorEnz database [2,3], while the
official website maintained by the Nomenclature Committee
of the International Union of Biochemistry and
Molecular Biology (NC-IUBMB) [4] is a valuable and
regularly updated resource. Numerous other online
databases allow the user to explore enzyme structure
and function, including the Enzyme Structures Database
[5], IntEnz [6], BRENDA [7] and KEGG [8,9].
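The four-level hierarchy described above can be illustrated with a minimal Python sketch that splits an EC number into its class, subclass, sub-subclass and serial number, and looks up the name of the top-level class. The names ECNumber, parse_ec and top_level_class are hypothetical and introduced only for illustration; they are not part of any of the databases cited here. The sketch assumes a fully specified, purely numeric EC number of the form "a.b.c.d" and handles only the six classes listed in the text.

```python
from typing import NamedTuple

# The six top-level EC classes, numbered in the order given in the text.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


class ECNumber(NamedTuple):
    """Hypothetical container for the four hierarchical levels."""
    ec_class: int      # first level: broad reaction type
    subclass: int      # second level
    sub_subclass: int  # third level
    serial: int        # fourth level: specific substrate and reaction


def parse_ec(ec: str) -> ECNumber:
    """Parse a string such as '3.4.21.4' into its four levels.

    Assumption: all four levels are present and numeric; partial
    entries are rejected with ValueError.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {len(parts)}: {ec!r}")
    levels = [int(p) for p in parts]  # raises ValueError if not numeric
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown top-level class in {ec!r}")
    return ECNumber(*levels)


def top_level_class(ec: str) -> str:
    """Return the name of the top-level class for an EC number."""
    return EC_CLASSES[parse_ec(ec).ec_class]
```

For example, parse_ec("3.4.21.4") yields the levels 3, 4, 21 and 4, and top_level_class("3.4.21.4") returns "hydrolases". Only the first of the four levels is the prediction target considered in this work.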
Our motivation is to investigate the relationship
between the reaction mechanism, as described in the
MACiE (Mechanism, Annotation and Classification
in Enzymes) database [10-13], and the top-level class
of the EC classification. To do this, we build
supervised machine learning models that predict EC class
from data on the chemical reaction or its mechanism.
* Correspondence: jbom@st-andrews.ac.uk
Biomedical Sciences Research Complex and EaStCHEM School of Chemistry,
Purdie Building, University of St Andrews, North Haugh, St Andrews,
Scotland KY16 9ST, UK
Nath and Mitchell BMC Bioinformatics 2012, 13:60
http://www.biomedcentral.com/1471-2105/13/60
© 2012 Nath and Mitchell; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.