© 2008 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
Biotechnol. J. 2008, 3, 63–73 DOI 10.1002/biot.200700202 www.biotechnology-journal.com
1 Introduction
Residues in a protein commonly interact with one another
in 3-D space [1]. Such interactions should be preserved
during directed evolution (DE) to increase the number of
viable variants. Interacting residues can be identified
by applying the concept of feature selection in machine
learning to the data that are generated during DE
experiments [2–7]. Strategies can then be implemented
during recombination to avoid disrupting the
interactions. Here we report on the identification, and
verification through mutagenesis, of interacting residues
in monomeric red fluorescent protein (mRFP) and
Discosoma red fluorescent protein (DsRed), using
positive and negative results from only 83 variants.
We also describe the use of template engineering
strategies, based on knowledge of interacting residues,
to increase the fraction of active variants in the
library.
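The idea of treating interacting residues as a feature-selection problem can be illustrated with a minimal sketch. It is not the BLSVM algorithm used in this work: the parent-of-origin encoding, the function name score_pairs, the pairwise "same parent" Boolean feature and the toy library are all assumptions made for illustration, and the data are synthetic rather than taken from the 83 variants.

```python
from itertools import combinations, product


def score_pairs(variants, active):
    """Rank fragment pairs by how well the Boolean feature
    'both fragments come from the same parent' predicts activity."""
    n_frag = len(variants[0])
    scores = []
    for i, j in combinations(range(n_frag), 2):
        # Count variants whose same-parent feature matches the activity label.
        agree = sum((v[i] == v[j]) == a for v, a in zip(variants, active))
        scores.append((agree / len(variants), (i, j)))
    # Highest agreement first.
    return sorted(scores, reverse=True)


# Synthetic library (hypothetical): every combination of two parents
# (0 = parent A, 1 = parent B) over four recombined fragments.
variants = list(product([0, 1], repeat=4))

# Hypothetical ground truth: a variant is active only if fragments 0 and 3
# come from the same parent, i.e. their interaction is preserved.
active = [v[0] == v[3] for v in variants]

best_score, best_pair = score_pairs(variants, active)[0]
print(best_pair, best_score)
```

In this toy setting the top-ranked pair is the one whose co-inheritance was made to determine activity. A real library is sparse and noisy, so a ranking of this kind would only nominate candidate pairs for verification by mutagenesis.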
Research Article
Identifying interacting residues using Boolean Learning and
Support Vector Machines: Case study on mRFP and DsRed
proteins
Bernard L.W. Loo(1,3)*, Anshul Dubey(1)*, Matthew J. Realff(1), Jay H. Lee(1) and Andreas S. Bommarius(1,2,3)
(1) School of Chemical and Biomolecular Engineering, Atlanta, GA, USA
(2) School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
(3) Parker H. Petit Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, USA
In a protein, interactions exist between amino acid residues that influence the protein’s structural
integrity or stability and thus affect its catalytic function. The loss of such an interaction due to
mutations in these amino acids usually leads to a non-functional protein. Probing the sequence space
of a protein through mutations or recombinations, as performed in directed evolution to search
for an improved variant, frequently results in such inactive sequences. In this work, we demonstrate
the use of machine learning to identify such interacting residues and the use of template engineering
strategies to increase the fraction of active variants in a library. Using the sequences from
recombination of monomeric red fluorescent protein (mRFP) and Discosoma red fluorescent protein
(DsRed), we identified a pair of interacting residues with an algorithm based on Boolean Learning and
Support Vector Machines. The interaction between the identified residues was verified through point
mutations on the mRFP and DsRed genes. We also show that it is possible to use such results to alter
the parental genes such that the probability of disrupting the important interactions is minimized.
This will result in a larger fraction of active variants in the recombinant library and allow us to
access more functional space. We demonstrate this effect by comparing the recombinant library of
wild-type (WT) DsRed and mRFP with the library of an altered DsRed sequence and the mRFP WT gene.
Keywords: Boolean Learning Support Vector Machines · DsRed · Fluorescent proteins · Interacting position · mRFP
Correspondence: Professor Andreas S. Bommarius, School of Chemical
and Biomolecular Engineering, 311 Ferst Drive, Atlanta, GA 30332, USA
E-mail: andreas.bommarius@chbe.gatech.edu
Abbreviations: BLSVM, Boolean Learning and Support Vector Machines;
DE, directed evolution; DsRed/mRFP, Discosoma/monomeric red fluorescent protein
* These authors contributed equally to this manuscript.
Received 27 August 2007
Revised 20 September 2007
Accepted 23 September 2007
Supporting information available online