Biotechnol. J. 2008, 3, 63–73
DOI 10.1002/biot.200700202
www.biotechnology-journal.com
© 2008 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

Research Article

Identifying interacting residues using Boolean Learning and Support Vector Machines: Case study on mRFP and DsRed proteins

Bernard L.W. Loo 1,3,*, Anshul Dubey 1,*, Matthew J. Realff 1, Jay H. Lee 1 and Andreas S. Bommarius 1,2,3

1 School of Chemical and Biomolecular Engineering, Atlanta, GA, USA
2 School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
3 Parker H. Petit Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, USA

*These authors contributed equally to this manuscript.

In a protein, interactions exist between amino acid residues that influence the protein’s structural integrity or stability and thus affect its catalytic function. The loss of this interaction due to mutations in these amino acids usually leads to a non-functional protein. Probing the sequence space of a protein through mutations or recombinations, as performed in directed evolution to search for an improved variant, frequently results in such inactive sequences. In this work, we demonstrate the use of machine learning to identify such interacting residues and the use of template engineering strategies to increase the fraction of active variants in a library. We show that using the sequences from recombination of monomeric red fluorescent protein (mRFP) and Discosoma red fluorescent protein (DsRed), we were able to identify a pair of interacting residues using an algorithm based on Boolean Learning and Support Vector Machines. The interaction between the identified residues was verified through point mutations on the mRFP and DsRed genes. We also show that it is possible to use such results to alter the parental genes such that the probability of disrupting the important interactions is minimized. This will result in a larger fraction of active variants in the recombinant library and allow us to access more functional space. We demonstrate this effect by comparing the recombinant library of wild-type (WT) DsRed, mRFP and an altered sequence of DsRed with mRFP WT genes.

Keywords: Boolean Learning Support Vector Machines · DsRed · Fluorescent proteins · Interacting position · mRFP

Received 27 August 2007
Revised 20 September 2007
Accepted 23 September 2007

Supporting information available online

Correspondence: Professor Andreas S. Bommarius, School of Chemical and Biomolecular Engineering, 311 Ferst Drive, Atlanta, GA 30332, USA
E-mail: andreas.bommarius@chbe.gatech.edu

Abbreviations: BLSVM, Boolean Learning and Support Vector Machines; DE, directed evolution; DsRed/mRFP, Discosoma/monomeric red fluorescent protein

1 Introduction

Commonly, interactions exist between residues in proteins in 3-D space [1]. These residues should be preserved during directed evolution (DE) to increase the number of viable variants. Interacting residues can be identified by applying the concept of feature selection in machine learning to the data that are generated during DE experiments [2–7]. Strategies can then be implemented during recombination to avoid disrupting the interactions. Here we report on the identification and verification through mutagenesis of interacting residues in monomeric red fluorescent protein (mRFP) and Discosoma red fluorescent protein (DsRed), using positive and negative results from only 83 variants. We also mention the use of template engineering strategies based on knowledge of interacting residues to increase the fraction of active variants in the library.
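The idea of treating the search for interacting residues as feature selection over active and inactive variants can be illustrated with a toy sketch. It is not the BLSVM algorithm of this work: it simply scores every pair of positions by how well the Boolean rule "active if and only if both positions come from the same parent" explains the labels. The function name `pair_scores`, the 0/1 parent encoding, the four-position library, and the planted interaction between positions 1 and 3 are all assumptions made for illustration, and the data are synthetic rather than taken from the 83 variants described here.

```python
from itertools import combinations, product


def pair_scores(variants, active):
    """Score each position pair by how often the rule
    'active iff both positions share a parent' matches the label."""
    n_pos = len(variants[0])
    scores = {}
    for i, j in combinations(range(n_pos), 2):
        agree = sum((v[i] == v[j]) == a for v, a in zip(variants, active))
        scores[(i, j)] = agree / len(variants)
    return scores


# Synthetic library (assumption): every chimera over 4 positions,
# where 0 and 1 mark which of two hypothetical parents supplied the residue.
variants = list(product((0, 1), repeat=4))

# Planted ground truth (assumption): a variant is active only when
# positions 1 and 3 come from the same parent.
active = [v[1] == v[3] for v in variants]

scores = pair_scores(variants, active)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

On this noise-free toy library the planted pair is the only one whose rule explains every label, while unrelated pairs agree with the labels about half the time. Real screening data are sparse and noisy, which is one reason a more robust learner is needed in practice.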