news & views
Biocatalysis, that is, performing chemical transformations with biological catalysts, has made great inroads over the last few years. Owing in large part to their superior chemo-, regio- and enantioselectivity and specificity, enzymes are increasingly used in low- and high-value-added transformations ranging from hydrolysis of cellulose to the generation of chiral alcohols and amines for pharma applications¹. A significant fraction of new chemical entities under development in pharma features chiral amine centers, but the enzymes needed to access these molecules are currently rare. To address this problem, new (R)-transaminases, rarer than their (S)-specific counterparts, were developed by applying a sequence-based algorithm to exclude sequences leading to known (S)-amine and (R)- or (S)-amino acid specificity².
Nowadays, superior specificity and selectivity are often achieved through protein engineering. This design of protein sequences has evolved rapidly. Site-directed mutagenesis ushered in the first generation of protein engineering, rational design. However, because protein design rules are still not completely understood, rational design did not always succeed.
In the second generation, combinatorial protein engineering, often termed ‘directed evolution’, was introduced and practiced with good success, using protocols such as DNA shuffling and/or recombination-dependent PCR³,⁴. However, owing to the large protein sequence space, library sizes quickly explode in hyperexponential fashion with rising n, the number of positions varied. Given that hits (protein variants significantly improved over background) are rare, large libraries are necessary in combinatorial protein engineering to improve the chances of a hit.
The goal of the third generation of protein engineering, which is data driven, is to shrink the size of libraries (producing ‘focused libraries’) while increasing the chances of a hit⁵⁻⁷. The article on page 807 of this issue is a prime example of data-driven protein engineering combined with probing naturally existing diversity².
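To give a feel for why unfocused libraries become unmanageable, the following minimal sketch counts variants under two simple models. It assumes that n denotes the number of positions varied, that each position can take any of the 20 canonical amino acids, and that a hypothetical protein of 300 residues is being mutated; none of these numbers come from the article under discussion.

```python
from math import comb

AMINO_ACIDS = 20  # canonical amino acids (assumption of this sketch)


def saturation_library_size(n_positions: int) -> int:
    """Distinct variants when n pre-chosen positions are fully randomized."""
    return AMINO_ACIDS ** n_positions


def substitution_variants(protein_length: int, n_substitutions: int) -> int:
    """Distinct variants carrying exactly n substitutions anywhere in the chain.

    Choose which n positions change, then one of the 19 other residues at each.
    """
    return comb(protein_length, n_substitutions) * (AMINO_ACIDS - 1) ** n_substitutions


if __name__ == "__main__":
    hypothetical_length = 300  # illustrative protein length, not from the article
    for n in range(1, 6):
        print(n, saturation_library_size(n),
              substitution_variants(hypothetical_length, n))
```

Even under these simplified assumptions, randomizing three chosen positions already gives 8,000 variants, and allowing any two substitutions in a 300-residue protein gives more than 16 million, which is why focused libraries of a few dozen candidates are attractive.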
The strategy in this work had four steps (Fig. 1): (i) to evaluate related enzyme amino acid sequences to find sequence patterns in existing (R)- and (S)-amino acid and amine transaminases that could point to unknown transaminases; (ii) to predict relevant positions and positions that would need to be varied in the target sequence; (iii) to develop an annotation algorithm for sequence motifs to exclude unwanted activities; and (iv) to identify sequences using the annotation algorithm, develop the corresponding proteins and test them for function.
From the crystal structures of (S)-specific α-amino acid transaminases (α-TAs) and branched-chain aminotransferases (BCATs) and from amino acid alignments, the authors discovered that the presence of a hydrophobic residue in position 95 (phenylalanine, not tyrosine) and the absence of lysine or arginine in position 40 indicated specificity toward amines rather than amino acids. Next, residues 107–109, which contact the substrate in the active site, were found to be rather conserved in (S)-specific BCAT and (R)-specific DATA (D-amino acid aminotransferase) sequences of proven functionality. An algorithm was developed to exclude sequences showing the motif of either, with the argument that the remaining sequences should have (R)-amine specificity. Lastly, almost 6,000 sequences annotated as BCAT or class IV pyridoxal-5′-phosphate–dependent proteins from the National Center for Biotechnology Information database were analyzed, and 21 sequences were identified that met all the criteria specified above. Ten of those 21 (48%) were found to have significant levels of (R)-transaminase activity. This percentage is almost identical to the one for small libraries (~20 variants) aimed at thermal stabilization of proteins according to another concept that uses both crystal structures and sequence alignment, that is, structure-guided consensus⁸.
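The filtering logic described above can be illustrated with a small sketch. This is not the authors' algorithm: it only encodes the three criteria as stated in this article (a hydrophobic residue other than tyrosine at position 95, no lysine or arginine at position 40, and a 107–109 motif that matches neither the BCAT nor the DATA pattern). The hydrophobic residue set, the excluded motifs and the toy candidates are hypothetical placeholders, and each sequence is assumed to be already mapped onto the reference alignment numbering.

```python
from typing import Dict, List

# Assumption: a simple hydrophobic set; tyrosine is deliberately excluded.
HYDROPHOBIC = set("AVILMFW")
BASIC = set("KR")  # lysine, arginine

# Hypothetical placeholder motifs for positions 107-109; NOT real data.
EXCLUDED_MOTIFS = {"BCAT": "GGG", "DATA": "PPP"}

# A sequence is represented as {alignment position: one-letter residue}.
Residues = Dict[int, str]


def motif_107_109(seq: Residues) -> str:
    """Residues at alignment positions 107, 108 and 109 as a string."""
    return "".join(seq[pos] for pos in (107, 108, 109))


def passes_filter(seq: Residues) -> bool:
    """True if the sequence survives all three exclusion criteria."""
    if seq[95] not in HYDROPHOBIC:   # e.g. tyrosine points to amino acid TAs
        return False
    if seq[40] in BASIC:             # Lys/Arg points to amino acid substrates
        return False
    return motif_107_109(seq) not in EXCLUDED_MOTIFS.values()


def filter_candidates(candidates: Dict[str, Residues]) -> List[str]:
    """Names of candidate sequences that pass, in input order."""
    return [name for name, seq in candidates.items() if passes_filter(seq)]


# Toy candidates, invented purely for illustration.
TOY_CANDIDATES: Dict[str, Residues] = {
    "keep_me":   {40: "S", 95: "F", 107: "T", 108: "S", 109: "V"},
    "tyr95":     {40: "S", 95: "Y", 107: "T", 108: "S", 109: "V"},
    "lys40":     {40: "K", 95: "F", 107: "T", 108: "S", 109: "V"},
    "bcat_like": {40: "S", 95: "F", 107: "G", 108: "G", 109: "G"},
}

if __name__ == "__main__":
    print(filter_candidates(TOY_CANDIDATES))
```

In a real screen, the position lookup would come from a multiple sequence alignment against structurally characterized references, and the excluded motifs would be derived from sequences of proven functionality rather than hard-coded.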
The results presented here demonstrate that developing sequences already extant in nature but so far merely annotated can provide a more targeted, faster path to new activity or specificity than directed evolution. With ever more genomes being sequenced, the number of annotated but undeveloped sequences keeps rising rapidly. The two key steps are (i) picking key residues related to the desired activity or specificity and (ii) filtering out nonpertinent annotated sequences on the basis of their amino acid fingerprints with clever algorithms. This procedure requires a certain number of existing, successfully characterized examples (for (i)) and sufficiently many functionally proven, annotated sequences of analogs
PROTEIN ENGINEERING
Check nature first, then evolve
Ten significantly active new (R)-transaminases, still very rare enzymes, were found among 21 designed variants
obtained from nothing more than existing transaminase structures and alignment of pertinent fingerprints of
annotated sequences.
Andreas S Bommarius
Figure 1 | Evaluation of active site environment
and of fingerprints in multiple sequence
alignments of transaminases. Filtering of known
or undesired sequences led to 21 annotated
sequences, of which 10 turned out to be
(R)-transaminases with enantiomeric purity
up to 99.6% enantiomeric excess.
[Figure 1 labels: Structures; Sequences; Fingerprints (FxxxY, Gx(UR)); Enzyme with desired specificity]
Katherine Vicari
NATURE CHEMICAL BIOLOGY | VOL 6 | NOVEMBER 2010 | www.nature.com/naturechemicalbiology 793
© 2010 Nature America, Inc. All rights reserved.