26 August 2008 © 2008 The Biochemical Society
Regulars
Biochemical Journal Classic Papers
Sorting the diverse
Carbohydrates offer a structural and chemical diversity
unrivalled in Nature: two glucose residues can be joined
together in 30 different ways, and, with six different sugars,
the number of possible isomers exceeds 10¹² [1]. This huge
diversity is reflected in the diverse roles for carbohydrates
in Nature. Mono‑, di‑, oligo‑ and poly‑saccharides and
glycoconjugates play myriad roles in biology, in addition to
well‑known ones such as energy storage (starch, glycogen)
and maintenance of structure (cellulose, chitin, alginate).
The diversity of what is sometimes called the ‘glycome’ also
provides for a subtle means of cellular communication in
higher organisms: carbohydrates are the language of the cell.
Sugar‑mediated interactions not only are important for the
communication of healthy cells, but also play crucial roles in
disease, viral invasion, bacterial attack and malignancy.
Sharon [2] has termed the challenge of carbohydrates
“the last frontier of molecular and cell biology”. There is
thus considerable interest in the enzymes whose job it is to
modify and cleave carbohydrates [GHs (glycoside hydrolases)
and lyases] and those involved in their biosynthesis, GTs
(glycosyltransferases). Typically, these enzymes make up
approx. 1–2% of the genome of any organism [3]. Thus, at the
time of writing, there are around 70000 ORFs (open reading
frames) known which potentially encode GHs or GTs. A major
goal for the scientific community is to extract useful information
on the enzymes encoded by these ORFs from sequence
alone. This is an enormous challenge, one complicated by
the modular nature of the enzymes themselves [4].
Gideon J. Davies (University of York) and Michael L. Sinnott (University of Huddersfield)
In the 1990s, Bernard Henrissat (Figure 1) initiated a sequence‑based
classification of carbohydrate‑active enzymes that now underpins
all functional, structural and mechanistic consideration of
these proteins. His first classic Biochemical Journal paper, ‘A classification
of glycosyl hydrolases based on amino acid sequence similarities’
[5], was built largely on the unusual and challenging technique
of HCA (hydrophobic cluster analysis) [6] (described below), and
defined the first 35 sequence‑based families of GHs (termed families
GH1–GH35), the enzymes involved in the hydrolysis of the glycosidic
bond in di‑, oligo‑ and poly‑saccharides and glycoconjugates.
The second classic Biochemical Journal paper appeared in 1993 [7],
when a further 181 GH sequences were analysed, and the number of
GH families rose to 45. There was a similar expansion in 1996 [8].
Subsequently, Bernard Henrissat and Pedro Coutinho have established
a website with a continuously updated classification database
(http://www.cazy.org); at the beginning of 2008, there were 112 GH
families containing almost 40000 ORFs (Figure 2). Approx. 3000
pages are downloaded from the CAZy server
daily, emphasizing the central position of
this sequence classification in carbohydrate
research today.
Carbohydrates offer a diversity that
far surpasses that available with proteins
or nucleic acids. Henrissat was very quick
to realize that the wealth of different
substrates was more than matched by the
plethora of enzymes responsible for their
degradation. For example, even a comparatively
simple substrate such as cellulose,
a regular polysaccharide of β‑1,4‑linked
glucose, requires a complex enzymatic
consortium for its complete degradation.
Henrissat’s first paper on GH classification
was inspired by his earlier work on
cellulases [9]. In this initial study, Henrissat
used HCA to define six distinct families
of cellulases, termed families A–F. HCA
itself was an unusual, perhaps confusing,
technique, derided by some as “French
Impressionism”, but, in the hands of an
expert, it proved to be an amazingly powerful
tool for comparing sequences and hence
for helping place distantly related enzymes
into families. HCA is based on the principle
of the ‘helical wheel’, that one face of an
α‑helix is predominantly hydrophobic, and
so, when the linear amino acid sequence is
redrawn in two dimensions, with a helical
Figure 1. Bernard Henrissat
The sequence‑based classifications of carbohydrate‑active enzymes