26 August 2008 © 2008 The Biochemical Society

Regulars
Biochemical Journal Classic Papers

Sorting the diverse
The sequence-based classifications of carbohydrate-active enzymes

Gideon J. Davies (University of York) and Michael L. Sinnott (University of Huddersfield)

Carbohydrates offer a structural and chemical diversity unrivalled in Nature: two glucose residues can be joined together in 30 different ways, and, with six different sugars, the number of possible isomers exceeds 10^12 [1]. This huge diversity is reflected in the diverse roles for carbohydrates in Nature. Mono-, di-, oligo- and poly-saccharides and glycoconjugates play myriad roles in biology, in addition to well-known ones such as energy storage (starch, glycogen) and maintenance of structure (cellulose, chitin, alginate). The diversity of what is sometimes called the 'glycome' also provides a subtle means of cellular communication in higher organisms: carbohydrates are the language of the cell. Sugar-mediated interactions are not only important for communication between healthy cells, but also play crucial roles in disease, viral invasion, bacterial attack and malignancy. Sharon [2] has termed the challenge of carbohydrates "the last frontier of molecular and cell biology".

There is thus considerable interest in the enzymes whose job it is to modify and cleave carbohydrates [GHs (glycoside hydrolases) and lyases] and those involved in their biosynthesis, GTs (glycosyltransferases). Typically, these enzymes make up approx. 1–2% of the genome of any organism [3]. Thus, at the time of writing, there are around 70000 ORFs (open reading frames) known which potentially encode GHs or GTs. A major goal for the scientific community is to extract useful information on the enzymes encoded by these ORFs from sequence alone. This is an enormous challenge, one complicated by the modular nature of the enzymes themselves [4].

In the 1990s, Bernard Henrissat (Figure 1) initiated a sequence-based classification of carbohydrate-active enzymes that now underpins all functional, structural and mechanistic consideration of these proteins. His first classic Biochemical Journal paper, 'A classification of glycosyl hydrolases based on amino acid sequence similarities' [5], was built largely on the unusual and challenging technique of HCA (hydrophobic cluster analysis) [6] (described below), and defined the first 35 sequence-based families of GHs (termed families GH1–GH35), the enzymes involved in the hydrolysis of the glycosidic bond in di-, oligo- and poly-saccharides and glycoconjugates. The second classic Biochemical Journal paper appeared in 1993 [7], when a further 181 GH sequences were analysed, and the number of GH families rose to 45. There was a similar expansion in 1996 [8]. Subsequently, Bernard Henrissat and Pedro Coutinho have established a website with a continuously updated classification database (http://www.cazy.org); at the beginning of 2008, there were 112 GH families containing almost 40000 ORFs (Figure 2). Approx. 3000 pages are downloaded from the CAZy server daily, emphasizing the central position of this sequence classification in carbohydrate research today.

Carbohydrates offer a diversity that far surpasses that available with proteins or nucleic acids. Henrissat was very quick to realize that the wealth of different substrates was more than matched by the plethora of enzymes responsible for their degradation. For example, even a comparatively simple substrate such as cellulose, a regular polysaccharide of β-1,4-linked glucose, requires a complex enzymatic consortium for its complete degradation. Henrissat's first paper on GH classification was inspired by his earlier work on cellulases [9]. In this initial study, Henrissat used HCA to define six distinct families of cellulases, termed families A–F.

Figure 1. Bernard Henrissat

HCA itself was an unusual, perhaps confusing, technique, derided by some as "French Impressionism", but, in the hands of an expert, it proved to be an amazingly powerful tool for comparing sequences and hence for helping place distantly related enzymes into families. HCA is based on the principle of the 'helical wheel', that one face of an α-helix is predominantly hydrophobic, and so, when the linear amino acid sequence is redrawn in two dimensions, with a helical