August 2008 © 2008 The Biochemical Society

Regulars: Biochemical Journal Classic Papers

Sorting the diverse

Gideon J. Davies (University of York) and Michael L. Sinnott (University of Huddersfield)

Carbohydrates offer a structural and chemical diversity unrivalled in Nature: two glucose residues can be joined together in 30 different ways, and, with six different sugars, the number of possible isomers exceeds 10¹² [1]. This huge diversity is reflected in the many roles that carbohydrates play in Nature. Beyond the well-known functions of energy storage (starch, glycogen) and structural support (cellulose, chitin, alginate), mono-, di-, oligo- and poly-saccharides and glycoconjugates play myriad other roles in biology. The diversity of what is sometimes called the 'glycome' also provides a subtle means of cellular communication in higher organisms: carbohydrates are the language of the cell. Sugar-mediated interactions are important not only for communication between healthy cells, but also in disease: in viral invasion, bacterial attack and malignancy. Sharon [2] has described carbohydrates as "the last frontier of molecular and cell biology". There is thus considerable interest in the enzymes that modify and cleave carbohydrates [GHs (glycoside hydrolases) and lyases] and in those involved in their biosynthesis, the GTs (glycosyltransferases). Typically, these enzymes are encoded by approx. 1–2% of the genes of any organism [3]. As a result, at the time of writing, around 70000 known ORFs (open reading frames) potentially encode GHs or GTs. A major goal for the scientific community is to extract useful information on the enzymes encoded by these ORFs from sequence alone. This is an enormous challenge, one complicated by the modular nature of the enzymes themselves [4].

In the 1990s, Bernard Henrissat (Figure 1) initiated a sequence-based classification of carbohydrate-active enzymes that now underpins all functional, structural and mechanistic considerations of these proteins. His first classic Biochemical Journal paper, 'A classification of glycosyl hydrolases based on amino acid sequence similarities' [5], was built largely on the unusual and challenging technique of HCA (hydrophobic cluster analysis) [6] (described below). It defined the first 35 sequence-based families (termed GH1–GH35) of GHs, the enzymes that hydrolyse the glycosidic bonds of di-, oligo- and poly-saccharides and glycoconjugates. The second classic Biochemical Journal paper appeared in 1993 [7]: a further 181 GH sequences were analysed, and the number of GH families rose to 45. A similar expansion followed in 1996 [8]. Subsequently, Bernard Henrissat and Pedro Coutinho established a website hosting a continuously updated classification database, CAZy (http://www.cazy.org); at the beginning of 2008, it listed 112 GH families containing almost 40000 ORFs (Figure 2). Approx. 3000 pages are downloaded from the CAZy server daily, emphasizing the central position of this sequence classification in carbohydrate research today.

Carbohydrates offer a diversity that far surpasses that available with proteins or nucleic acids. Henrissat was quick to realize that the wealth of different substrates was more than matched by the plethora of enzymes responsible for their degradation. For example, even a comparatively simple substrate such as cellulose, a regular polysaccharide of β-1,4-linked glucose, requires a complex enzymatic consortium for its complete degradation. Henrissat's first paper on GH classification was inspired by his earlier work on cellulases [9]. In this initial study, Henrissat used HCA to define six distinct families of cellulases, termed families A–F.
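One way to picture a sequence-based family is as a set of sequences linked, directly or through intermediate members, by significant similarity. As a rough illustration of that grouping idea only, the sketch below assigns toy sequences to families by single-linkage clustering on a crude percent-identity score. It is not HCA and not the procedure used to build CAZy: the `identity` measure, the 0.85 threshold, and the sequence names and strings are all hypothetical, and real classifications rely on proper alignments and statistical significance rather than a fixed cutoff.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences.

    A crude, hypothetical stand-in for a real similarity score (illustration only).
    """
    if len(a) != len(b):
        raise ValueError("toy measure expects pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_into_families(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage grouping via union-find: two sequences share a family if a
    chain of pairwise scores >= threshold connects them."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two families

    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    # Sort for a stable, readable output order.
    return sorted(families.values(), key=lambda fam: sorted(fam))


# Hypothetical toy "sequences" (not real enzymes), assumed already aligned.
toy_seqs = {
    "seq1": "MKVLAAGDWE",
    "seq2": "MKVLAAGDWQ",
    "seq3": "MKVLSAGDWQ",
    "seq4": "PTRHYCNFSE",
    "seq5": "PTRHYCNFSD",
}

if __name__ == "__main__":
    for i, fam in enumerate(group_into_families(toy_seqs, threshold=0.85), start=1):
        print(f"family {i}: {sorted(fam)}")
```

Run as a script, this prints `family 1: ['seq1', 'seq2', 'seq3']` and `family 2: ['seq4', 'seq5']`. Note that seq1 and seq3 share only 80% identity, below the threshold, yet land in the same family because each is 90% identical to seq2; this transitivity is what allows distantly related sequences to be grouped together.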
Figure 1. Bernard Henrissat

The sequence-based classifications of carbohydrate-active enzymes

HCA itself was an unusual, perhaps confusing, technique, derided by some as "French Impressionism", but, in the hands of an expert, it proved to be an amazingly powerful tool for comparing sequences and hence for helping to place distantly related enzymes into families. HCA is based on the principle of the 'helical wheel': one face of an α-helix is predominantly hydrophobic, and so, when the linear amino acid sequence is redrawn in two dimensions, with a helical