CHAPTER 1

COMPARING, RANKING AND FILTERING MOTIFS WITH CHARACTER CLASSES: APPLICATION TO BIOLOGICAL SEQUENCE ANALYSIS

Matteo Comin and Davide Verzotto
University of Padova, Italy

Comparing, Ranking and Filtering Motifs with Character Classes. By Comin, Verzotto. Copyright © 2011 John Wiley & Sons, Inc.

1.1 INTRODUCTION

In biology, the notion of a motif plays a central role in describing various phenomena. For example, protein functional motifs, like the ones contained in the PROSITE database [20], e.g. [FY]DPC[LIM][ASG]C[ASG], are in general represented as motifs with character classes. These motifs are collected using semi-automatic procedures; nevertheless, they are still manually verified.

The discovery of sequence motifs in proteins and genes is becoming increasingly important [2, 29]. Such motifs usually correspond to residues conserved during evolution due to some significant structural or functional role. Moreover, the increasing availability of biological sequences, such as whole genomes, from next-generation sequencing technologies to new protein discoveries, has increased the need for automatic methods for their analysis and comparison.

To fill this gap, researchers have developed several approaches over the years. Typically these approaches follow in popular frameworks of motif discovery