Functional Annotation of Protein Isoforms and Modified Forms

Harold J Drabkin 1, Cecilia N. Arighi 2, Cathy H Wu 2, Judith A Blake 1

1 Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
2 Georgetown University Medical Center, Washington DC, US

Abstract

Eukaryotic organisms can generate protein diversity through both post-transcriptional and post-translational processing events. The multiple proteins arising from the same gene may differ from each other in their temporal or tissue-specific expression, molecular function, cellular localization or participation in biological processes. Currently, many model organism databases associate functional and other annotations collectively with the gene encoding a protein object. As a result, functional, temporal and spatial distinctions between protein isoforms arising from a single gene are lost. In this paper, we discuss the strategies and challenges encountered by the Mouse Genome Informatics curators while annotating protein isoforms.

Keywords

Biological data mining and knowledge discovery, Bio-ontologies, Biological databases and information retrieval, Biological data visualization, Biological data integration

1 Introduction

One of the hallmarks of gene expression in eukaryotic organisms is the ability of a single gene to give rise to multiple gene products. These include isoforms that originate through mRNA processing, as well as a wide spectrum of protein forms derived from cleavage (such as signal peptide removal or processing for activation) and/or post-translational modifications (such as phosphorylation, acetylation, glycosylation and ubiquitination). This process can increase a gene's potential coding capacity several fold. Quite often, these isoforms share many functions. However, they can also differ in tissue specificity of expression, subcellular localization, and functional properties.

Figure 1 Schema of the Protein Ontology Framework
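To make the problem raised in the Abstract and Introduction concrete, the following minimal Python sketch contrasts annotations grouped by gene with annotations grouped by isoform. It is an illustration only and does not represent the Mouse Genome Informatics or Protein Ontology data model. The gene symbol `GeneX`, the isoform identifiers, the `Annotation` record, and the free-text terms are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str     # hypothetical gene symbol
    isoform: str  # hypothetical isoform identifier
    aspect: str   # e.g. "function" or "location"
    term: str     # free-text stand-in for an ontology term


# Entirely made-up records, for illustration only.
ANNOTATIONS = [
    Annotation("GeneX", "GeneX-iso1", "location", "nucleus"),
    Annotation("GeneX", "GeneX-iso2", "location", "cytoplasm"),
    Annotation("GeneX", "GeneX-iso1", "function", "DNA binding"),
    Annotation("GeneX", "GeneX-iso2", "function", "DNA binding"),
]


def by_gene(annotations):
    """Collapse all annotations onto the gene (isoform identity is discarded)."""
    grouped = defaultdict(set)
    for a in annotations:
        grouped[a.gene].add((a.aspect, a.term))
    return dict(grouped)


def by_isoform(annotations):
    """Keep annotations attached to the specific isoform they describe."""
    grouped = defaultdict(set)
    for a in annotations:
        grouped[a.isoform].add((a.aspect, a.term))
    return dict(grouped)


gene_view = by_gene(ANNOTATIONS)
isoform_view = by_isoform(ANNOTATIONS)
```

In the gene-level view, `GeneX` appears to be located in both the nucleus and the cytoplasm, and nothing records which isoform is found where. The isoform-level view preserves that distinction while still showing the function shared by both forms.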