Graphical representation of proteins as four-color maps and their numerical characterization

Milan Randić a,*, Ketij Mehulić b, Damir Vukičević c, Tomaž Pisanski d,e, Dražen Vikić-Topić f, Dejan Plavšić f,*

a National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia
b School of Dental Medicine, University of Zagreb, Gundulićeva 5, 10000 Zagreb, Croatia
c Department of Mathematics, University of Split, Nikole Tesle 12, 21000 Split, Croatia
d IMFM, Department of Theoretical Computer Science, University of Ljubljana, Jadranska 19, 1000 Ljubljana, Slovenia
e UP-PINT University of Primorska, Koper, Slovenia
f Ruđer Bošković Institute, NMR Center, P.O. Box 180, HR-10002 Zagreb, Croatia

1. Introduction

Like DNA and RNA, proteins are linear polymers. Whereas graphical representations of DNA were initiated over 20 years ago [1–4], graphical representations of proteins have only recently been proposed [5–17]. The main reason for this delay is the great complexity of the primary structure of proteins, which is formed from a selection of 20 building blocks rather than 4, as is the case with DNA and RNA. Moreover, the direct extension of graphical representations of DNA to proteins results in complicated representations of proteins whose numerical characterization is computationally involved. For example, if one extends the simplest and most straightforward representation of DNA, based on four horizontal lines [18,19], to proteins, then the vector characterizing a protein has 20!/2 components (a horrendous number, 1.21645100 × 10^18) and not 4!/2 = 12 components, as is the case with DNA and RNA sequences. To surmount these difficulties and to construct simple and useful graphical representations of proteins, the concept of virtual genetic code (VGC) [5] was introduced (vide infra).
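The two component counts quoted in the example can be checked directly. The following is a minimal sketch in Python; the function name `component_count` is an illustrative choice of ours, not something taken from the cited work, and the n!/2 rule is assumed only because it reproduces the counts stated in the text for alphabets of 4 and 20 symbols.

```python
from math import factorial


def component_count(alphabet_size: int) -> int:
    """Return n!/2 for an alphabet of n symbols.

    Assumption: the n!/2 rule is inferred from the counts quoted in the
    text (4!/2 for nucleic acids, 20!/2 for proteins); this helper is
    hypothetical and only illustrates the arithmetic.
    """
    return factorial(alphabet_size) // 2


# Four nucleotide bases (DNA/RNA) versus twenty amino acids (proteins).
dna_components = component_count(4)
protein_components = component_count(20)

print(dna_components)                # 12
print(f"{protein_components:.8e}")   # 1.21645100e+18
```

The seventeen orders of magnitude separating the two values are what makes a direct extension of the four-line DNA representation to proteins impractical.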
A useful graphical representation of proteins should meet the following requirements: (1) it should allow one to easily observe and inspect visually the similarity/dissimilarity between proteins; (2) it should be compact, or at least reasonably compact, in the space required to display a protein; and (3) it should be the basis for a simple numerical characterization of proteins. The characterization of molecular structure by invariants of the molecular graph (topological indices) has proved highly useful in studying molecules and examining the similarity/dissimilarity between them, in QSPR (quantitative structure property relationship) and QSAR (quantitative structure activity relationship) modeling [20–27], as well as in current trends in drug discovery, including QPTR (quantitative proteome-toxicity relationship) modeling [28–31]. It stands to reason that a similar way of characterizing proteins and the proteome, e.g. by invariants of a graphical representation of proteins (protein descriptors), can be useful in their study. For a review of the research on

Journal of Molecular Graphics and Modelling 27 (2009) 637–641

ARTICLE INFO

Article history:
Received 6 June 2008
Received in revised form 13 October 2008
Accepted 15 October 2008
Available online 1 November 2008

Keywords:
Protein structure
Graphical representation
Virtual genetic code
Four-color map
Structure matrix S
Protein descriptor

ABSTRACT

We put forward a novel compact 2-D graphical representation of proteins based on the concept of virtual genetic code and a four-color map. The novel graphical representation uniquely represents proteins and allows one to observe and inspect the similarity/dissimilarity between them easily and quickly. It also leads to a novel protein descriptor, a 10-dimensional vector derived from a novel structure matrix S associated with the map.
The introduced numerical characterization of proteins is useful not only for their comparative study but also for cataloguing information on a single protein. The approach is illustrated with the A chain of human insulin and the A chain of the human insulin analogue glargine.

© 2008 Published by Elsevier Inc.

* Corresponding authors.
E-mail addresses: mrandic@msn.com (M. Randić), mehulic@sfzg.hr (K. Mehulić), vukicevi@pmfst.hr (D. Vukičević), tomaz.pisanski@fmf.uni-lj.si (T. Pisanski), vikic@irb.hr (D. Vikić-Topić), dplavsic@irb.hr (D. Plavšić).

journal homepage: www.elsevier.com/locate/JMGM
1093-3263/$ – see front matter
doi:10.1016/j.jmgm.2008.10.004