Compact 2-D graphical representation of DNA

Milan Randić 1, Marjan Vračko *, Jure Zupan, Marjana Novič

Laboratory for Chemometrics, National Institute of Chemistry, Hajdrihova 19, Ljubljana 001, Slovenia

Received 18 February 2003; in final form 4 April 2003

Abstract

We present a novel 2-D graphical representation of DNA sequences whose important advantage over existing graphical representations of DNA is that it is very compact. It is based on (1) the use of binary labels for the four nucleic acid bases, and (2) the use of the 'worm' curve as a template on which the binary codes are placed. The approach is illustrated on the DNA sequences of the first exon of human β-globin and gorilla β-globin. © 2003 Elsevier Science B.V. All rights reserved.

1. Introduction

One of the disadvantages of the currently available 2-D graphical representations of DNA [1–8] is that they are not compact. They require considerable space if one is interested in visualizing local variations in different DNA sequences. In this Letter we introduce a novel, very compact 2-D graphical representation. In contrast to the available 2-D graphical representations, a compact representation allows visual inspection of lengthy DNA sequences without requiring excessive space.

Compact representations are based on using a compact mathematical curve as a template on which the nucleic acid bases are displayed. By compact curves we mean a family of mathematical curves of considerable length that are confined to a limited 2-D or 3-D space. Once a template curve (which has vertices at equidistant positions) has been selected, one assigns nucleotide bases to its vertices according to a selected protocol. We decided to represent the four nucleic acid bases adenine (A), guanine (G), cytosine (C), and thymine (T) by the binary labels 00, 01, 10 and 11, respectively. Thus, to be recorded, each base requires a pair of adjacent vertices on the template curve.
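The labeling step can be illustrated with a minimal sketch, assuming only the assignment stated above (A = 00, G = 01, C = 10, T = 11). The function names `encode_sequence` and `black_vertices` and the short example fragment are hypothetical and chosen for illustration. The indices returned are positions along the template curve, not 2-D coordinates, since the geometry of the curve is not specified at this point in the text.

```python
# Binary labels as stated in the text: A = 00, G = 01, C = 10, T = 11.
BASE_TO_BITS = {"A": "00", "G": "01", "C": "10", "T": "11"}


def encode_sequence(sequence):
    """Return the concatenated binary labels for a DNA sequence.

    Each base contributes two digits, i.e. it occupies a pair of
    adjacent vertices on the template curve.
    """
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def black_vertices(sequence):
    """Return the 0-based vertex indices that carry the digit 1.

    Assumption: vertices are numbered consecutively along the curve,
    starting from 0 at the first vertex.
    """
    bits = encode_sequence(sequence)
    return [i for i, bit in enumerate(bits) if bit == "1"]


if __name__ == "__main__":
    fragment = "ATGGTG"  # hypothetical short fragment, for illustration only
    print(encode_sequence(fragment))
    print(black_vertices(fragment))
```

A sequence of n bases therefore occupies 2n consecutive vertices, and only the vertices holding the digit 1 need to be marked.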
In the next step we replace the binary labels 00, 01, 10 and 11 by white and black circles as their graphical equivalents. Finally, we retain only the black circles, which correspond to the digit '1' of the binary codes, to obtain a novel compact graphical representation of DNA. Binary labels for the four nucleic acid bases of DNA sequences have also been used previously by other authors [9]. Among the advantages of graphical representations of DNA sequences [1–8], besides facilitating visual inspection of the sequences, is that they allow one to construct numerical characterizations of the 2-D, 3-D and even 4-D patterns of DNA [10–15].

Chemical Physics Letters 373 (2003) 558–562
www.elsevier.com/locate/cplett

* Corresponding author. Fax: +00386-61-125-9244. E-mail address: marjan.vracko@ki.si (M. Vračko).
1 Visitor. Home address: 3225 Kingman Road, Ames, IA 50014, USA; fax: +515-292-8629.

0009-2614/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0009-2614(03)00639-0