International Journal of Computer Science and Communication Security (IJCSCS), Volume 1, July 2011

Haar Wavelet Transform of The Signal Representation of DNA Sequences

MOHAMED EL-ZANATY¹, MAGDY SAEB¹, A. BAITH MOHAMED¹, SHAWKAT K. GUIRGUIS²
1. School of Engineering, Computer Department, Arab Academy for Science, Technology and Maritime Transport, Alexandria, EGYPT
2. Information Technology Department, Institute of Graduate Studies & Research, Alexandria University, EGYPT

Abstract: Complex sequences of DNA nucleotides, and the search techniques associated with them, can be simplified when the sequences are represented as digital signals. This approach applies known signal processing techniques to the analysis of genomic information. We present a set of tools for the signal representation and analysis of genomic information. In this work, we provide a matrix representation and a sparse polynomial representation of the DNA. We show that the sparse polynomial representation of DNA sequences improves search performance and reduces storage requirements. The DNA nucleotides are represented in a compact form, similar to the QR (Quick Response) representation, that offers a broad range of practical uses. In addition, we speed up the search process by applying the Haar wavelet technique to the resulting DNA signals. Based on the multi-level Haar transform, the search starts at the n-th trend and follows the various levels upward until a match is found. Some important features of nucleotide sequences are revealed by these visual signal representations when members of the same evolutionary family are compared.

Key words: DNA, Signal Analysis, Haar Wavelet, Multi-resolution Analysis, Sparse Polynomial, Quick Response.

1. Introduction

DNA, over millions of years, has demonstrated its effectiveness as a coding medium for the instruction set that governs and propagates living things. DNA is an appealing medium for data storage because very large amounts of data can be stored in a compact volume.
DNA storage capacity vastly exceeds the storage capacities of conventional electronic, magnetic and optical media. A gram of DNA contains about 10^21 DNA bases, or about 10^8 terabytes. Hence, a few grams of DNA may have the potential to store all the data stored in the world [1].

Various methods of DNA database search have been developed, but most of them are built on string matching and base alignment; they therefore need considerable processing time, and their results have no established standards [1,2,3]. For example, in order to find the following sequence in the database shown in Table 1, we need to move through 300021 rows and perform string matching (base alignment) on each one:

“ATTCTTCG…………TAGTCGT”

This is very slow and consumes relatively large amounts of power. The cost of this process grows with the table size n; that is, its complexity is O(n). Parallel processing may speed up this process, but at the cost of additional hardware and software.

BLAST (Basic Local Alignment Search Tool) is one of the most widely used bioinformatics programs. It addresses a fundamental problem, and the algorithm emphasizes speed over sensitivity [7]. Before fast algorithms such as BLAST and FASTA (Fast All) were developed, performing database searches for protein or nucleic acid sequences was very time consuming.

_________________________________
Manuscript received June 6, revised June 12
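
The row-by-row search described in the Introduction can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `linear_search`, the toy table `rows`, and the use of exact substring matching as a stand-in for base alignment are all assumptions made for the example.

```python
from typing import List, Optional


def linear_search(rows: List[str], query: str) -> Optional[int]:
    """Return the index of the first row containing `query`, or None.

    Every row is visited in turn, so the number of string comparisons
    grows linearly with the number of rows n, i.e. O(n) row visits.
    """
    for index, sequence in enumerate(rows):
        # Exact substring test; a real system would perform base alignment here.
        if query in sequence:
            return index
    return None


# Hypothetical toy table standing in for the database rows.
rows = ["GGCATTAC", "ATTCTTCGAA", "TTTAGTCGT"]

match = linear_search(rows, "TTCTTCG")
miss = linear_search(rows, "AAAAAA")
print(match, miss)
```

With a table of several hundred thousand rows, the loop body runs once per row in the worst case, which is the cost that the signal-based search is intended to reduce.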