Abstract—The technological developments of recent years have helped biologists to extract, examine, and store the genetic information of living beings. As a result, databases have become very large and contain a large amount of redundant or poorly analyzed information. This growth is a major challenge for data storage: it becomes difficult to analyse, rank, and save all the data properly. As a solution to this problem, we propose in this article an algorithm that determines the optimal local alignment and makes it possible to represent a DNA sequence by a substring of its genetic information, thereby reducing the amount of information held in data banks.

Index Terms—bioinformatics, DNA sequence alignment, database, High Throughput Sequencing

I. INTRODUCTION

To determine whether a strain belongs to a given species, biologists compare it with a known reference sequence of the species to which it is presumed to belong. If the similarity percentage is very high, we conclude that the sequence belongs to that species. Comparing sequences also makes it possible to compare different species with one another, and such comparisons indicate whether or not two species have a common ancestor.

In order to properly analyze the results of methods for aligning and comparing DNA sequences, we assign weights to the aligned pairs of the sequences to calculate the degree of similarity and the cost of dissimilarity between the sequences. This operation allows us to infer relationships between the sequences. Such a relationship is described by the degree of similarity between the sequences, which is quantified by a score.

The most commonly used sequence alignment algorithms are the Smith-Waterman algorithm [1], which determines a local alignment between DNA sequences, and the Needleman-Wunsch algorithm [2], which determines a global alignment between DNA sequences.
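As an illustration of how a local alignment score is obtained from pair weights, the following is a minimal Python sketch of the Smith-Waterman recurrence [1] with a linear gap penalty. The function name `smith_waterman` and the weights (match = 2, mismatch = -1, gap = -1) are assumptions chosen for the example; they are not the scoring scheme used in this paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    The weights are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + pair, # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, sub_a, sub_b = smith_waterman("GGACGTCC", "TTACGTAA")
    print(score, sub_a, sub_b)
```

Under these assumed weights, the score grows with each matching pair and falls with each mismatch or gap, and the zero floor in the recurrence is what makes the alignment local rather than global. The table has (len(a) + 1) × (len(b) + 1) cells, so both time and memory grow with the product of the sequence lengths.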
Manuscript received July 01, 2015; revised July 19, 2015. Bacem Saada is a Ph.D. student with Harbin Engineering University, College of Computer Science and Technology, Harbin, China (email: basssoum@gmail.com). Jing Zhang, Ph.D., is a Professor with Harbin Engineering University, College of Computer Science and Technology, Harbin, China (email: zhangjing@hrbeu.edu.cn).

II. STATE OF THE ART

The process of aligning and comparing DNA sequences presents several problems. First, several open-access databases now hold all publicly available DNA sequences and their protein translations. These databases continue to grow at an exponential rate. In 2006, GenBank, for example, created within the framework of international collaboration on nucleotide sequencing, contained over 65 billion nucleotide bases [3]. In 2013, it contained more than 154.2 billion bases. Nowadays, the quantity of information can reach petabytes in size [4]. In this setting, processing a large number of sequences to infer which species a sequence belongs to is very expensive in terms of execution time and allocated resources. To decrease the amount of stored information, researchers are trying to reduce the number of DNA sequences stored in their databases and to keep only the DNA sequences that best characterize each species.

From another point of view, storing such an alignment is also a problem, and any subsequent analysis, interpretation, or processing of the alignment would be impossible. Moreover, if researchers decide to use only a portion of the sequence, no current algorithm allows them to choose, optimally, the length of the portion to extract from the original chain.

To solve the problems described above and to optimize the use and performance of algorithms for aligning and comparing DNA sequences, several lines of research have been pursued. Some researchers have tried to reduce the complexity of dynamic programming algorithms. For example, Yongchao Liu, Douglas L.
Maskell, and Bertil Schmidt [5], and Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell [6], have tried to reduce the total running time of the Smith-Waterman algorithm by exploiting the many-core architecture of NVIDIA GPUs through CUDA technology, which optimizes the use of these GPUs. Granger G. Sutton, Owen White, Mark D. Adams, and Anthony R. Kerlavage [7] also proposed an algorithm that divides the genome into regions and detects similar regions, in order to reduce the number of alignment operations between nucleotides.

Furthermore, the recent growth of new DNA sequence alignment technologies has enabled the study of the human genome [8, 9], whose size reaches 3 billion bases. The genomes of other species, such as some amphibian species, can even exceed 100 billion bases [10]. At this scale, conventional algorithms for aligning and comparing DNA sequences cannot be used. Indeed, the result of an alignment between entire genomes would be an alignment of millions of base pairs, including the time of execution

Representation of a DNA Sequence by a Subchain of its Genetic Information
Bacem Saada, Jing Zhang
Proceedings of the World Congress on Engineering and Computer Science 2015 Vol II, WCECS 2015, October 21-23, 2015, San Francisco, USA
ISBN: 978-988-14047-2-5; ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)