Parallel Multiple Alignment of the Influenza Virus A/H1N1 Genome Sequences on a Heterogeneous Compact Computer Cluster

Plamenka Borovska, Ognian Nakov, Veska Gancheva, Ivailo Georgiev
Computer Systems Department
Technical University of Sofia
8 Kliment Ohridski Boul., 1756 Sofia
BULGARIA
pborovska@tu-sofia.bg, nakov@tu-sofia.bg, vgan@tu-sofia.bg, ivailo_georgiev@tu-sofia.bg

Abstract: - Restraining the spread of pandemics and treating people infected by the influenza virus rely heavily on the latest achievements of molecular biology, cellular biology, biocomputing, and many other advanced areas of modern science. In this paper we perform a comparative analysis of viral nucleotide sequences and identify consensus motifs and variable domains in the different segments of the influenza virus A/H1N1 genome on the basis of parallel computer simulation. For this purpose we suggest a parallel computational model based on the parallel ClustalW algorithm for multiple sequence alignment. The suggested model has been verified by means of parallel program implementations on a heterogeneous compact cluster.

Key-Words: - Parallel Algorithms, Parallel Programming, Parallel Performance, High Performance Computing, Biocomputing, Multiple Sequence Alignment, ClustalW Algorithm

1 Introduction
Biocomputing and molecular biology are areas that demand knowledge and skills for the acquisition, storage, management, analysis, interpretation and dissemination of biological information. This requires high performance computers and innovative software tools for managing the vast amount of information, as well as innovative algorithmic techniques for the analysis, interpretation and forecasting of data, in order to gain insight for the design and validation of life-science experiments.
Recent whole-genome sequencing technology has made it possible to determine the nucleotide sequences of more than 1500 viral, bacterial, plant and animal genomes since the year 2000. The world's DNA databases are publicly accessible and usually contain more than one (up to several thousand) individual genomes for each species. As of July 8, 2009, 3702 human and avian isolates had been completely sequenced and made available through GenBank [11] (Fig. 1). The National Institute of Allergy and Infectious Diseases (NIAID) at the National Institutes of Health (NIH), USA, runs a project called “Influenza Genome Sequencing Project” [12]. It is a collaborative project designed to increase the genome knowledge base of influenza and to help researchers understand how flu viruses evolve, spread, and cause disease. The sequencing effort, conducted in part by the NIAID Microbial Sequencing Center at the J. Craig Venter Institute (JCVI), is revealing the complete genetic blueprints of thousands of known human and avian influenza viruses and rapidly making them available through GenBank® and the NIAID Bioinformatics Resource Center.

Fig. 1. Sequencing Production as of 2009-07-08

The huge volumes of biological sequence data accumulating in databases require the development of efficient tools for the comparative analysis of genome sequences. The information obtained from genome sequence analyses has various applications, and various tools exist for analyzing such sequences. Nevertheless, applying these tools to large sets of genome sequence data is impractical on single-processor machines. The major shortcomings of the purely experimental approach to searching and analyzing imply the significant financial expenditures

RECENT ADVANCES in SOFTWARE ENGINEERING, PARALLEL and DISTRIBUTED SYSTEMS ISSN: 1790-5117 50 ISBN: 978-960-474-156-4