International Journal of Applied Metaheuristic Computing, 3(1), 34-47, January-March 2012

Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Reinforcement Learning for Improving Gene Identification Accuracy by Combination of Gene-Finding Programs

Peng-Yeng Yin, National Chi Nan University, Taiwan
Shyong Jian Shyu, Ming Chuan University, Taiwan
Shih-Ren Yang, Ming Chuan University, Taiwan
Yu-Chung Chang, Ming Chuan University, Taiwan

ABSTRACT

Due to the explosive growth in the size of genome databases, gene discovery has become one of the most computationally intensive tasks in bioinformatics. Many systems have been developed to find genes; however, there is still room to improve the prediction accuracy. This paper proposes a reinforcement learning model for combining gene predictions from existing gene-finding programs. The model learns the optimal policy for accepting the best predictions. The fitness of a policy is reinforced if the selected prediction at a nucleotide site agrees with the true annotation. The model searches for the optimal policy that maximizes the expected prediction accuracy over all nucleotide sites in the sequences. The experimental results demonstrate that the proposed model yields higher prediction accuracy than the single best program.

Keywords: Bioinformatics, DNA Sequence, Gene-Finding, Genome Database, Reinforcement Learning

DOI: 10.4018/jamc.2012010104

1. INTRODUCTION

The amount of sequenced nucleotides produced by genome projects is growing explosively. Several million bases of genomic DNA are sequenced daily and made available to the public. Genome annotation, one of the most important tasks of a genome project, aims to find all existing genes in the genomic DNA sequences. Conventionally, genome annotation is divided into three steps: automatic annotation, manual annotation, and experimental verification. Automatic annotation is the main task of bioinformatics, because manually annotating the coding regions of genes on all genomic sequences from scratch is impractical; instead, the sequences