1545-5963 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCBB.2019.2936570, IEEE/ACM Transactions on Computational Biology and Bioinformatics

RNA Secondary Structure Prediction with Pseudoknots using Chemical Reaction Optimization Algorithm

Md. Rafiqul Islam, Md. Shahidul Islam, Nazmus Sakeef
Computer Science and Engineering Discipline
Khulna University, Khulna-9208, Bangladesh
dmri1978@yahoo.com, shahidcseku@gmail.com, nazmussakeef1700@gmail.com

Abstract—RNA molecules, especially those containing pseudoknots, play a significant role in cell function. Over the past decades, several methods have been developed to predict RNA secondary structure with pseudoknots, and the most popular one is based on minimum free energy. This prediction problem is nondeterministic polynomial-time hard (NP-hard). We propose an approach based on a metaheuristic algorithm named Chemical Reaction Optimization (CRO) to solve the RNA pseudoknotted structure prediction problem. The reaction operators of the CRO algorithm have been redesigned and applied to the generated population to find the structure with the minimum free energy. In addition, we have developed an extra operator, called the Repair operator, which strongly improves the accuracy of our algorithm: it helps to increase the number of true positive base pairs while decreasing the number of false positive and false negative base pairs. Four energy models have been applied to calculate the energy. To evaluate the performance, we have used four datasets containing RNA pseudoknotted sequences taken from the RNA STRAND and Pseudobase++ databases.
We have compared the proposed approach with some existing algorithms and shown that our CRO-based model is a better prediction method in terms of accuracy and speed.

Index Terms—Chemical Reaction Optimization (CRO), Minimum Free Energy (MFE), Pseudoknot.

I. INTRODUCTION

RNA is a vital biopolymer that performs key functions in cellular life. It consists of four nucleotides: Adenine (A), Cytosine (C), Guanine (G) and Uracil (U). RNA structure is built from A-U, G-C, and G-U hydrogen-bonded base pairs. A-U and G-C are called canonical base pairs and G-U is called a non-canonical base pair; together they are the main factors in the folding process of RNA [1]. Besides their biological role in transcription and translation, RNA polymers have significant roles in many cellular processes, such as carrying genetic information, participating in the regulation of gene expression, and functioning as catalysts [1]. To understand the functions of RNA molecules, we need to find their structures. Several physical methods, such as X-ray crystallography, can determine RNA structures, but these techniques are expensive and extremely time-consuming [2].

RNA secondary structure prediction with pseudoknots is a significant challenge in bioinformatics. RNA pseudoknots are found in many RNAs, such as ribosomal RNAs, telomerase RNAs and viral RNAs [3]. Pseudoknots are formed when nucleotides of a hairpin loop pair with nucleotides outside of the loop to form a helical stem that is adjacent or nearly adjacent to the hairpin stem. Several methods exist for RNA secondary structure prediction [3], [4], [5], [6], [7], [8], and the most common one uses the minimum free energy (MFE) method. Pseudoknots in RNA play significant roles in ribosomal frameshifting, splicing, viral genome replication and regulation of translation [3]. RNA pseudoknot prediction helps in RNA 3D structure prediction.
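The base-pairing rules and the pseudoknot definition above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a structure is represented as a list of 0-based index pairs (i, j), and it uses the common formal criterion that a structure contains a pseudoknot when two base pairs (i, j) and (k, l) cross, that is, i < k < j < l. The names `ALLOWED_PAIRS`, `can_pair` and `has_pseudoknot` are hypothetical and chosen only for this example.

```python
# Base pairs named in the text: A-U and G-C (canonical) plus G-U.
# The set below is an illustrative assumption about how to encode them.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(a, b):
    """Return True if nucleotides a and b may form one of the allowed pairs."""
    return (a.upper(), b.upper()) in ALLOWED_PAIRS


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l).

    pairs is an iterable of (i, j) sequence positions; order within a
    pair does not matter because each pair is normalised first.
    """
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for idx, (i, j) in enumerate(ordered):
        # Every later pair starts at or after i, so only the crossing
        # condition itself needs to be checked.
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:
                return True
    return False


# A fully nested hairpin stem: no crossing pairs.
nested = [(0, 9), (1, 8), (2, 7)]
# Loop positions 3 and 4 pair with positions 9 and 8 outside the stem.
knotted = [(0, 6), (1, 5), (3, 9), (4, 8)]
```

Here `has_pseudoknot(nested)` evaluates to False and `has_pseudoknot(knotted)` to True. The quadratic scan is chosen for readability; it is adequate for short sequences but is not meant as an efficient checker.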
Without knowing how the RNA secondary structure forms, it is very difficult to construct the 3D structure, because RNA structure is hierarchical and its folding is sequential [9].

Knowledge of protein 3D structures is vitally important for rational drug design. Although X-ray crystallography is a powerful tool for determining protein 3D structures, it is time-consuming and expensive, and not all proteins can be successfully crystallized. Membrane proteins are difficult to crystallize, and most of them will not dissolve in normal solvents. Therefore, very few membrane protein structures have been determined so far. Recent breakthroughs indicate that NMR is indeed a very powerful tool for determining the 3D structures of membrane proteins [10], but it is also time-consuming and costly. To acquire structural information in a timely manner, 3D protein structures have been developed by means of homology techniques [11], [12] and found very useful for drug development.

Predicting the secondary structure provides an estimate of the backbone arrangement of the 3D structure. RNA 3D structure prediction programs use data derived from experiments and from programs for secondary structure and pseudoknot prediction. Many programs, such as ERNA-3D, can produce a 3D representation of an RNA from a known secondary structure [2]. Therefore, predicting RNA secondary structure with pseudoknots is very helpful for RNA 3D structure prediction. Prediction of RNA structure including pseudoknots also helps biologists understand the mechanisms and actions of RNA pseudoknots in the cell. By detecting this vast class of RNA pseudoknots, we can better understand RNA structures and their associated functions [9].

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.
The final version of record is available at http://dx.doi.org/10.1109/TCBB.2019.2936570 Copyright (c) 2019 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.