Extension of Prrn: implementation of a doubly nested randomized iterative refinement strategy under a piecewise linear gap cost

Shinsuke Yamada 1,2 (shinsuke@yama.info.waseda.ac.jp)
Osamu Gotoh 2,3 (gotoh@cbrc.jp)
Hayato Yamana 1 (yamana@yama.info.waseda.ac.jp)

1 Department of Computer Science, Graduate School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
2 Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan
3 Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan

Keywords: multiple sequence alignment, piecewise linear gap cost, BAliBASE benchmark

1 Introduction

Multiple sequence alignments are useful tools for elucidating the relationships among molecular function, evolution, sequence, and structure. Prrn is a multiple sequence alignment program [3]. It aligns sequences of similar length accurately; however, when some sequences have long insertions or deletions (indels), its accuracy degrades somewhat. Recent studies suggest that an algorithm combining local and global alignment information can construct an accurate alignment even when some sequences have long indels. In this work, we extended Prrn so that it can use a piecewise linear gap cost when aligning two groups. The piecewise linear gap cost is a combination of L affine gap costs [1]; for simplicity, we restrict ourselves to the case L = 2. Using a piecewise linear gap cost can be regarded as an alternative way of combining local and global alignment information simultaneously. When two groups are aligned under the piecewise linear gap cost, gap extension penalties are difficult to calculate because the groups already contain gaps.
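To make the cost concrete, the Python sketch below assumes the usual formulation of a piecewise linear gap cost, g(k) = min over l of (v_l + u_l k) for a gap of length k ≥ 1, with L = 2. The parameter values in `PIECES` and the helper names `gap_cost`, `extension_cost`, and `align_score` are hypothetical and are not taken from Prrn. `extension_cost` illustrates why gap lengths matter: under a single affine cost, lengthening any gap by three positions costs the same, whereas under g the cost depends on how long the gap already is. `align_score` is a plain pairwise dynamic program, not Prrn's group-to-group GSA algorithm; it keeps one pair of affine gap states per piece, so maximizing over all states charges every gap its cheapest piece.

```python
"""Sketch: a two-piece (L = 2) piecewise linear gap cost and a pairwise DP using it.

Assumptions (ours, not taken from Prrn): g(k) = min_l (v_l + u_l * k) for k >= 1,
g(0) = 0, and the parameter values in PIECES are purely illustrative.
"""
from typing import Sequence, Tuple

Piece = Tuple[float, float]  # (v, u): opening part v and per-residue slope u
PIECES: Tuple[Piece, ...] = ((4.0, 2.0), (14.0, 0.5))  # hypothetical values
NEG = float("-inf")


def gap_cost(k: int, pieces: Sequence[Piece]) -> float:
    """Penalty of one gap of length k: the cheapest of the affine pieces."""
    if k == 0:
        return 0.0
    return min(v + u * k for v, u in pieces)


def extension_cost(existing: int, added: int, pieces: Sequence[Piece]) -> float:
    """Hypothetical helper: extra penalty for lengthening a gap of length `existing`."""
    return gap_cost(existing + added, pieces) - gap_cost(existing, pieces)


def align_score(a: str, b: str, pieces: Sequence[Piece],
                match: float = 2.0, mismatch: float = -1.0) -> float:
    """Optimal global score of strings a and b under the piecewise gap cost.

    Each piece l gets its own Gotoh-style gap matrices E[l] (gap in a) and
    F[l] (gap in b). Since g is the minimum of the affine costs, maximizing
    over all states charges every gap its cheapest piece.
    """
    n, m = len(a), len(b)
    H = [[NEG] * (m + 1) for _ in range(n + 1)]
    E = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in pieces]
    F = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in pieces]
    H[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = NEG
            for l, (v, u) in enumerate(pieces):
                if j > 0:  # gap in a: open from H (pay v + u) or extend E[l] (pay u)
                    E[l][i][j] = max(H[i][j - 1] - v - u, E[l][i][j - 1] - u)
                    best = max(best, E[l][i][j])
                if i > 0:  # gap in b: same recurrence along the other axis
                    F[l][i][j] = max(H[i - 1][j] - v - u, F[l][i - 1][j] - u)
                    best = max(best, F[l][i][j])
            if i > 0 and j > 0:  # substitution (match or mismatch)
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, H[i - 1][j - 1] + s)
            H[i][j] = best
    return H[n][m]


if __name__ == "__main__":
    print(extension_cost(2, 3, PIECES), extension_cost(20, 3, PIECES))
    print(align_score("ACGTACGT", "ACGT" + "W" * 20 + "ACGT", PIECES))
```

With these illustrative values, extending a 2-residue gap by three positions costs 6.0, but extending a 20-residue gap by the same amount costs only 1.5, and the 20-residue insertion in the example is charged 24 instead of the 44 the first affine piece alone would impose.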
Our method calculates gap extension penalties using dynamic gap information, which comprises the position and length of each dynamic gap. BAliBASE benchmark results indicated that, on average, our method constructs the most accurate alignments.

2 Methods

Prrn is an implementation of a doubly nested randomized iterative strategy [3], which mutually refines the alignment, the phylogenetic tree, and the pair weights. The core of the strategy is the iterative refinement stage, which is based on the group-to-group sequence alignment (GSA) algorithm. The previous GSA algorithm in Prrn used an affine gap cost. Incorporating the piecewise linear gap cost into the GSA algorithm requires calculating gap lengths, because the slope of this gap cost changes with gap length. It is, however, difficult to calculate gap lengths using only a pre-calculated gap profile [2]. Therefore, we combined the gap profile with dynamic gap information (DGI). A dynamic gap is inserted during the GSA process,