MASTtreedist: Visualization of Tree Space Based on Maximum Agreement Subtree

HONG HUANG 1 and YONGJI LI 2

ABSTRACT

The phylogenetic tree construction process may produce many candidate trees as the “best estimates.” As the number of constructed phylogenetic trees grows, so does the need to efficiently compare their topological or physical structures. One software tool for tree comparison, Mesquite’s Tree Set Viz module, allows rapid and efficient visualization of tree comparison distances using multidimensional scaling (MDS). Tree-distance measures for the topological distance among different trees, such as Robinson-Foulds (RF), have been implemented in Tree Set Viz. Newer and more sophisticated measures, such as Maximum Agreement Subtree (MAST), can be built on top of Tree Set Viz. MAST can detect the common substructures among trees and provide more precise information on the similarity of the trees, but it is NP-hard and difficult to implement. In this article, we present a practical tree-distance metric: MASTtreedist, a MAST-based comparison metric in Mesquite’s Tree Set Viz module. This metric applies efficient optimizations for the maximum weight clique problem. The results suggest that the proposed method can efficiently compute the MAST distances among trees, and that such topological differences can be rendered as a scatter of points in two-dimensional (2D) space. We also provide a statistical evaluation of the proposed measure with respect to RF using experimental data sets. This new comparison module provides a tree–tree pairwise comparison metric based on the differences in the number of MAST leaves among constructed phylogenetic trees. The metric improves the visualization of taxa differences by discriminating small divergences of subtree structures for phylogenetic tree reconstruction.

Key words: cancer genomics, computational molecular biology, phylogenetic analyses.
1 School of Information, University of South Florida, Tampa, FL.
2 Department of Computer Science, Sun Yat-sen University, Guangzhou, China.

JOURNAL OF COMPUTATIONAL BIOLOGY, Volume 20, Number 1, 2013. © Mary Ann Liebert, Inc. Pp. 42–49. DOI: 10.1089/cmb.2012.0243

INTRODUCTION

Researchers collect data such as DNA sequences for each of the different taxa (genes, species, etc.) and then construct phylogenetic trees. Many tree reconstruction methods can produce more than one candidate tree for the input dataset. The number of trees is often in the hundreds or thousands (Than et al., 2008; Matthews et al., 2010; Ayres et al., 2012). These candidate trees are computed so as to