September 28, 2006 10:43 Proceedings Trim Size: 9.75in x 6.5in apbc118a

INFERRING A CHEMICAL STRUCTURE FROM A FEATURE VECTOR BASED ON FREQUENCY OF LABELED PATHS AND SMALL FRAGMENTS

TATSUYA AKUTSU ∗
Bioinformatics Center, Institute for Chemical Research, Kyoto University
Gokasho, Uji, Kyoto 611-0011, Japan
E-mail: takutsu@kuicr.kyoto-u.ac.jp

DAIJI FUKAGAWA
National Institute of Informatics
Chiyoda-ku, Tokyo 101-8430, Japan
E-mail: daiji@nii.ac.jp

This paper proposes algorithms for inferring a chemical structure from a feature vector based on the frequency of labeled paths and small fragments, an inference problem with potential applications to drug design. In this paper, chemical structures are modeled as trees or tree-like structures. It is shown that the inference problems for these kinds of structures can be solved in polynomial time using dynamic programming-based algorithms. Since these algorithms are not practical, a branch-and-bound type algorithm is also proposed. The results of computational experiments suggest that the algorithm can solve the inference problem in a few seconds to a few tens of seconds for moderate-size chemical compounds.

1. Introduction

Drug design is one of the important targets of bioinformatics. Classification of chemical compounds is important for designing new drugs, and thus many studies have been done. Recently, kernel methods have been applied to the classification of chemical compounds 1,2,3,4. In most of these approaches, chemical compounds are mapped to feature vectors (i.e., vectors of reals), and support vector machines (SVMs) 5 are then employed to learn classification rules. Though several methods have been proposed, feature vectors based on the frequency of small fragments 1,2 or the frequency of labeled paths 3,4 are widely used. Other chemical properties, such as molecular weights, partial charges, and logP, are sometimes combined with these, and weights or probabilities are sometimes assigned to paths or fragments.
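To make the notion of a feature vector based on the frequency of labeled paths concrete, the following is a minimal sketch under stated assumptions, not the definition used in this paper or in the cited kernel methods. It assumes a compound is given as a tree with atom labels on the vertices, and it counts the label sequence of every directed simple path with at most `max_len` edges, so a path and its reverse are counted separately. The function name `path_frequency`, the parameter `max_len`, and the ethanol-like example are hypothetical and introduced only for illustration.

```python
from collections import Counter


def path_frequency(labels, edges, max_len):
    """Count label sequences of directed simple paths with at most max_len edges.

    labels: dict mapping vertex id -> atom label (e.g. "C", "O")
    edges:  list of undirected edges (u, v)
    Assumption: a path and its reverse are counted as two occurrences.
    """
    # Build an adjacency list for the undirected graph.
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    counts = Counter()

    def extend(path):
        # Record the label sequence of the current path.
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) - 1 == max_len:
            return
        # Extend the path by one edge without revisiting a vertex.
        for w in adj[path[-1]]:
            if w not in path:
                extend(path + [w])

    for start in labels:
        extend([start])
    return counts


# Hypothetical example: the heavy-atom skeleton C-C-O.
labels = {0: "C", 1: "C", 2: "O"}
edges = [(0, 1), (1, 2)]
fv = path_frequency(labels, edges, max_len=2)
```

Here `fv[("C", "C")]` is 2 because the single C-C bond is traversed in both directions. The inference problem studied in this paper runs in the opposite direction: given such a vector of counts, find a structure that produces it.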
∗ Work partially supported by Grants-in-Aid “Systems Genomics” and #16300092 from MEXT, Japan, and by the Kayamori Foundation of Information Science Advancement.

On the other hand, a new approach was recently proposed for designing and/or optimizing objects using kernel methods 6,7. In this approach, a desired object is computed as a point in the feature space using a suitable objective function and optimization technique, and then the point is mapped back to the input space, where this mapped back object is