Latent Structure Pattern Mining

Andreas Maunz¹, Christoph Helma², Tobias Cramer¹, and Stefan Kramer³

¹ Freiburg Center for Data Analysis and Modeling (FDM), Hermann-Herder-Str. 3, D-79104 Freiburg im Breisgau, Germany
  maunza@fdm.uni-freiburg.de, cramer@seminar-fr.de
² in-silico Toxicology, Altkircherstr. 4, CH-4054 Basel, Switzerland
  helma@in-silico.ch
³ Institut für Informatik/I12, Technische Universität München, Boltzmannstr. 3, D-85748 Garching bei München, Germany
  kramer@in.tum.de

Abstract. Pattern mining methods for graph data have largely been restricted to ground features, such as frequent or correlated subgraphs. Kazius et al. have demonstrated the use of elaborate patterns in the biochemical domain, summarizing several ground features at once. Such patterns bear the potential to reveal latent information not present in any individual ground feature. However, those patterns were handcrafted by chemical experts. In this paper, we present a data-driven bottom-up method for pattern generation that takes advantage of the embedding relationships among individual ground features. The method works fully automatically and does not require data preprocessing (e.g., to introduce abstract node or edge labels). Controlling the process of generating ground features, it is possible to align them canonically and merge (stack) them, yielding a weighted edge graph. In a subsequent step, the subgraph features can further be reduced by singular value decomposition (SVD). Our experiments show that the resulting features enable substantial performance improvements on chemical datasets that have been problematic so far for graph mining approaches.

1 Introduction

Graph mining algorithms have focused almost exclusively on ground features so far, such as frequent or correlated substructures. In the biochemical domain, Kazius et al. [6] have demonstrated the use of more elaborate patterns that can represent several ground features at once.
Such patterns bear the potential to reveal latent information that is not present in any individual ground feature. To illustrate the concept of non-ground features, Figure 1 shows two molecules taken from a biochemical study investigating the ability of chemicals to cross the blood-brain barrier. Each molecule contains a similar fragment, highlighted in gray (in fact, due to the symmetry of the ring structure, the fragment occurs twice in the second molecule). Note that the fragments are not completely identical, but differ in the arrow-marked atom (nitrogen vs. oxygen). However, regardless of this difference, both atoms are strongly electronegative, resulting in a decreased