Improving frequent subgraph mining in the presence of symmetry

Christian Desrosiers, Philippe Galinier, Alain Hertz
Ecole Polytechnique de Montreal
{christian.desrosiers,philippe.galinier,alain.hertz}@polymtl.ca

Pierre Hansen
HEC Montreal
pierre.hansen@gerad.ca

Abstract

While recent algorithms for mining the frequent subgraphs of a database are efficient in the general case, these algorithms tend to do poorly on databases that have few or no labels. Although little attention has been given to such datasets, many existing applications deal with this type of data. In this paper, we present a novel algorithm, called SyGMA, that improves frequent subgraph mining in such cases by limiting the impact of symmetry on calculations, without the use of memory-expensive structures. Through experimentation on various datasets, we show that our algorithm outperforms, in many cases, one of the leading algorithms for this task.

Keywords: Data mining, frequent subgraphs, graph isomorphism.

1 Introduction

Graph mining is a recent discipline which aims to extract useful knowledge from a large amount of structured data modeled as graphs. Already, this discipline plays a key role in important fields like chemoinformatics and bioinformatics, especially in the process of drug discovery. In the next decade, its importance will undoubtedly increase with the emergence of new technologies dealing with a greater amount of structured information, particularly in the Web domain. The discovery of frequent subgraphs is a fundamental task of graph mining which consists in finding statistically significant