Identifying Zeolite Frameworks with a Machine Learning Approach

Shujiang Yang,† Mohammed Lach-hab,† Iosif I. Vaisman,†,‡ and Estela Blaisten-Barojas*,†,§

Computational Materials Science Center, and Department of Computational and Data Sciences, George Mason University, MSN 6A2, Fairfax, Virginia 22030, and Department of Bioinformatics and Computational Biology, George Mason University, MSN 5B3, Manassas, Virginia 20110

Received: July 23, 2009; Revised Manuscript Received: October 14, 2009

Zeolites are microporous crystalline materials with highly regular framework structures consisting of molecular-sized pores and channels. The characteristic framework type of a zeolite is conventionally defined by combining information on its coordination sequences, vertex symbols, tiling, and transitivity information. Here we present a novel knowledge-based approach for zeolite framework type classification. We show the predicting abilities of a machine learning model that uses a nine-dimensional feature vector including novel topological descriptors obtained by computational geometry techniques, together with selected physical and chemical properties of zeolite crystals. Trained on the crystallographic structures of known zeolites, this model predicts the framework types of zeolite crystals with very high accuracy.

1. Introduction

Zeolites, with the diversity of their natural forms, are among the most abundant mineral species on earth. In addition to about forty species occurring naturally, hundreds of other zeolites have been synthesized. They are widely used for adsorption, ion-exchange, and heterogeneous catalysis (a substantial portion of gasoline is produced with zeolites as catalysts) and in a number of emerging areas such as biomedical technology, sensors, and solar energy conversion.
1 These applications capitalize on the unique microporous crystalline structure of zeolites, characterized by uniformly distributed pores and channels of molecular size that give a topology signature to each zeolite type. As a consequence, proper identification of the zeolite topology pattern is crucial to a specific application.

The structure of a zeolite is composed of a three-dimensional supporting network filled with loosely bound exchangeable cations and adsorbent phase. The building blocks of the underlying network are TO4 chemical groups where the central T atom (most commonly a Si, Al, or P atom) is tetrahedrally coordinated by four oxygen atoms. The backbone structure is constructed by linking TO4 tetrahedral units through oxygen-corner sharing, yielding a network-like pattern. This pattern replicates periodically, giving rise to well-organized arrays of channels that comprise topological characteristics specific to the zeolites. 2 Such an atomic backbone constitutes the framework of a zeolite, which gives a topological signature for identifying the network connectivity of the TO4 building units. Frameworks do not depend on specific cations, adsorbent phase, chemical composition, or physical and mechanical properties of the zeolite crystals. Following the rules set up by the Commission on Zeolite Nomenclature of the International Union of Pure and Applied Chemistry, 3 a distinct framework type is labeled by a framework type code (FTC) denoted by three capital letters. FTCs are assigned and curated by the Structure Commission of the International Zeolite Association (IZA). 4

The search for novel types of zeolites has been, and continues to be, an actively pursued research area. Currently, 191 distinct framework types have been approved by IZA, including 5 frameworks approved in the first half of 2009 and 10 others in 2008.
4 The FTC of a zeolite is normally determined unambiguously using the standard approach relying on the combined determination of zeolite framework coordination sequences 5 and vertex symbols. 6 More recently IZA has included the symbolic tiling and transitivity descriptions 7,8 in the characterization of 189 ideal framework types. Delaney symbols (D-sym) can be determined from the tiling information, indicating the complexity of the atomic network associated with each framework type. 9 Complexity is important but not unique to each framework type. Albeit rare, it may happen that two real zeolite crystals belonging to different framework types have identical coordination sequences and vertex symbols. 10 The latter is possible because FTCs are backed up by theoretically built perfect framework structures. Therefore, a mechanism for univocally identifying FTCs from known natural and synthetic zeolite crystals, which are never perfect, is highly desirable.

Machine learning algorithms are used to discover complex patterns embedded within large amounts of data. They have been successfully applied in fields ranging from speech and vision recognition, robot control, and business management to bioinformatics and drug design. However, in materials science, and for the analysis of zeolite structure in particular, machine learning methods are practically unexplored, as is evident from the very limited body of literature on this subject. 11,12

Spatial patterns in the condensed phases of materials can be identified using Delaunay tessellation 13 of the point set associated with the site locations of atoms in such materials. This computational geometry approach provides an objective, nonarbitrary definition of nearest neighbor points in space that has been successfully applied for structural and topological characterization of a variety of condensed matter systems including liquids, 14 proteins, 15,16 and zeolites.
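
The way a Delaunay tessellation yields a nonarbitrary definition of nearest neighbors can be illustrated with a minimal sketch. It is a two-dimensional toy written for this illustration, not the implementation used in this work, which operates on three-dimensional atomic positions. It applies the defining empty-circumcircle criterion by brute force: a triangle belongs to the tessellation if no other point lies strictly inside its circumcircle, and two points are neighbors if they share a triangle. The function names (`orientation`, `in_circumcircle`, `delaunay_triangles`, `delaunay_neighbors`) are hypothetical, and the point set is assumed to be small and free of duplicate points.

```python
from itertools import combinations


def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The sign of det flips with the orientation of abc, so normalize it.
    return det * orientation(a, b, c) > 0


def delaunay_triangles(points):
    """Brute-force 2D Delaunay triangles as sorted index triples (toy sizes only)."""
    triangles = []
    indices = range(len(points))
    for i, j, k in combinations(indices, 3):
        a, b, c = points[i], points[j], points[k]
        if orientation(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        others = (m for m in indices if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, points[m]) for m in others):
            triangles.append((i, j, k))
    return triangles


def delaunay_neighbors(points):
    """Map each point index to the set of indices sharing a Delaunay triangle."""
    neighbors = {i: set() for i in range(len(points))}
    for triangle in delaunay_triangles(points):
        for u, v in combinations(triangle, 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


if __name__ == "__main__":
    # Four corners of a square plus its center (index 4).
    points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
    print(delaunay_neighbors(points))
```

In this example the center point is a neighbor of all four corners, while diagonally opposite corners are not neighbors of each other, and no distance cutoff had to be chosen. The brute-force search scales poorly with the number of points, so a dedicated computational geometry library would be the practical choice for real crystal structures.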
17 In this work we combine computational geometry techniques, which generate essential topological descriptors characterizing the structure of zeolite crystals, with machine learning methods, affording a novel approach to the classification of their framework types.

* Corresponding author. E-mail: blaisten@gmu.edu.
† Computational Materials Science Center.
‡ Department of Bioinformatics and Computational Biology.
§ Department of Computational and Data Sciences.

J. Phys. Chem. C 2009, 113, 21721–21725. DOI: 10.1021/jp907017u. © 2009 American Chemical Society. Published on Web 12/07/2009.

Previously we have explored a similar