Published: November 18, 2011 | ©2011 American Chemical Society | dx.doi.org/10.1021/ci200386x | J. Chem. Inf. Model. 2012, 52, 308–318 | ARTICLE | pubs.acs.org/jcim

Addressing Challenges of Identifying Geometrically Diverse Sets of Crystalline Porous Materials

Richard Luis Martin,† Berend Smit,‡,§ and Maciej Haranczyk*,†

† Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, California 94720-8139, United States
‡ Department of Chemistry, University of California, Berkeley, California 94720-1462, United States
§ Department of Chemical Engineering, University of California, Berkeley, California 94720-1462, United States

Supporting Information

INTRODUCTION

Porous materials contain complex networks of void channels and cages that are exploited in many industrial applications. Zeolites are the best-known class of these materials: they have been widely used in industry since the late 1950s, commonly as chemical catalysts, as membranes for separations, and as water softeners,1–4 and their value is estimated at $350 billion per year.5 There is increasing interest in utilizing zeolites as membranes or adsorbents for CO2 capture applications.3 In addition to zeolites, metal–organic frameworks (MOFs)6,7 and their subfamily of zeolitic imidazolate frameworks (ZIFs)8 have recently generated interest for their potential use in gas separation or storage.9–11 A key requirement for the success of any nanoporous material is that its chemical composition, pore geometry, and pore topology be optimal under the conditions of a particular application. However, finding the optimal material is an arduous task, since the number of possible pore topologies is extremely large. Approximately 190 unique zeolite frameworks are known to exist today, realized in more than 1400 zeolite crystals of various chemical compositions and geometrical parameters (see ref 12).
However, these experimentally known zeolites constitute only a very small fraction of the more than 2.7 million structures that are feasible on theoretical grounds.13,14 Of these, between 314 000 and 585 000 structures are predicted to be thermodynamically accessible as aluminosilicates, which gives an even larger number of possible materials via elemental substitution and different cation exchanges.15,16 Databases of similar or greater magnitude can be developed for other nanoporous materials such as MOFs or ZIFs. As a result, new automated computational and cheminformatic techniques need to be developed to characterize, categorize, and screen such large databases.17

Recently, automated approaches capable of analyzing large sets of porous materials have started to emerge. For example, Blatov and co-workers have pursued the concept of natural tiling of periodic networks to find primitive building blocks in zeolites.18 The group of Blaisten-Barojas has developed zeolite framework classifiers using a machine learning approach.19 Düren et al. have provided a tool to calculate the surface area of a porous material,20 while Foster et al. and Haldoupis et al. have presented methods to calculate two parameters frequently used to describe pore geometry in crystalline porous materials,17,21 namely, the diameter of the largest included (d_i) and the largest

Special Issue: 2011 Noordwijkerhout Cheminformatics
Received: August 15, 2011

ABSTRACT: Crystalline porous materials have a variety of uses, such as catalysis and separations. Identifying suitable materials for a given application can, in principle, be done by screening material databases. Such screening requires automated high-throughput analysis tools that calculate topological and geometrical parameters describing pores. These descriptors can be used to compare, select, group, and classify materials. Here, we present a descriptor that captures the shape and geometry characteristics of pores.
Together with the proposed similarity measures, it can be used to perform diversity selection on a set of porous materials. Our representations are histogram encodings of the probe-accessible fragment of the Voronoi network representing the void space of a material. We discuss and demonstrate the application of our approach on the International Zeolite Association (IZA) database of zeolite frameworks and the Deem database of hypothetical zeolites, as well as on zeolitic imidazolate frameworks constructed from IZA zeolite structures. The diverse structures retrieved by our method are complementary to those obtained by emphasizing diversity in existing one-dimensional descriptors, e.g., surface area, and similar to those obtainable by a (subjective) manual selection based on the materials' visual representations. Our technique allows large sets of structures to be reduced and thus enables materials researchers to focus their efforts on maximally dissimilar structures.
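The abstract describes the workflow only at a high level: encode each structure's probe-accessible Voronoi network as a histogram, compare histograms with a similarity measure, and pick a maximally dissimilar subset. The Python sketch below illustrates that idea under loudly stated assumptions. It histograms the radii of Voronoi nodes (distance to the nearest atom surface) that can hold a spherical probe, compares histograms with a one-dimensional earth mover's distance, and selects a diverse subset with greedy MaxMin picking. None of these choices is taken from the paper: the node radii, the accessibility test (which skips any connectivity check), the distance, and MaxMin are stand-ins, and all function names and data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Histogram = List[float]


def accessible_radius_histogram(node_radii: Sequence[float], probe_radius: float,
                                bin_edges: Sequence[float]) -> Histogram:
    """Normalized histogram of the radii of Voronoi nodes that can hold a probe.

    Hypothetical simplification: a node counts as probe-accessible when its
    radius (distance to the nearest atom surface) is >= probe_radius; the
    connectivity check across periodic boundaries is omitted.
    Radii outside [bin_edges[0], bin_edges[-1]] are ignored.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for r in node_radii:
        if r < probe_radius:
            continue  # the probe does not fit at this node
        for i in range(n_bins):
            in_bin = bin_edges[i] <= r < bin_edges[i + 1]
            on_last_edge = i == n_bins - 1 and r == bin_edges[-1]
            if in_bin or on_last_edge:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def emd_1d(h1: Histogram, h2: Histogram, bin_width: float) -> float:
    """Earth mover's distance between two normalized 1D histograms on the same
    uniform bins: the L1 distance between their cumulative sums, times the bin width."""
    cdf1 = cdf2 = 0.0
    dist = 0.0
    for a, b in zip(h1, h2):
        cdf1 += a
        cdf2 += b
        dist += abs(cdf1 - cdf2)
    return dist * bin_width


def maxmin_select(hists: Sequence[Histogram], k: int,
                  dist: Callable[[Histogram, Histogram], float],
                  seed: int = 0) -> List[int]:
    """Greedy MaxMin diversity picking: repeatedly add the candidate whose
    distance to its nearest already-selected histogram is largest."""
    k = min(k, len(hists))
    if k <= 0:
        return []
    selected = [seed]
    # nearest[i] = distance from item i to the closest selected item so far
    nearest = [dist(h, hists[seed]) for h in hists]
    while len(selected) < k:
        best = max((i for i in range(len(hists)) if i not in selected),
                   key=lambda i: nearest[i])
        selected.append(best)
        nearest = [min(d, dist(h, hists[best])) for d, h in zip(nearest, hists)]
    return selected


# Hypothetical Voronoi node radii (angstrom) for four made-up structures.
structures: Dict[str, List[float]] = {
    "A": [1.5, 1.6, 1.7, 3.2],
    "B": [1.5, 1.65, 1.75, 3.1],  # near-duplicate of A
    "C": [4.2, 4.5, 4.6],
    "D": [0.5, 2.5, 2.6],         # 0.5 is too small for the probe
}
edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
names = list(structures)
hists = [accessible_radius_histogram(structures[n], 1.2, edges) for n in names]
picked = [names[i] for i in maxmin_select(hists, 3, lambda a, b: emd_1d(a, b, 1.0))]
print(picked)  # ['A', 'C', 'D']
```

With these made-up radii, structure B has the same histogram as A (distance 0), so MaxMin skips it and returns A, C, and D. MaxMin is a common diversity-picking heuristic in cheminformatics; trying a different similarity measure only requires changing the `dist` argument.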