CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. (2012)
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.2912

A cloud computing system in Windows Azure platform for data analysis of crystalline materials

Qi Xing 1 and Estela Blaisten-Barojas 1,2,*,†

1 Computational Materials Science Center, George Mason University, Fairfax, VA 22030, USA
2 School of Physics, Astronomy, & Computational Sciences, George Mason University, Fairfax, VA 22030, USA

*Correspondence to: Estela Blaisten-Barojas, Computational Materials Science Center, George Mason University, 4400 University Dr, MS 6A2, Fairfax, VA 22030, USA.
†E-mail: blaisten@gmu.edu

SUMMARY

Cloud computing is attracting the attention of the scientific community. In this paper, we develop a new cloud-based computing system on the Windows Azure platform that allows users to run the Zeolite Structure Predictor (ZSP) model through a Web browser. The ZSP is a novel machine learning approach for classifying zeolite crystals according to their framework type. The ZSP can categorize entries from the Inorganic Crystal Structure Database into 41 framework types. The novel automated system permits a user to calculate the vector of descriptors used by ZSP and to apply the model, using the Random Forest™ algorithm, to classify the input zeolite entries. The workflow presented here integrates executables in Fortran and Python for number crunching with packages such as Weka for data analytics and Jmol for Web-based atomistic visualization in an interactive compute system accessed through the Web. The compute system is robust and easy to use. Communities of scientists, engineers, and students knowledgeable in Windows-based computing should find this new workflow attractive and easy to implement in scientific scenarios in which the developer needs to combine heterogeneous components. Copyright © 2012 John Wiley & Sons, Ltd.

Received 23 April 2012; Revised 11 July 2012; Accepted 15 July 2012

KEY WORDS: cloud computing; Windows Azure; heterogeneous scientific workflow; machine learning; zeolite structure predictor

1. INTRODUCTION

Cloud computing [1–4] is a model that enables on-demand network access to configurable computing resources that are supplied to the user without service provider intervention [5]. Indeed, a cloud computing platform packages information technology (IT) resources and delivers services to customers such that they simply have to access the interface of the cloud platform to use the services. This is a new paradigm for scientific applications that do not require sophisticated parallelization and are currently run on small computer clusters [6, 7]. Despite the existence of public clouds [8] such as Amazon [9–11], Google [12, 13], and Microsoft [14, 15], which provide customers with measured services through the Internet, the potential of cloud computing for scientific applications remains largely unexplored [16–20]. Currently, there is a lack of open source workflows geared toward research groups in the sciences, applied mathematics, and most of the engineering communities. This deficiency exists because cloud developers have primarily targeted customers in business, government departments, and individuals. The absence of science and engineering consumers using public clouds is recognized by organizations such as the US National Science Foundation [21]. This organization funds fundamental research and can adopt a pay-per-use funding mechanism, if the