CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. (2012)
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.2912
A cloud computing system in Windows Azure platform for data
analysis of crystalline materials
Qi Xing¹ and Estela Blaisten-Barojas¹,²,*,†
¹Computational Materials Science Center, George Mason University, Fairfax, VA 22030, USA
²School of Physics, Astronomy, & Computational Sciences, George Mason University, Fairfax, VA 22030, USA
SUMMARY
Cloud computing is attracting the attention of the scientific community. In this paper, we develop a new
cloud-based computing system in the Windows Azure platform that allows users to use the Zeolite Structure
Predictor (ZSP) model through a Web browser. The ZSP is a novel machine learning approach for classifying
zeolite crystals according to their framework type. The ZSP can categorize entries from the Inorganic Crystal
Structure Database into 41 framework types. The novel automated system permits a user to calculate the
vector of descriptors used by ZSP and to apply the model using the Random Forest™ algorithm for
classifying the input zeolite entries. The workflow presented here integrates executables in Fortran and
Python for number crunching with packages such as Weka for data analytics and Jmol for Web-based
atomistic visualization in an interactive compute system accessed through the Web. The compute
system is robust and easy to use. Communities of scientists, engineers, and students knowledgeable
in Windows-based computing should find this new workflow attractive and easy to implement in
scientific scenarios in which the developer needs to combine heterogeneous components. Copyright © 2012
John Wiley & Sons, Ltd.
Received 23 April 2012; Revised 11 July 2012; Accepted 15 July 2012
KEY WORDS: cloud computing; Windows Azure; heterogeneous scientific workflow; machine learning;
zeolite structure predictor
1. INTRODUCTION
Cloud computing [1–4] is a model that enables on-demand network access to configurable
computing resources that are supplied to the user without service provider intervention [5]. Indeed,
a cloud computing platform packages information technology (IT) resources and delivers services to
customers, who simply have to access the interface of the cloud platform to use those services.
This is a new paradigm for scientific applications that do not require sophisticated parallelization
and are currently run on small computer clusters [6, 7]. Despite the existence of public
clouds [8] such as Amazon [9–11], Google [12, 13], and Microsoft [14, 15], which provide customers with
measured services through the Internet, the potential of cloud computing for scientific applications
remains largely unexplored [16–20]. Currently, there is a lack of open source workflows geared toward
research groups in the sciences, applied mathematics, and most of the engineering communities.
This deficiency arises because cloud developers have primarily targeted businesses,
government departments, and individuals. The absence of science and engineering consumers using
public clouds is recognized by organizations such as the US National Science Foundation [21]. This
organization funds fundamental research and can adopt a pay-per-use funding mechanism, if the
*Correspondence to: Estela Blaisten-Barojas, Computational Materials Science Center, George Mason University,
4400 University Dr, MS 6A2, Fairfax, VA 22030, USA.
†E-mail: blaisten@gmu.edu