Cloud Computing for Geosciences

Geoffrey C. Fox and Marlon E. Pierce
Pervasive Technology Institute, Indiana University

Cyberinfrastructure has closely tracked commercial best practices for over a decade, but we believe there is still much to learn about correct strategies for building distributed systems for collaborating scientists and related communities. In this position paper, we discuss the opportunities for the geosciences if Cloud Computing strategies currently used by commercial data centers are adopted. We base our conclusions on several geospatially themed projects in which we have participated. These include the NASA-funded QuakeSim project (1), the USGS-funded FloodGrid project, and the NSF-funded PolarGrid project (www.polargrid.org). Our lab has developed cyberinfrastructure software to support these distributed spatial applications, and we have also investigated Cyberinfrastructure architecture generally (2). Our applications include Geospatial Information System (GIS) Grid services based on Open Geospatial Consortium standards and real-time streaming Global Positioning System processing infrastructure (3).

As this list of applications shows, we take a very broad view of the problems that Cyberinfrastructure (CI) must support. Computing and data storage are just two aspects; we also need to manage real-time data streams, integrate third-party capabilities (such as map and data providers), and build interactive user interfaces that act as Science Gateways (4). We take a heterogeneous view of Cyberinfrastructure here: it could include GIS services provided by state and local governments as well as Globus services on the TeraGrid. The current flagship deployments of cyberinfrastructure, such as the NSF TeraGrid, do not take a comprehensive approach to cyberinfrastructure and are dominated by the requirements of traditional high-performance computing users.
Arguably the NSF DataNet program will address the data-centric needs of cyberinfrastructure, such as long-term storage and preservation of observational and experimental data, but this program is in its infancy. We advocate generally for the adoption of Cloud Computing approaches to CI, which we believe will offer a broader approach to infrastructure. Cloud Computing-like infrastructure is of particular interest to Spatial CI applications, which provide important use cases that help clarify the capabilities an end-to-end CI deployment should provide. The recently funded FutureGrid project (futuregrid.org) can serve as a testbed for evaluating these Cloud approaches to geospatial CI.

Cyberinfrastructure and Cloud Computing: Cloud Computing is a marketing term that is usually left poorly defined. However, because of its potential value to academic computing, academic surveys and initial investigations exist (see for example 5, 6). We will focus on two specific aspects: Cloud Computing to provide infrastructure and Cloud Computing to provide runtime management.

Infrastructure as a Service: Clouds may be defined as Web services that control the life cycles of virtual machines and virtual storage. The well-known Amazon and Microsoft Azure cloud systems fall in this category. Xen (www.xen.org) is a popular technology for virtualizing server farms and data centers based on Linux; Microsoft similarly has Hyper-V for Windows Server 2008-based farms. Through Web services and virtualization, users can create and control their own resources. The virtual machines can come preconfigured with useful software. For example, one may imagine checking out a virtual machine or cluster that comes preconfigured with geospatial software. Less well known than the virtual machine, but at least as important, is the virtual block storage device. The best example of this is the Amazon Elastic Block Store, which