GriPhyN and LIGO, Building a Virtual Data Grid for Gravitational Wave Scientists

Ewa Deelman, Carl Kesselman, Gaurang Mehta, Leila Meshkat, Laura Pearlman
Information Sciences Institute
University of Southern California
Marina Del Rey, CA 90292
{deelman, carl, gmehta, meshkat, laura}@isi.edu

Kent Blackburn, Phil Ehrens, Albert Lazzarini, Roy Williams
California Institute of Technology
1200 East California Boulevard
Pasadena, California 91125
{kent, pehrens, lazz}@ligo.caltech.edu, roy@cacr.caltech.edu

Scott Koranda
University of Wisconsin Milwaukee
Department of Physics
1900 East Kenwood Blvd
Milwaukee, WI 53211
skoranda@gravity.phys.uwm.edu

Abstract

Many physics experiments today generate large volumes of data. That data is then processed in a variety of ways to gain an understanding of fundamental physical phenomena. The goal of the NSF-funded GriPhyN project (Grid Physics Network) is to enable scientists to seamlessly access data, whether it is raw experimental data or a data product that results from further processing. GriPhyN provides a new degree of transparency in how data-handling and processing capabilities are integrated to deliver data products to end-users or applications, so that requests for such products are easily mapped into computation and/or data access at multiple locations. GriPhyN refers to the set of all data products available to the user as Virtual Data. Among the physics applications participating in the project is the Laser Interferometer Gravitational-wave Observatory (LIGO), which is being built to observe the gravitational waves predicted by general relativity. In this paper, we describe our initial design and prototype of a Virtual Data Grid for LIGO.

1. Introduction

GriPhyN (Grid Physics Network, www.griphyn.org) is an NSF-funded project that aims to support large-scale data management in physics experiments in fields such as high-energy physics, astronomy, and gravitational wave physics.
GriPhyN puts data, both raw and derived, under the umbrella of Virtual Data [1]. A user or application can ask for data using application-specific metadata without needing to know whether the data is already available on some storage system or needs to be computed. To satisfy the request, GriPhyN will schedule the necessary data movements and computations to produce the requested results. GriPhyN uses the Globus toolkit (www.globus.org) as the basic grid infrastructure and builds high-level services on top of it to support virtual data requests. These services include request planning, request execution, and replica selection. In this paper we describe the work we have done within the context of one of the experiments in