Grid-Enabling the Global Geodynamics Project: The Introduction of an XML-Based Data Model

L. Ian Lumb
Platform Computing Inc.
3760 14th Avenue
Markham, Ontario, Canada L3R 3T7
Email: ilumb@platform.com

Keith D. Aldridge
Earth & Space Science and Engineering
York University
Toronto, Ontario, Canada M3J 1P3
Email: keith@yorku.ca

Abstract— The Global Geodynamics Project (GGP) is reasonably representative of the scientific collaboration evident in small-to-medium-scale initiatives. GGP also poses data management challenges that differ from those typically expressed by other areas of the physical sciences - e.g., High Energy Physics. These distinctions make GGP an interesting candidate for assessing the challenges and opportunities associated with technically enabling collaborative science via Grid Computing. Emphasis is placed here on the introduction of an XML-based data model into the GGP. Although it is concluded that Earth Sciences Markup Language (ESML) is highly effective and efficient in introducing a new data model, and paves the way for structural transformations on data, challenges and opportunities are also identified. Metadata (data about data) presents the gravest concern and is therefore the most important focus for further research.

Citation—Lumb, L. I. and K. D. Aldridge, Grid-Enabling the Global Geodynamics Project: The Introduction of an XML-Based Data Model, in Proceedings of The 19th International Symposium on High Performance Computing Systems and Applications, HPCS 2005, I. Kotsireas and D. Stacey (editors), The IEEE Computer Society, 216–222, 2005.

I. INTRODUCTION

The Global Geodynamics Project (GGP) was established to allow Earth scientists to leverage a network of globally distributed instruments for operational and research activities into Earth tides ([1], [2]). Now in its second phase, the GGP is proactively engaging non-traditional disciplines - i.e., those outside its original Earth tides community.
For example, Lumb & Aldridge ([3]) seek to better understand Earth’s rotational spectrum at periods of about a half-day, and the potential role of rotationally induced responses in generating and sustaining Earth’s magnetic field. As another example, whose impact is underscored by the recent, devastating 26 December 2004 Sumatra-Andaman earthquake ([4]) and resulting tsunami ([5]), seismologists seek to make use of GGP data to better predict, catalog and interpret seismic activity (e.g., [1]).

Even with this compelling interest in GGP data, geodynamicists, seismologists, and others are faced with practicalities that inhibit their engagement as ‘non-specialists’ ([6]). For example:

• Temporal and/or spatial alignment of GGP data is challenging - Correlating data in time and space is currently a largely manual process: geodynamicists and seismologists must specify temporal (e.g., a period of time, an event in time) and/or spatial (e.g., global, regional, specific instruments) details before further analysis is possible.

• There are undesirable, yet significant, signals to contend with - To geodynamicists and seismologists, tidal, atmospheric, hydrologic and oceanic signals are all unwanted. This means that the processed GGP data must undergo further, non-trivial reductions before it is useful for geodynamic and seismic purposes. Traceable and reproducible reductions are critical, as research efforts often involve complex modeling close to the ambient noise levels of the instruments involved.

Typically, science employs technology for a purpose. However, when the technology is itself in its infancy, a reciprocity exists - i.e., the scientific use can shape the evolution of the technology. This is precisely the current case with Grid Computing - i.e., intersections between it and (in this case) the GGP have the potential for this reciprocity.
Because so much of Grid Computing has been stimulated by the ‘Big Science’ needs ([7]) of High Energy Physics, this is a crucial juncture to motivate requirements from a broader scientific base that represents different disciplines as well as small-to-medium-scale science ([6]). Phrased bluntly, the applicability of Grid Computing to small-to-medium-scale science is one of the key drivers for this investigation from the technology perspective.

With respect to Grid-enabling the GGP, Lumb & Aldridge ([6]) concluded that:

• Leveraging GGP as it exists today is a key consideration - This is especially true for GGP instrumentation and data standards plus bilateral agreements.

• There are numerous opportunities for Grid-enabling the GGP - These opportunities range from instrumentation to data to analysis to end users.

The purpose here is to initiate progress on the first of the scientific motivators - i.e., addressing the challenge of temporal and/or spatial alignment. Addressing this motivator also has the desirable side effect of initiating progress on one of the identified opportunities for Grid-enabling the GGP.

This investigation is organized into five sections in addition to this