The Application of Clustering to Earth Science Data: Progress and Challenges

Michael Steinbach *   Pang-Ning Tan †   Shyam Boriah *   Vipin Kumar *   Steven Klooster ‡   Christopher Potter §

1 Introduction

The work described in this paper was conducted as part of the NASA-funded project, Discovery of Changes from the Global Carbon Cycle and Climate System Using Data Mining, which was part of the Intelligent Systems (NRA2-37143) program. The goal of this project was to better understand global scale patterns in biosphere processes, especially relationships between the global carbon cycle and the climate system. During this project, we developed new data analysis and knowledge discovery techniques to investigate changes in the global carbon cycle and climate system. This research has resulted in numerous joint publications in archival journals and major conferences [4, 10, 17–21, 23–28, 31–34], as well as two NASA press releases [14, 15].

More specifically, in this paper, we describe a novel clustering technique that we developed to identify regions of uniform behavior in spatio-temporal data. The clusters produced by this method are useful in discovering climate indices¹ because they identify significant regions of the ocean or atmosphere where the behavior is relatively uniform over the entire area. Some of the discovered clusters correspond to known climate indices, while other clusters are variants of known indices that appear to provide better predictive power for some land areas, and still other clusters may represent potentially new Earth science phenomena.

Although this application of clustering to Earth science data has proven useful, many challenges remain. After a quick description of the data and our clustering work, we briefly describe one of these challenges, namely, the need for clusters that can represent dynamic phenomena such as those associated with climate indices.
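To make the idea of a region of uniform behavior concrete, the following is a minimal sketch and not the clustering technique of this paper. It assumes that the data are stored as a (time, lat, lon) array, that a candidate region is given as a boolean mask, and that the region is summarized by the unweighted mean of its cells; the function names `region_index` and `uniformity` and the synthetic data are hypothetical, and latitude-dependent area weighting is deliberately omitted.

```python
import numpy as np


def region_index(data, mask):
    """Summarize a region by the mean time series of its grid cells.

    data: array of shape (time, lat, lon); mask: boolean array (lat, lon).
    Assumption: an unweighted mean (no area weighting by latitude).
    """
    return data[:, mask].mean(axis=1)


def uniformity(data, mask):
    """Average Pearson correlation of each masked cell with the region mean."""
    cells = data[:, mask]                      # shape (time, n_cells)
    index = cells.mean(axis=1)
    corrs = [np.corrcoef(cells[:, j], index)[0, 1]
             for j in range(cells.shape[1])]
    return float(np.mean(corrs))


# Synthetic data (illustrative only): noise everywhere, plus a shared
# annual cycle inside a small rectangular region.
rng = np.random.default_rng(0)
months = 120
signal = np.sin(2 * np.pi * np.arange(months) / 12.0)
data = rng.normal(size=(months, 4, 5))
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True
data[:, mask] = data[:, mask] + 3.0 * signal[:, None]

index = region_index(data, mask)       # candidate index for the region
inside = uniformity(data, mask)        # cells sharing the common signal
outside = uniformity(data, ~mask)      # cells containing only noise
```

Under these assumptions, cells inside the masked region correlate far more strongly with their regional mean than cells outside do with theirs, which is the sense in which a region's summary time series can stand in for the behavior of the whole area.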
* Department of Computer Science and Engineering, University of Minnesota, {steinbac, sboriah, kumar}@cs.umn.edu
† Department of Computer Science and Engineering, Michigan State University, ptan@cse.msu.edu
‡ California State University, Monterey Bay, sklooster@gaia.arc.nasa.gov
§ NASA Ames Research Center, cpotter@mail.arc.nasa.gov
¹ To analyze the effect of the oceans and atmosphere on land climate, Earth scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth's oceans and atmosphere.

2 Earth Science Data

The types of data shown in Figure 1 are representative of the data considered in this project, i.e., the basic data elements are individual co-registered cells in grids that cover the entire surface of the Earth with resolutions between 0.25 km and 50 km. (Land