Spatio-temporal Association Mining for Un-sampled Sites

Dan Li, Jitender Deogun
Department of Computer Science and Engineering
University of Nebraska-Lincoln, Lincoln NE 68588-0115

Abstract. In this paper, we investigate interpolation methods that are suitable for discovering spatio-temporal association rules for unsampled points, with an initial focus on drought risk management. For drought risk management, raw weather data is collected, converted to various indices, and then mined for association rules. To generate association rules for unsampled sites, interpolation methods can be applied at any stage of this data mining process. We develop three interpolation models and integrate them into our association rule mining algorithm. The performance of these three models is experimentally evaluated by comparing interpolated association rules with rules discovered from actual raw data.

1 Introduction

Nationally, drought events are the dominant cause of crop loss. We are in the process of developing a Geo-spatial Decision Support System (GDSS) that provides farmers and government agencies with critical, ongoing information for drought risk management [1]. One of our project objectives is to discover interpretable patterns and rules associated with ocean parameters, atmospheric indices, and climatic data. These rules capture the influence of ocean parameters (e.g., the Multivariate ENSO Index (MEI)) upon climatic and drought indices (e.g., the Standardized Precipitation Index (SPI) and the Palmer Drought Severity Index (PDSI)). Association rule mining, one of the most important Knowledge Discovery in Databases (KDD) techniques, is applied to meet this goal. We collect raw weather data and oceanic indices from a variety of sources, e.g., precipitation and temperature data from the High Plains Regional Climate Center (HPRCC) and the MEI from the Climate Prediction Center.
However, cost and technical considerations do not allow data to be sampled at all points in a region; therefore, spatial interpolation has been widely used in geographical information systems (GIS). Spatial interpolation seeks the functions that best represent a variable over the entire area; such functions predict data values at unsampled points from a set of spatial data at sampled points. We develop three interpolation models to discover association rules for unsampled sites. We apply Leave-One-Out (LOO) cross-validation to estimate the errors generated by the interpolation models. The analysis and evaluation of the three proposed models are based on two quality metrics, precision and recall, and on a comparison of the interpolated rules with the rules discovered from actual data.

2 Spatio-temporal Data Mining & Interpolation Concepts

A typical association rule has the form $X \Rightarrow Y$, where $X$ is the antecedent episode, $Y$ is the consequent episode, and $X \cap Y = \emptyset$. Here, an episode is a collection of events occurring close enough in time. In an earlier paper, we presented the REAR algorithm [1], which allows the user to efficiently discover user-specified target episodes. Rules generated by REAR demonstrate the importance and potential use of spatio-temporal data mining algorithms in monitoring drought using oceanic and atmospheric indices. A conceptual model of the three-stage data mining process is given in Figure 1.

[Figure 1: Stage 1, Raw Data (environmental data: temperature, precipitation, soil moisture, sea surface temperature, atmospheric pressure, ...) → Transform & Calculation → Stage 2, Oceanic and Climatic Indices (oceanic indices: SOI, NAO, PDO, ...; drought indices: SPI, PDSI) → REAR algorithm → Stage 3, Association Rules (e.g., MEI ed, PDO ed ⇒ PDSI ed, SPI12 sd)]
Fig. 1. A conceptual model of the mining process.

We use historical and current climatology data, including precipitation data, atmospheric pressure data, and sea surface temperatures collected at various weather stations around the world.
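The interpolation models themselves are not specified at this point in the paper. As a minimal sketch of how an interpolator and the Leave-One-Out (LOO) cross-validation mentioned in the introduction fit together, the Python code below uses inverse distance weighting (IDW), a common GIS technique chosen here only for illustration; it is not necessarily one of the three models developed in this paper. The planar station coordinates, the power parameter of 2, the toy precipitation values, and the helper names `idw_predict`, `loo_errors`, and `rmse` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical planar (x, y) station coordinates; real station data would
# need projected coordinates or a great-circle distance instead.
Point = Tuple[float, float]


def idw_predict(target: Point, sites: Sequence[Point],
                values: Sequence[float], power: float = 2.0) -> float:
    """Inverse distance weighted (IDW) estimate of the value at `target`."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(sites, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a sampled site
        w = 1.0 / d ** power  # nearer sites receive larger weights
        num += w * v
        den += w
    return num / den


def loo_errors(sites: Sequence[Point], values: Sequence[float],
               power: float = 2.0) -> List[float]:
    """Leave-One-Out cross-validation: hide each site in turn, predict it
    from the remaining sites, and record predicted minus actual."""
    errors = []
    for i, target in enumerate(sites):
        rest_sites = [s for j, s in enumerate(sites) if j != i]
        rest_vals = [v for j, v in enumerate(values) if j != i]
        pred = idw_predict(target, rest_sites, rest_vals, power)
        errors.append(pred - values[i])
    return errors


def rmse(errors: Sequence[float]) -> float:
    """Root mean squared error over the LOO residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


if __name__ == "__main__":
    # Toy example: four stations at the corners of a unit square.
    stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    precip = [1.0, 2.0, 3.0, 4.0]
    errs = loo_errors(stations, precip)
    print([round(e, 3) for e in errs], round(rmse(errs), 4))
```

Each station is predicted only from the others, so the residuals approximate the error the interpolator would make at a genuinely unsampled site; swapping `idw_predict` for another interpolator leaves the LOO loop unchanged.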
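A typical association rule $X \Rightarrow Y$ requires disjoint antecedent and consequent episodes, and interpolated rules are judged by precision and recall against rules mined from actual data. The sketch below assumes the standard set-based definitions, precision $= |R_{interp} \cap R_{actual}| / |R_{interp}|$ and recall $= |R_{interp} \cap R_{actual}| / |R_{actual}|$, and treats each episode as an unordered set of event labels, ignoring the temporal windows that REAR uses. The `Rule` class, the `rule` and `precision_recall` helpers, and every rule other than the one shown in Figure 1 are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical representation of a rule X => Y over episode labels."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]

    def __post_init__(self) -> None:
        # A well-formed rule requires X and Y to be disjoint.
        if self.antecedent & self.consequent:
            raise ValueError("antecedent and consequent must be disjoint")


def rule(lhs: Iterable[str], rhs: Iterable[str]) -> Rule:
    """Build a Rule; label order is irrelevant because sets are used."""
    return Rule(frozenset(lhs), frozenset(rhs))


def precision_recall(interpolated: Iterable[Rule],
                     actual: Iterable[Rule]) -> Tuple[float, float]:
    """Set-based precision and recall of interpolated rules against the
    rules mined from actual data at the same site."""
    found = set(interpolated)
    truth = set(actual)
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    actual = [rule(["MEI ed", "PDO ed"], ["PDSI ed", "SPI12 sd"]),
              rule(["MEI ed"], ["SPI12 sd"])]         # hypothetical
    interpolated = [rule(["PDO ed", "MEI ed"], ["SPI12 sd", "PDSI ed"]),
                    rule(["PDO ed"], ["PDSI ed"])]    # hypothetical
    print(precision_recall(interpolated, actual))
```

Precision penalizes spurious rules produced at the interpolated site, while recall penalizes rules present in the actual data that interpolation failed to recover; reporting both avoids rewarding a model that simply emits very few or very many rules.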
This research was supported in part by NSF Digital Government Grant No. EIA-0091530, USDA RMA Grant No. 02IE08310228, and NSF EPSCOR Grant No. EPS-0091900.

Drought indices, e.g., SPI and PDSI, are calculated from these raw datasets. The REAR algorithm is used to find relationships between the