13.8 INTELLIGENT THINNING ALGORITHM FOR EARTH SYSTEM NUMERICAL MODEL RESEARCH AND APPLICATION

Rahul Ramachandran *, Xiang Li, Sunil Movva, Sara Graves
University of Alabama in Huntsville

Steve Greco, Dave Emmitt
Simpson Weather Associates

Joe Terry
Science Applications International Corporation

Robert Atlas
Goddard Space Flight Center, NASA

1. INTRODUCTION

As the number of observation platforms and numerical models on all scales continues to increase rapidly, the large amounts of data they generate threaten to hamper progress in scientific research and forecast improvement, or even to overwhelm (in both time and data space) the researchers and modelers who use them. The NASA Earth Science Enterprise (ESE) strategy and mission documents specifically note that one of the important challenges facing NASA is transforming vast quantities of data and information into products that benefit users, especially for economic and policy decision making. Pertinent examples of such products are model-generated forecasts of hurricane landfall and air quality, and 3-day and 7-day weather forecasts.

While the assimilation of more and better observations is necessary for forecast improvement, today’s global models and data assimilation systems cannot ingest and use all of the data available to them because of extremely large computational costs and constrained network bandwidth. This is a result not only of the data volume, but also of the difficulty of dealing with the potential impact of each additional observation type and of differences between the model grid size and the data grid or density. For example, typical observational data have a horizontal resolution of 25 km or better. In addition, in regions where orbital paths from different satellites overlap, the combined data density is much higher.
The crudeness of the techniques currently used to thin such data to manageable densities indicates the lack of viable methods for dealing with the complex problem of extracting the information-dense data that best represent the atmosphere.

This paper describes the development and testing of an automated Intelligent Data Thinning (IDT) algorithm intended to improve data assimilation schemes and forecast accuracy by preserving information-dense regions while removing redundant data points. The development of this algorithm is a collaborative effort involving a team of data mining experts at the University of Alabama in Huntsville (UAH) and numerical modelers at Goddard Space Flight Center (GSFC) and Simpson Weather Associates (SWA).

* Corresponding Author Address: Rahul Ramachandran, Information Technology and Systems Center, University of Alabama in Huntsville, Huntsville, AL 35899; email: rramachandran@itsc.uah.edu

2. BACKGROUND

As space-based observing systems generate ever-increasing volumes of data, there arises the need to better discriminate between useful data points and data points that are simply redundant. Data Assimilation System (DAS) processing times can increase by as much as the square of the number of observations, depending on the assimilation method used. To circumvent this problem, most operational centers must resort to very crude thinning methods to reduce data volume. Existing data thinning techniques used in the Earth science modeling and research community include superobing, using a single observation per grid box, using observations closest to the model first guess, and using observations furthest from the model first guess. Superobing is a regional averaging technique using a simple weighting scheme.
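As a rough illustration of superobing, the Python sketch below averages all observations that fall within the same box of a regular latitude-longitude grid. The function name `superob`, the 2.5-degree box size, and the equal weighting are assumptions made for this example only; the weighting schemes used operationally are not specified here and may differ.

```python
import math
from collections import defaultdict


def superob(observations, box_deg=2.5):
    """Average observations per lat/lon box (equal weights are an assumption).

    observations: iterable of (lat, lon, value) tuples, with lat in degrees
    north and lon in degrees east.
    Returns a list of (mean_lat, mean_lon, mean_value, count) tuples,
    one per occupied box, ordered by box index.
    """
    boxes = defaultdict(list)
    for lat, lon, value in observations:
        lon = lon % 360.0  # normalize longitude to [0, 360)
        key = (math.floor((lat + 90.0) / box_deg), math.floor(lon / box_deg))
        boxes[key].append((lat, lon, value))

    superobs = []
    for key in sorted(boxes):
        members = boxes[key]
        n = len(members)
        mean_lat = sum(m[0] for m in members) / n
        mean_lon = sum(m[1] for m in members) / n
        mean_val = sum(m[2] for m in members) / n
        superobs.append((mean_lat, mean_lon, mean_val, n))
    return superobs


# Hypothetical observations: two share a box, one stands alone.
obs = [(10.0, 20.0, 5.0), (10.5, 20.5, 7.0), (-30.0, 100.0, 1.0)]
result = superob(obs)
```

In this sketch three observations are reduced to two superobservations. The averaging that achieves the reduction also smooths out any structure finer than the box size.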
Thinning to a single observation per grid box is similar to superobing, but keeps for assimilation only the observation closest to the center of each box of a user-defined global grid. In the approach using observations closest to the model first guess, only the observations whose observation-minus-first-guess values are smallest and within a prescribed threshold are kept for assimilation. In other words, more weight is given to the model first guess and less to the observations. In the approach using observations furthest from the model first guess, only the observations whose observation-minus-first-guess values are largest and within a prescribed threshold are kept for assimilation. With each of these techniques, the number of observations can be reduced significantly, but at the cost of a considerable loss of atmospheric structure and information. Therefore, these methods eliminate large amounts of data, some useful,