The 2008 Artificial Intelligence Competition Data: Source and Characteristics

Kimberly L. Elmore and Michael B. Richman

1. Introduction

The training and testing data used in the 2008 AMS AI competition come from the Winter Hydrometeor Ground Truth Experiment (WHCGT), recently renamed the Winter Precipitation Identification Near the Ground, or W-PING, Project. This experiment began in the winter of 2006-2007 as a way to approach the problem of developing a Winter Hydrometeor Classification Algorithm (HCA) for the KOUN polarimetric testbed radar (Scharfenberg et al. 2005).

Prior to this, the existing HCA had been developed with only warm season convection in mind. There was some question as to how well this HCA would perform in cold season applications. Thus the WHCGT (now W-PING) project was conceived.

2. The Data

To determine how well the NSSL HCA (Scharfenberg et al. 2005) performed in cold season precipitation, an experiment was launched that uses the public as observers of precipitation type. Publicized solely through press releases and newscasts, the WHCGT Experiment was initiated in November 2006. The idea was to collect public observations of precipitation type and enter those observations into a database using a secure web form.

Fortunately, the winter of 2006-2007 proved very active, with an unusually high frequency of ice and snow events in Oklahoma. The Oklahoma public is usually attuned to weather events, and while winter weather is no novelty in Oklahoma, it is rare enough that it sparks a great deal of public interest when it occurs. The WHCGT Experiment was launched on the eve of a well-forecast and heavily publicized winter storm, and the public response was exceptionally high. Competition data come from three major events: 29 Nov through 30 Nov 2006, 11 Jan through 14 Jan 2007, and 19 Jan through 20 Jan 2007.
A web site contains information about the experiment, guidance about how to distinguish various winter precipitation types, a status message, and a web form that could be filled in to provide observations of precipitation type. That page may be found at http://www.nssl.noaa.gov/projects/winter/. There was no need for the public observers to “sign up” and, in fact, all of the provided information was purposely kept anonymous.

The public was asked to distinguish between the following categories: rain, drizzle, freezing rain, freezing drizzle, ice pellets (sleet), graupel, snow, hail, and none, all within a 150 km radius of the KOUN radar. As a practical matter, a cold-season HCA must be able to distinguish between frozen, liquid, and no precipitation, so the above categories were amalgamated into the three used in the competition. Freezing rain and freezing drizzle were combined with rain and drizzle, and classed as “liquid.” Snow, ice pellets (sleet), graupel, and hail were all combined into “frozen,” while “none” was retained as is.

The observed precipitation type data are quality controlled using rather broad criteria. Observations that are clearly inconsistent with nearby observations in time and space (e.g., a report of “hail” in the midst of “snow”) are removed. Observations well outside of the project area have been removed, as have obvious duplicate entries.

The KOUN polarimetric testbed radar operated during most events. The KOUN radar differs from standard weather radar in that it transmits in both horizontal and vertical polarization; standard weather radars use only horizontal polarization.
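
Returning to the precipitation categories listed above, the amalgamation into the three competition classes amounts to a simple lookup. The following is a minimal sketch under stated assumptions, not the project's actual software: the names `COMPETITION_CLASS` and `amalgamate` are hypothetical, and treating "sleet" as a synonym for "ice pellets" follows the parenthetical in the text.

```python
# Hypothetical lookup table: the nine public-report categories mapped to
# the three competition classes described in the text.
COMPETITION_CLASS = {
    "rain": "liquid",
    "drizzle": "liquid",
    "freezing rain": "liquid",
    "freezing drizzle": "liquid",
    "snow": "frozen",
    "ice pellets": "frozen",
    "graupel": "frozen",
    "hail": "frozen",
    "none": "none",
}


def amalgamate(reported_type):
    """Return 'liquid', 'frozen', or 'none' for a reported category.

    Raises KeyError for a category that is not one of the nine listed.
    """
    key = reported_type.strip().lower()
    # Assumption: "sleet" is accepted as an alias for "ice pellets".
    if key == "sleet":
        key = "ice pellets"
    return COMPETITION_CLASS[key]


if __name__ == "__main__":
    for report in ["Freezing Rain", "sleet", "none"]:
        print(report, "->", amalgamate(report))
```

Note that freezing rain and freezing drizzle map to "liquid" here, since the three-class scheme describes the state of the hydrometeors rather than what happens on contact with the ground.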
KOUN collects the familiar standard radar parameters (horizontal reflectivity, Z_h, and radial velocity, V_r) along with differential reflectivity, Z_dr; differential phase shift, φ_dp; specific differential phase shift, k_dp (the radial derivative of φ_dp and so independent of the initial phase shift); and the correlation coefficient between horizontal and vertical polarization reflectivity, ρ_hv. Each of these parameters is affected in different ways by the nature of the hydrometeors that scatter the radiation back to the radar receiver.

Among the things that affect the returned signal are the shapes of the hydrometeors, their composition (whether liquid or ice), and their density. Thus the composition and 2-dimensional size distribution, together with number concentration, determine the polarimetric variables observed by the radar.

Radar data for each parameter are averaged over a 5 x 5 (range by azimuth) kernel centered on each ground observation. Only observations associated with radar data between 0.3 km and 1.2 km AGL are used. Within that height range, only the lowest scan is chosen. All data are filtered to remove observations within ground clutter.

For the three main events, about 2650 observations were logged. After the rudimentary QC,