Stat: The ISI’s Journal for the Rapid Dissemination of Statistics Research
(wileyonlinelibrary.com) DOI: 10.100X/sta.0000

Metrics for SiZer Map Comparison

Jan Hannig a∗, Thomas C. M. Lee b, Cheolwoo Park c

Received 00 Month 2012; Accepted 00 Month 2012

SiZer is a powerful visualization tool for uncovering real structures masked in noisy data. It produces a two-dimensional plot, the so-called SiZer map, to help the data analyst carry out this task. Since it was first proposed, many extensions and improvements have been developed, including robust SiZer, quantile SiZer, and various SiZers for time series data, to name just a few. Given these many SiZer variants, one important question is: how can one evaluate the quality of a SiZer map produced by any one of these variants? The primary goal of this article is to answer this question by proposing two metrics for quantifying the discrepancy between any two SiZer maps. With such metrics, one can systematically calculate the distance between a “true” SiZer map and a SiZer map produced by any one of the SiZer variants. Consequently, one can select a “best” SiZer variant for the problem at hand by choosing the variant that produces SiZer maps that are, on average, closest to the “true” SiZer map. Copyright © 2012 John Wiley & Sons, Ltd.

Keywords: Baddeley’s delta metric, discrepancy measure, graphical data analysis, image metric, multiscale methods, oracle SiZer map

1. Introduction

The SiZer methodology of Chaudhuri & Marron (1999, 2000) and Hannig & Marron (2006) is a powerful visualization tool for exploring structures hidden in noisy data.
It produces a graphical construct, termed the SiZer map, that helps the data analyst separate significant structures from spurious features that are due to sampling noise. In the univariate setting, a SiZer map is a two-dimensional image that summarizes the results of a sequence of hypothesis tests. These tests assess whether, locally, the estimated slopes of the underlying regression function or density function are significantly increasing, significantly decreasing, or neither. The tests are repeated with slopes estimated at different scales (i.e., resolutions). The rationale is that if all estimated slopes (at different scales) to the left of location x are significantly decreasing while all estimated slopes to the right of x are significantly increasing, then there is strong evidence of a significant trough located at x in the underlying function. A more formal description of SiZer is given in the next section.

a Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, NC 27599-3260, U.S.A.
b Department of Statistics, University of California at Davis, CA 95616, U.S.A.
c Department of Statistics, University of Georgia, Athens, GA 30602-7952, U.S.A.
∗ Email: jan.hannig@unc.edu

Stat 2012, 00 1–12
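
As a rough illustration of the slope tests described in the introduction, the following Python sketch builds a SiZer-like map for regression data. It is a simplified sketch under stated assumptions, not the procedure of the cited papers: the function name `sizer_map_sketch`, the Gaussian kernel, the local linear slope estimate, the effective-sample-size cutoff of 5, and the pointwise normal quantile `z = 1.96` are all choices made here for illustration. In particular, the published SiZer procedures adjust the quantile for simultaneous inference, which this sketch does not attempt.

```python
import numpy as np


def sizer_map_sketch(x, y, grid, bandwidths, z=1.96):
    """Return an integer array of shape (len(bandwidths), len(grid)).

    Entry +1: slope significantly increasing at that location and scale.
    Entry -1: slope significantly decreasing.
    Entry  0: neither (also used here when the local data are too sparse).
    Assumption: pointwise intervals with a fixed normal quantile z.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.zeros((len(bandwidths), len(grid)), dtype=int)
    for i, h in enumerate(bandwidths):
        for j, x0 in enumerate(grid):
            d = x - x0
            w = np.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weights
            if w.sum() < 5.0:  # illustrative sparsity cutoff
                continue
            X = np.column_stack([np.ones_like(d), d])
            A = X.T @ (w[:, None] * X)  # weighted normal equations
            A_inv = np.linalg.inv(A)
            beta = A_inv @ (X.T @ (w * y))  # beta[1] is the local slope
            resid = y - X @ beta
            sigma2 = np.sum(w * resid**2) / np.sum(w)  # local noise variance
            B = X.T @ ((w**2)[:, None] * X)
            cov = sigma2 * (A_inv @ B @ A_inv)  # sandwich covariance
            se = np.sqrt(cov[1, 1])
            if beta[1] - z * se > 0:
                out[i, j] = 1
            elif beta[1] + z * se < 0:
                out[i, j] = -1
    return out


# Usage: noisy data with a single trough at x = 0.5.
rng = np.random.default_rng(0)
x_obs = np.linspace(0.0, 1.0, 200)
y_obs = (x_obs - 0.5) ** 2 + rng.normal(scale=0.01, size=x_obs.size)
print(sizer_map_sketch(x_obs, y_obs, np.linspace(0.1, 0.9, 9), [0.05, 0.1, 0.2]))
```

Each row of the returned array corresponds to one scale and each column to one location, so a column of -1 entries to the left of x and +1 entries to the right of x, across all rows, is the pattern that the text interprets as evidence of a trough at x.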