Identifying and Explaining Map Imperfections Through Knowledge Provenance Visualization

Nicholas Del Rio
University of Texas at El Paso
Computer Science, 500 W. University Ave.
El Paso, TX
ndel2@utep.edu

Paulo Pinheiro da Silva
University of Texas at El Paso
Computer Science, 500 W. University Ave.
El Paso, TX
paulo@utep.edu

ABSTRACT

Applications deployed on cyber-infrastructures often rely on multiple data sources and distributed compute resources to access, process, and derive results. When application results are maps, unintentional imperfections can be introduced into the map generation process for several reasons, including the use of low-quality datasets, the use of data filtering techniques incompatible with the kind of map to be generated, or the use of inappropriate mapping parameters, e.g., low-resolution gridding parameters. Without some means of accessing and visualizing the provenance associated with map generation processes, i.e., metadata about the information sources and methods used to derive the map, it may be impossible for most scientists to discern whether a map is of the required quality.

Probe-It! is a tool that provides provenance visualization for results from cyber-infrastructure-based applications, including maps. In this paper, we describe a quantitative user study on how Probe-It! can help scientists discriminate between quality maps and maps with known imperfections. Fifteen active scientists from five domains participated in the study, with different levels of expertise in gravity data and GIS. The study demonstrates that a very small percentage of the scientists can identify imperfections using maps without the help of knowledge provenance.
The study also demonstrates that most scientists, whether GIS experts, subject matter experts (i.e., experts on gravity data maps), or neither, can identify and explain several kinds of map imperfections when using maps together with knowledge provenance visualization.

Categories and Subject Descriptors

D.2.5 [Software Engineering]: Testing and Debugging—debugging aids, diagnostics, tracing; H.5 [Information Interfaces and Presentation]: General

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WOODSTOCK ’97 El Paso, Texas USA
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

General Terms

knowledge provenance visualization

Keywords

knowledge provenance, maps, cyberinfrastructure

1. INTRODUCTION

The use of maps is becoming more pervasive as geographical information system (GIS) technologies succeed in their goal of providing users with easier ways of accessing, combining, and visualizing geo-spatial data. The commercial success of products like Google Earth and Microsoft Virtual Earth demonstrates that the use of maps can and will keep increasing. Of particular interest in science is the generation of maps from the combined use of GIS technology and more readily available data provided by cyber-infrastructure communities [2] such as the National Science Foundation (NSF) funded Geosciences Network (GEON) [1] and the Circumarctic Environmental Observatories Network (CEON) [5, 6]. Scientists, who are not necessarily GIS experts, can now use their data along with data provided by these and many other cyber-infrastructure communities to create maps on demand.
Maps, however, like any scientific product, are subject to imperfections, and most imperfections are too subtle to be identified by scientists, whether they are subject matter experts (SME) (with respect to the data used to generate maps), GIS experts, or just ordinary scientists with a specific need for a given map. For example, maps may be inaccurate because of: a faulty sensor in a collection of thousands of sensors used to generate a large geo-spatial dataset; incompatible ways of reading and storing measured geo-spatial data; services used to derive maps that are incompatible when combined; or inappropriate use of parameters for any of the services used to derive a map. GIS and cyber-infrastructure, thus, may provide a context for the creation and proliferation of maps that one would label as inaccurate if one knew more about how they were generated.

Knowledge provenance (KP) is meta-information about how products, which can be maps, are generated. KP often includes meta-information about the following: original datasets used to derive products; executions of processes, i.e., traces of workflow executions and composite service executions; methods called by workflows and composite services, i.e., services, tools, and applications; intermediate datasets generated during process executions; and any other information sources used. In a GIS context, knowledge