The Need for Information Loss Metrics in Visualization

Aritra Dasgupta*    Robert Kosara†

1 INTRODUCTION

Information visualization lacks a sound framework for verification and validation of its established techniques. While interest in the real-world applicability of different visualization techniques has been steadily increasing, there is a dearth of metrics that would help create a baseline for comparing them. The subjectivity of the perceptual space adds complexity to the problem, because the evaluation of a visualization technique depends on users' comprehension. However, if we find the intrinsic properties of a visualization system that guide user comprehension, irrespective of subjective parameters, it will pave the way for quantitative verification and validation and for establishing the ground truth.

1.1 Deficit of trust in visualization

In comparison with exploratory data analysis techniques in the field of data mining, one drawback of most information visualization techniques is that the user is not necessarily able to trust what he sees on screen [1]. This may sound contradictory, because the very goal of visualization is to augment the trust of the user with visual aid and thereby move the analysis forward in an intelligent manner. This factor is often ignored in the current visualization pipeline. While the issue of large data analysis is handled through dimension reduction techniques, how to quantify what is shown on screen, so that the user is not burdened with visual information overload, remains an open area of research.

1.2 State-of-the-art

We find several instances in the literature where researchers have devised qualitative metrics to estimate the quality of the rendered image [4] or the data abstraction [6]. These are important for maximizing the perceptual benefits of the visualization, and they implicitly deal with information loss.
However, there is a stronger motivation for quantitative metrics that describe how the visual structures relate to the underlying information space. In a nutshell, we should consider visual representation not just as an end product of visualization but as the guiding factor for the user's exploratory analysis.

1.3 Problem of information loss

One of the reasons for the deficit of trust is that the WYSIWYG paradigm often does not hold: although the user sees information on screen, he does not know how much of the data space is being represented and how much is not shown. Because we are dealing with limited pixel space, this becomes a non-trivial issue. Most visualization techniques can thus be conceptualized as an optimization process that balances two constraints: the fidelity of the data space and the clarity of the visualization space. One way to address this problem is to quantify the information content of the visualization. But that is highly subjective and dependent on the users' perspective, and therefore hard to model [5]. A more feasible solution is to study the problem of information loss and provide a quantitative analysis as part of the pipeline.

* e-mail: adasgupt@uncc.edu
† e-mail: rkosara@uncc.edu

[Figure 1 diagram labels. Pipeline stages: Data Space, Information Space, Visual Space, Perceptual Space. Intended information loss: product of data abstraction, e.g., binning into histograms. Unintended information loss: artifact of limited pixel space, e.g., clutter, overplotting. The missing link: we need concrete quantification of information loss.]

Figure 1: Different types of information loss at different stages of the visualization pipeline. The solid lines represent a strong coupling in terms of existing research, while a dotted line signifies weak coupling and a need for more research.
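The two examples named in Figure 1, binning into histograms and overplotting, can each be given a number with a small Python sketch. Both measures are assumptions made purely for illustration and are not metrics proposed in this paper: `binning_loss` takes the drop in Shannon entropy when values are collapsed into equal-width bins as a stand-in for intended loss, and `overplotting_loss` takes the fraction of points that share a pixel with another point as a stand-in for unintended loss. The function names and the convention that points lie in the unit square are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (in bits) of a distribution given as occurrence counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def binning_loss(values, num_bins):
    """Entropy drop (bits) when values are collapsed into equal-width bins.

    Illustrative stand-in for intended loss; not a metric from the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so everything falls into bin 0.
        bin_ids = [0 for _ in values]
    else:
        # Clamp so that the maximum value lands in the last bin.
        bin_ids = [min(int((v - lo) / width), num_bins - 1) for v in values]
    raw = entropy_bits(Counter(values).values())
    binned = entropy_bits(Counter(bin_ids).values())
    return raw - binned


def overplotting_loss(points, width_px, height_px):
    """Fraction of points hidden because they share a pixel with another point.

    Points are (x, y) pairs assumed to lie in the unit square [0, 1] x [0, 1].
    Illustrative stand-in for unintended loss; not a metric from the paper.
    """
    pixels = {
        (min(int(x * width_px), width_px - 1),
         min(int(y * height_px), height_px - 1))
        for x, y in points
    }
    return 1 - len(pixels) / len(points)


if __name__ == "__main__":
    values = list(range(8))
    points = [(0.1, 0.1), (0.12, 0.11), (0.9, 0.9), (0.5, 0.5)]
    print("binning loss (bits):", binning_loss(values, 2))
    print("hidden fraction on a 4x4 grid:", overplotting_loss(points, 4, 4))
    print("hidden fraction on a 100x100 grid:", overplotting_loss(points, 100, 100))
```

For eight distinct values split into two bins, the entropy falls from 3 bits to 1 bit, so the sketch reports a loss of 2 bits. The same four points lose one of their number on a 4x4 pixel grid and none on a 100x100 grid, which shows how the second measure depends on the available pixel space rather than on the data alone.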
2 QUANTIFYING INFORMATION LOSS AS A MEANS TO THE ENDS

Mapping billions of data points onto a limited screen space entails a loss of information, and that is an underlying assumption in visualization, whether explicitly mentioned or not. In this section we study the variants of information loss and possible applications of controlled information loss. Information loss can be of two types: intended and unintended [8]. In Figure 1 we illustrate where they fit in the visualization pipeline.

2.1 Intended information loss

This is encountered mainly in the data space, when large data is abstracted to a summarized level so that the aggregated representation is used for visualization. There have been efforts to deal with intended information loss, i.e., to measure the quality of abstraction at the data level through measures like data abstraction quality [6] and to augment the visualization with that abstracted data, but more convincing metrics need to be found.

2.2 Unintended information loss

This occurs in the screen space as a result of limited screen space and/or human perception. Many visualization techniques use interaction techniques such as panning and zooming to enable the user to overcome perception-related information loss. There have also been efforts to judge the image quality [4]. However, we need a concrete judgment of information loss related to different kinds of visualization tasks in order to decide with confidence what data to show to the user without losing important information, while still creating clear visual representations of that data. The inclusion of visual representation as part of the analytic loop is critically important.

While intended and unintended information loss have been implicitly addressed in the literature, no concrete quantification mechanism has been proposed so far. As shown in Figure 1, the link between the visual space and perceptual space needs significant inves-