ReDSOM: Relative Density Visualization of Temporal Changes in Cluster Structures using Self-Organizing Maps

Denny
Department of Computer Science, The Australian National University, Canberra, Australia
Faculty of Computer Science, University of Indonesia, Indonesia
denny@cs.anu.edu.au, denny@cs.ui.ac.id

Graham J. Williams
Australian Taxation Office, Canberra, Australia
graham.williams@ato.gov.au

Peter Christen
Department of Computer Science, The Australian National University, Canberra, Australia
peter.christen@anu.edu.au

Abstract

We introduce a Self-Organizing Map (SOM) based visualization method that compares cluster structures in temporal datasets using Relative Density SOM (ReDSOM) visualization. Our method, combined with a distance matrix-based visualization, is capable of visually identifying emerging clusters, disappearing clusters, enlarging clusters, contracting clusters, the shifting of cluster centroids, and changes in cluster density. For example, when a region in a SOM becomes significantly more dense compared to an earlier SOM, and well separated from other regions, then the new region can be said to represent a new cluster. The capabilities of ReDSOM are demonstrated using synthetic datasets, as well as real-life datasets from the World Bank and the Australian Taxation Office. The results on the real-life datasets demonstrate that changes identified interactively can be related to actual changes. The identification of such cluster changes is important in many contexts, including the exploration of changes in population behavior in the context of compliance and fraud in taxation.

1. Introduction

Businesses and government organizations need knowledge of change in order to adapt their strategies to ever-changing environments. Knowing what has changed can be a major competitive advantage for an organization.
To understand what has changed, analysts have to be able to relate new knowledge or models acquired from a newer dataset to those acquired from an earlier dataset. Without this context, it can be difficult to revise existing strategies. This is particularly problematic if an organization has already implemented a strategy based on an earlier model.

In supervised learning, classifier performance often degrades over time, an issue known as concept drift [19, 23]. In many real-life domains, a concept of interest may depend on some hidden context, which is not given explicitly in the form of predictive features (i.e., some variables are invisible to the learner). Examples of such hidden contexts are changes in economic policy, disasters, life events, or changes in marketing strategies. Changes in the hidden context can induce more or less radical changes in a target concept. Most research on concept drift addresses it only in a supervised learning context; little work has been done in the context of unsupervised learning.

In data mining of conceptual changes, a number of temporal data mining algorithms have focused on detecting the point in time when something has changed (change detection), rather than understanding or exploring the causes of the changes (change analysis). For example, by gradually eliminating the effects of past data, an on-line discounting learning algorithm can detect outliers and change points in time in a changing data source [25].

To discover changes between two datasets, the resulting data mining models can be compared, given that a data mining model is designed to capture specific characteristics of a dataset. A theoretical framework has been introduced in [7] that allows measuring changes between two models. In this