Eigenspace-based Anomaly Detection in Computer Systems

Tsuyoshi IDÉ*
Tokyo Research Laboratory, IBM Research
goodidea@jp.ibm.com

Hisashi KASHIMA
Tokyo Research Laboratory, IBM Research
hkashima@jp.ibm.com

ABSTRACT
We report on an automated runtime anomaly detection method at the application layer of multi-node computer systems. Although several network management systems are available on the market, none of them has sufficient capability to detect faults in multi-tier Web-based systems with redundancy. We model a Web-based system as a weighted graph, where each node represents a “service” and each edge represents a dependency between services. Since the edge weights vary greatly over time, the problem we address is that of anomaly detection from a time sequence of graphs. In our method, we first extract a feature vector from the adjacency matrix that represents the activities of all of the services. The heart of our method is to use the principal eigenvector of the eigenclusters of the graph. Then we derive a probability distribution for an anomaly measure defined for a time series of directional data derived from the graph sequence. Given a critical probability, the threshold value is adaptively updated using a novel online algorithm. We demonstrate that a fault in a Web application can be automatically detected and the faulty services identified without using detailed knowledge of the behavior of the system.

Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning; H.2.8 [Database Management]: Database applications - Data Mining; K.6.4 [Management of Computing and Information Systems]: System Management

General Terms
Algorithms, Management

* The authors' address: 1623-14, Shimotsuruma, Yamato, Kanagawa 242-8502, Japan.
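The pipeline outlined in the abstract (an activity vector taken from the principal eigenvector of the dependency matrix, a typical pattern learned from recent history, and a score comparing the two) can be illustrated with a minimal NumPy sketch. It rests on assumptions that this section does not state: the dependency matrix at each time step is symmetric and non-negative; the typical pattern is taken to be the principal left singular vector of the last W activity vectors; and the anomaly score is 1 minus the cosine similarity between the typical pattern and the current activity vector. The helper names (`activity_vector`, `typical_pattern`, `anomaly_score`, `noisy`), the window length W = 10, and the synthetic data are all hypothetical. The von Mises-Fisher model and the adaptive threshold are omitted.

```python
import numpy as np


def activity_vector(D):
    """Unit-norm principal eigenvector of a symmetric, non-negative matrix D.

    By the Perron-Frobenius theorem this eigenvector can be chosen with
    non-negative entries, so the sign is fixed by making its sum non-negative.
    """
    _, vecs = np.linalg.eigh(D)        # eigenvalues come back in ascending order
    u = vecs[:, -1]                    # eigenvector of the largest eigenvalue
    return u if u.sum() >= 0 else -u


def typical_pattern(past_us):
    """Principal left singular vector of the n x W matrix of past activity vectors.

    Assumption for illustration: this is how the 'typical' pattern is summarized.
    """
    U = np.column_stack(past_us)
    left, _, _ = np.linalg.svd(U, full_matrices=False)
    r = left[:, 0]
    return r if r.sum() >= 0 else -r


def anomaly_score(r, u):
    """1 - cosine similarity of two unit vectors; 0 means 'same direction'."""
    return 1.0 - float(r @ u)


# --- Usage on synthetic data (all numbers are hypothetical) ---
rng = np.random.default_rng(0)
A = rng.random((5, 5))
base = A + A.T                         # symmetric, non-negative dependency pattern


def noisy(M):
    """Apply small symmetric multiplicative noise, mimicking traffic jitter."""
    N = M * (1.0 + 0.05 * rng.random(M.shape))
    return (N + N.T) / 2.0


history = [activity_vector(noisy(base)) for _ in range(10)]   # window W = 10
r = typical_pattern(history)

normal_score = anomaly_score(r, activity_vector(noisy(base)))
surge_score = anomaly_score(r, activity_vector(3.0 * base))   # all traffic triples

faulty = base.copy()
faulty[0, :] = 0.0                     # hypothetical fault: service 0 goes silent
faulty[:, 0] = 0.0
fault_score = anomaly_score(r, activity_vector(noisy(faulty)))

print(f"normal={normal_score:.5f} surge={surge_score:.5f} fault={fault_score:.5f}")
```

Because the activity vector is normalized, multiplying every call count by the same factor leaves it unchanged, so in this sketch a uniform traffic surge scores about as low as ordinary jitter, whereas silencing one service rotates the vector and raises the score. This is one way to see why a direction-based feature can separate traffic fluctuations from faults better than monitoring individual matrix elements.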
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD’04, August 22–25, 2004, Seattle, Washington, USA.
Copyright 2004 ACM 1-58113-888-1/04/0008 ...$5.00.

Keywords
time sequence of graphs, principal eigenvector, Perron-Frobenius theorem, von Mises-Fisher distribution, singular value decomposition

1. INTRODUCTION

1.1 Anomaly detection from graph sequences

Network systems with various connections and correlations between vertices have attracted much attention in several research fields such as ecology, economics, and solid-state physics. In the data mining community, growing attention is being paid to graphs as a new data structure. Recent studies include an extension of the Apriori algorithm to graphs [13], clustering of graph vertices based on graph spectra [5, 4], and anomaly detection in a set of graphs based on the minimum description length principle [16]. Reference [22] extensively reviews the state of the art of graph-based data mining. Most of these works, however, assume that the attributes of graphs are static.

On an abstract level, computer systems can also be represented as graphs. Two points are notable here. First, various kinds of network structures can be defined; for example, one can consider a separate structure at each layer of the OSI (Open Systems Interconnection) reference model. Second, the interactions between vertices, that is, the edge weights, are not clearly defined at each layer. In this paper, we address online anomaly detection for computer systems.
We model a Web-based system as a weighted graph, where each node represents a “service” and each edge represents a dependency between services. Since edge weights may vary over time, the problem we address is that of anomaly detection from a time sequence of graphs, namely, from a time-dependent adjacency matrix. The dependency matrix will exhibit some change when a monitored system experiences a fault. This change, however, is difficult to detect by monitoring each dependency, i.e., each matrix element, individually. This is especially true in Web-based systems, where the number of service calls fluctuates strongly over time. Even if a detector observes a sudden change in a single service, there is no evidence to conclude whether or not it is due to a fault; it may simply be a fluctuation in traffic.

This is a new challenge for graph mining, where one needs to discover an unknown structure hidden deep inside dependency graphs and detect faults from its anomalous changes. In this paper, we discuss the dynamics of graphs in

440 Industry/Government Track Paper