Worm Propagation and Defense over Hyperbolic Graphs

Edmond Jonckheere

Abstract— We investigate the speed of propagation of worms on the so-called δ-hyperbolic graphs, as defined by Gromov. Such graphs as the physical graph and the logical mail graph manifest this property. The Cayley graphs of combinatorial group theory are used as prototypes of hyperbolic graphs. A simple mail worm defense strategy is considered, and the question of whether it is able to slow the propagation is addressed.

I. INTRODUCTION

A worm can be defined as a malicious piece of code that self-replicates as it self-propagates through a network by exploiting software vulnerabilities. The distinction between “worm” and “virus” has its roots in biological epidemiology, where “virus” denotes an organism that consists of “malicious” DNA or RNA encapsulated in a protein shell and that, as such, is unable to replicate on its own but will replicate once it has invaded a host cell [4, pp. 20-21]. Likewise, a computer virus needs a program to which it attaches, whereas a worm propagates on its own.

Here, we look at worm propagation as it relates to the topology of the underlying graph that serves as the propagation medium. By the definition of this graph, its nodes are either infected, contagious agents or noninfected, susceptible recipients, and its links represent contacts between nodes that could transmit the pathogenic agent. There are many worms, propagating in different ways, and hence there are many propagation graphs. For such worms as Code-Red, which choose their targets by uniform random scanning [11] of the 32-bit address space, the propagation graph would be the Erdős-Rényi random graph G(n, p) on $n = 2^{32}$ nodes with link probability $p = \left(2^{32}(2^{32} - 1)\right)^{-1}$, if the scanning were truly random.
However, in most cases, the “random” scanning is implemented with a pseudo-random number generator, and as such the propagation graph is a traveling salesman path visiting all nodes. In the case of the faulty pseudo-random number generator of the Slammer/Sapphire worm [10], the nature of the propagation graph is less clear. Some simulations of Code-Red v2 have used the Autonomous System (AS) graph, with the worm jumping at random from one AS to another [11]. For such worms as Code-Red II and Nimda, which preferentially scan the local subnet [11], the propagation graph is closer to the physical graph and hence deterministic. For an e-mail worm [12], the graph is the logical e-mail graph.

There are two accepted propagation models that relate in some way to the propagation mode: the Epidemiological model and the Analytical Active Worm Propagation (AAWP) model [2]. The Epidemiological model is a generic model of the spread of a disease, with the basic feature that the propagation speed depends on the fraction of uninfected subjects. The AAWP model is specifically devised for computer worms propagating by random scanning.

E. Jonckheere is with the Department of Electrical Engineering–Systems, University of Southern California, Los Angeles, CA 90089-2563; jonckhee@usc.edu

In this paper, we study a different aspect of the propagation, one that is more relevant to the topology-oriented propagation of e-mail worms and peer-to-peer worms. In this case, the underlying graph structure, e.g., the logical mail graph in the case of an e-mail worm, plays a predominant role. This study is more relevant to such worms as Code-Red II and Nimda, which preferentially attack neighbors, than to worms that choose their targets at random. Crucial in this study is the epidemiological feature that the propagation depends on the fraction of uninfected machines. The aspects investigated here are the specific features of propagation on a hyperbolic graph.
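As an illustration of the epidemiological feature just mentioned, the simplest homogeneous epidemic model is commonly written as a logistic equation. The form below is the standard textbook one and is given only as a sketch. The symbols $i(t)$ (fraction of infected machines), $i_0$ (initial infected fraction), and $\beta$ (contact rate) are notation assumed here for illustration and are not taken from this paper.

```latex
\begin{align}
  \frac{di}{dt} &= \beta\, i(t)\,\bigl(1 - i(t)\bigr), \qquad i(0) = i_0 \in (0,1), \\
  i(t) &= \frac{i_0\, e^{\beta t}}{1 - i_0 + i_0\, e^{\beta t}} .
\end{align}
```

The factor $1 - i(t)$ is the fraction of uninfected machines: growth is nearly exponential while few machines are infected and saturates as that fraction shrinks.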
Recall that a graph X in which every link has a weight can be made a metric space (X, d), where d is the so-called length distance. A graph (X, d) is δ-hyperbolic if, for every geodesic triangle ABC, every edge, say [AB], is contained in the union of the δ-neighborhoods (δ < ∞) of the edges [BC] and [CA]. The motivation is that such graphs as the physical graph, the AS graph, and the mail graph are known to be heavy-tailed and as such are hyperbolic [8]. We use the Cayley graphs of combinatorial group theory [5, p. 78] as prototypes of hyperbolic graphs, since they allow a completely analytical [13], rather than simulation-based, study of propagation and defense strategies. One problem is that heavy-tailed graphs by their very definition have high disparity among the degrees of their nodes, whereas Cayley graphs have near-homogeneous node degree. However, they both share the visually intuitive hyperbolic property that they are close to trees in the sense of large-scale geometry [6, Appendix B]. The δ-hyperbolic concept, of course, makes sense only for idealized infinite graphs. How it should be amended to deal with very large but finite graphs has been addressed in [8], but we will not consider the finite case here.

II. EPIDEMIOLOGICAL MODEL

Qualitatively, the epidemiological model is a homogeneous mode of propagation in which any contagious agent could transmit the pathogen to any other susceptible recipient with uniform probability p. This is also sometimes referred to as the homogeneous mixing model. Since any infected node in a vulnerable population of size n could potentially infect any uninfected node with probability p, it can be said that the propagation occurs on a G(n, p) random