Understanding Individual Nodes in Peer-to-peer Systems

Gyuwon Song†‡, Suhyun Kim, Sunghwan Jang†‡, and Daeil Seo†‡
Human Computer Interaction and Robotics Department, Imaging Media Center
University of Science and Technology, Korea Institute of Science and Technology
Seoul, Korea
{sharp81, suhyun.kim, jangc, xdesktop}@imrc.kist.re.kr

Abstract—This paper addresses a simple question: is there a behavior pattern of an individual node in peer-to-peer systems? Peer-to-peer systems are known to show daily patterns collectively, but the behavior patterns of individual nodes have seldom been studied. If individual nodes follow their own behavior patterns with reasonable accuracy, we could greatly improve the efficiency of the system by reducing the overhead of handling unexpected random node failures. Although there have been many empirical studies of peer-to-peer systems, most of them take a collective view and characterize the whole system in terms of average availability, average session length, and so on. In this paper we present the Peer Availability Table (PAT), a model that represents the behavior pattern of an individual node based on measurements of node availability. To judge whether such a behavior pattern exists, we measure the performance of PAT using a binary classification test. By answering this basic question, we provide a useful hint for the design of peer-to-peer systems.

Index Terms—Peer-to-peer system, behavior pattern, peer model, availability, KAD

I. INTRODUCTION

Peer-to-peer (P2P) systems provide high scalability and reliability by using resources donated by peers, including computing power, network bandwidth, and disk space. Many attractive services have emerged by exploiting these resources. P2P systems incur significant overhead to manage unexpected node failures because of a fundamental property of such systems: peers can join and leave at any time without notice.
However, finding the behavior pattern of an individual node can bring large benefits by reducing these costs: for example, by reducing the number of redundant replicas in a distributed storage system, or the frequency of content-publish refreshes in a file sharing system. The first question is: "Is there a behavior pattern of an individual node in a P2P system?" If so, how can we model the pattern and use it to predict node behavior? We answer these questions in this paper.

Although it is very important to understand individual nodes, their analysis has been inadequate compared to the collective view of nodes. Studies of nodes as a collective have revealed diurnal and weekly patterns in the variation of the population [1], [2], [3], [4]. However, these findings are too general to predict the behavior of a single node.

In this paper we focus on whether a behavior pattern of an individual node exists. To judge its existence, we present a novel technique that uses node availability to represent the behavior pattern of an individual node, together with a test tool to evaluate its accuracy. Namely, for each node, its behavior pattern is modeled as a PAT by analyzing and normalizing its trace data. To show the presence of the behavior pattern, we compare the trace data with its PAT via a binary classification test.

The rest of this paper is organized as follows. Section II describes related work, and then we briefly introduce the KAD trace, which we make full use of. In Section IV, our model for representing the behavior pattern and its verification method are explained. After the analysis of results, we summarize this paper in Section VI.

II. RELATED WORK

To analyze the behavior pattern of nodes, a P2P system must first be measured. Measurement techniques fall into two categories: active and passive probing. Passive methods are constrained to measuring a small set of controlled peers to study traffic patterns and peer dynamics.
Most studies, in contrast, use active network probing to detect the availability of nodes. Steiner et al. studied a KAD trace from both a global view and a peer view [4]. Moreover, they generously provide the raw trace data, which we use in our research. Measurements [5], [6], [7], [8], [9], [10] characterize P2P file sharing traffic over the Internet, including the Napster, Gnutella, KaZaa, and BitTorrent systems. Beyond P2P systems, Bolosky et al. described the host availability of over 50,000 PCs belonging to Microsoft [11], and Simache et al. studied UNIX workstations in a distributed environment [12]. However, these studies mainly focused on collective analysis of the systems rather than on individual nodes.

Douceur performed a meta-analysis of availability data [1], examining the Microsoft, Gnutella, and Napster traces, and revealed two broad classes of node availability: nodes in the first class are always online, whereas nodes in the second follow a diurnal pattern. Bhagwan et al. studied nodes in the Overnet DHT and found diurnal patterns [2], and Tian et al. [3], [13] studied the dynamic pattern of the Maze system. Additionally, Steiner et al. discovered a weekly pattern in KAD [4].

A few models of behavior patterns, based on conservative analysis, have typically described patterns using static host availability. Douceur et al. expressed the availability of a node in terms of its fractional downtime [14]. Bhagwan et al.