Internet Path Behavior Prediction via Data Mining: Conceptual Framework and Case Study

Leszek Borzemski
(Wroclaw University of Technology, Wroclaw, Poland
leszek.borzemski@pwr.wroc.pl)

Abstract: In this paper we propose an application of data mining methods to predicting the availability and performance of Internet paths. We deploy a general decision-making method that advises users on the further use of an Internet path at a particular time and date. The method is based on two data mining techniques: clustering and decision tree classification. The usefulness of our method for predicting Internet path behavior has been confirmed in a real-life experiment. Active Internet measurements were performed to gather end-to-end latency and packet routing information. The gathered measurement data were analyzed with a professional data mining package using neural clustering and decision tree algorithms. The results show that data mining can be used efficiently to forecast network behavior. We propose to build a network performance monitoring and prediction service based on the proposed data mining procedure. Our approach is addressed especially to non-networkers who use networking frameworks such as Grids and overlay networks, and who want to schedule their network activity while being left free of networking issues so that they can concentrate on their work.

Keywords: Grids, Network Behavior Prediction, Knowledge Management, Data Mining, Internet Performance, End-to-end Performance

Categories: C.2.3, C.4, H.1.2, H.2.8

1 Introduction

Today’s Internet users perceive good network operation as low latency, high throughput and high availability. Network performance is usually evaluated in terms of available bandwidth, end-to-end latency and the throughput of data transfers. However, it has never been easy to determine whether slow responses are caused by the network or by the end systems on either side.
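The abstract outlines a two-stage procedure: first cluster the path measurements into performance classes, then learn a decision tree that maps the time and date to one of those classes. The following is a minimal sketch of that idea under stated assumptions, not the author's implementation. The paper uses neural clustering inside a professional data mining package; here plain 1-D k-means is swapped in for the neural clustering and a small ID3-style tree stands in for the package's tree algorithm. The hourly RTT samples, the 6-hour time bands, and the names `kmeans_1d`, `build_tree` and `predict` are all hypothetical.

```python
from collections import Counter
from math import log2


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means (Lloyd's algorithm); returns centroids and one label per value.
    Swapped in for illustration; the paper itself uses neural clustering."""
    srt = sorted(values)
    # Start from evenly spaced quantiles so the run is deterministic.
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:  # assignments are stable, so the clustering has converged
            break
        centroids = new
    return centroids, labels


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """ID3 on categorical features; rows are dicts mapping feature name -> value.
    Returns a class label (leaf) or a tuple (feature, branches, majority_label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(f):
        remainder = 0.0
        for val in set(r[f] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[f] == val]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority
    branches = {}
    for val in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == val]
        branches[val] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                   [f for f in features if f != best])
    return (best, branches, majority)


def predict(tree, row):
    # Walk down the tree; fall back to the node's majority class for unseen values.
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if row[feature] not in branches:
            return majority
        tree = branches[row[feature]]
    return tree


# Hypothetical RTT samples every 3 hours: (weekday 0=Mon..6=Sun, hour, RTT in ms).
# Assumption: the path is congested on weekday afternoons (12:00-18:00).
samples = []
for day in range(7):
    for hour in range(0, 24, 3):
        busy = day < 5 and 12 <= hour < 18
        samples.append((day, hour, (180.0 if busy else 40.0) + hour))

# Step 1: group the latencies into k=2 performance classes.
centroids, labels = kmeans_1d([rtt for _, _, rtt in samples], k=2)

# Step 2: learn which (weekday, 6-hour band) combinations fall into which class.
rows = [{"weekday": d, "band": h // 6} for d, h, _ in samples]
tree = build_tree(rows, labels, ["weekday", "band"])

# Step 3: advise on a future time slot by reporting the expected class centroid.
for day, hour in [(2, 14), (6, 14), (0, 3)]:
    cls = predict(tree, {"weekday": day, "band": hour // 6})
    print(f"weekday={day} hour={hour}: expected RTT ~ {centroids[cls]:.1f} ms")
```

With these synthetic samples the tree splits first on the time band and then, within the afternoon band, on the weekday, so a Wednesday 14:00 request is mapped to the slow class while Sunday at the same hour is mapped to the fast one. Reporting the class centroid rather than a raw class index is one simple way to turn the classifier output into user-facing advice.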
Journal of Universal Computer Science, vol. 13, no. 2 (2007), 287-316
submitted: 31/7/06, accepted: 15/1/07, appeared: 28/2/07 J.UCS

Moreover, we cannot precisely diagnose and isolate the key sources of Internet communication problems, because they may be located in various network appliances and at different communication layers. Although best-effort networking is sufficient for Internet-based applications, which usually employ stateless communication, many new applications of an end-to-end nature now require predictable network performance. With the advent of Grids [Avery and Foster 2001], overlay networks [Andersen et al. 2001] and peer-to-peer networks [Gnutella 2005], predicting network behavior becomes an essential task. Such predictions can be used to schedule application and user activity as well as network communication, to select servers or network routes, and to organize parallel downloads. Various network performance evaluation principles and practices are used in the contemporary Internet. Several efforts concerning network performance measurement and monitoring concepts, tools and projects are reported at CAIDA’s [CAIDA 2006] and