NCDS: data mining for discovering interesting network characteristics

M. Zaki a,*, Tarek S. Sobh b

a Computer and System Engineering Department, Al-Azhar University, Cairo 12311, Egypt
b Information System Department, Egyptian Armed Forces, Cairo, Egypt

Received 11 January 2004
Available online 2 October 2004

Abstract

This paper presents an approach to observing network characteristics based on a data mining framework. Such observations may be expressed as structured patterns to support the process of network planning. The underlying system monitors the network protocol tables that describe each network connection or host session in order to discover interesting patterns. To achieve this purpose, a data abstraction procedure is applied to learn rules that may express the behavior of network characteristics. Thus, the system is capable of discovering various operational patterns, providing sensible advice, and supporting the network planning activity. A database system has been designed and implemented for monitoring the network traffic. The results from the experiments have also been used to classify real traffic data. The system presented in this paper is called the network characteristics discovery system (NCDS).

© 2004 Published by Elsevier B.V.

Keywords: Data mining; Network management; Abstraction; Pattern discovery

1. Introduction

Network management has enormous importance in today's computing environments. Organizations are increasingly embracing the Internet's potential as a powerful, low-cost medium for business transactions such as marketing, advertising, e-commerce, and customer support. Although internetworking offers significant opportunities, it greatly increases the risk of security breaches that render a system unreliable or unusable and its services unavailable [1]. Thus, network management is required to provide a defense against attacks and to support network planners.
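As a rough illustration of the monitoring idea outlined in the abstract, where entries of the protocol tables describing each connection are recorded in a database and then examined for patterns, the following minimal sketch stores a few connection records in a relational table and summarizes them with a query. The table layout, the column names, the helper functions `load_connections` and `connections_per_service`, and the sample rows are all assumptions made for illustration; they are not taken from the NCDS implementation.

```python
import sqlite3

# Hypothetical schema: the paper does not specify its table layout.
SCHEMA = """
CREATE TABLE connections (
    local_addr  TEXT,
    local_port  INTEGER,
    remote_addr TEXT,
    remote_port INTEGER,
    state       TEXT
)
"""

# Illustrative rows only; these are not data from the paper.
SAMPLE_ROWS = [
    ("10.0.0.5", 1025, "10.0.0.9", 80, "ESTABLISHED"),
    ("10.0.0.5", 1026, "10.0.0.9", 80, "ESTABLISHED"),
    ("10.0.0.5", 1027, "10.0.0.7", 25, "TIME_WAIT"),
    ("10.0.0.6", 1030, "10.0.0.9", 80, "ESTABLISHED"),
]


def load_connections(rows):
    """Store connection-table entries in an in-memory relational database."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO connections VALUES (?, ?, ?, ?, ?)", rows)
    return db


def connections_per_service(db):
    """Count connections per remote port, most frequent first."""
    query = """
        SELECT remote_port, COUNT(*) AS n
        FROM connections
        GROUP BY remote_port
        ORDER BY n DESC, remote_port ASC
    """
    return db.execute(query).fetchall()


db = load_connections(SAMPLE_ROWS)
summary = connections_per_service(db)
print(summary)
```

In this sketch the summary is a list of (remote port, connection count) pairs. A real monitor would fill the table periodically from the host's protocol tables rather than from fixed sample rows.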
Several works [2,6,14] have addressed the problem of monitoring network events. They studied statistical tests, alerts, and the correlation between alerts in order to provide a realistic model for network management. Newer techniques are also used for managing networks and distributed systems. With such methods, the management framework is built on top of the Common Object Request Broker Architecture (CORBA). Two important commercial products use CORBA as the underlying communications platform: Tivoli and HP OpenView [12].

The network characteristics discovery system (NCDS) is concerned with all the configuration management functions. However, to focus attention, accounting and security management are excluded for the time being. The tool exploits a data-oriented model in which the network management functions are specified as data manipulation statements [16]. Thus, a low-level module of the tool reads the values of interesting data items from the relevant TCP/IP protocol tables and passes them to a relational database.

The advice capability is based on data mining for knowledge discovery. The terms data mining and knowledge discovery are often used to refer to an interdisciplinary field that uses methods from several research areas to extract knowledge from real-world data sets [8]. The knowledge discovery process is both iterative and interactive. It is iterative because the output of each step is often fed back to the previous step, as shown in Fig. 1, and typically many iterations of this process are necessary to extract high-quality knowledge from data. It is interactive because the user, or more precisely an expert in the application domain, should be involved in this loop, to help

0950-5849/$ - see front matter © 2004 Published by Elsevier B.V.
doi:10.1016/j.infsof.2004.08.002
Information and Software Technology 47 (2005) 189–198
www.elsevier.com/locate/infsof
* Corresponding author.
E-mail addresses: azhar@mailer.scu.eun.eg (M. Zaki), tarekbox2000@arabia.com (T.S. Sobh).