Topographical Proximity: Exploiting Domain Knowledge for Sequential Data Mining

Ann Devitt
Ericsson, Dublin 4, Ireland
ann.devitt@ericsson.com

Joseph Duffin
Ericsson, Dublin 4, Ireland
joseph.duffin@ericsson.com

Abstract

In today’s mobile telecommunications networks, increasingly powerful fault management systems are required to ensure robustness and quality of service of the network. In this context, fault alarm correlation is of prime importance to extract meaningful information from the vast quantities of alarms generated by the network. Existing sequential data mining techniques address the task of identifying possible correlations in frequent sequences of telecoms alarms. These frequent sequence sets, however, may contain sequences which are not plausible from the point of view of network topology constraints. This paper presents the Topographical Proximity (TP) approach, which exploits the topographical information encoded in telecommunication alarms in order to address this lack of plausibility in mined alarm sequences. An evaluation of the quality of mined sequences is presented and discussed. Results show that imposing proximity constraints improves overall system performance.

1 Introduction

Given the growing complexity of mobile telecommunications networks, the task of ensuring robustness and maintaining quality of service in the network requires increasingly powerful network management systems. Furthermore, the steady increase in size and complexity of the network produces a corresponding increase in the volume of data generated by network elements (e.g. alarms, performance indicators), placing added strain on management systems. In particular, fault management remains a key problem area for network operators, as the speed at which faults are handled has immediate consequences for network performance.
The complex, inter-connected nature of the network means that a single fault may produce a cascade of alarms from affected network elements. Conversely, intermittent, self-clearing alarms may be raised without any attendant fault in the network. In this context, event correlation provides a means of dealing with the large volume of alarm data. Correlations define relations between alarm events that facilitate the processes of alarm filtering, masking and prioritising specified in ITU-T recommendations [7]. While sequential data-mining techniques have evolved to identify possible useful correlations in alarm data, the task of identifying the subset of important and plausible correlations remains heavily dependent on the domain expertise of network equipment manufacturers and operators. Yet alarms encode substantial domain knowledge, in particular topographical information regarding the network elements which generated a given alarm. Furthermore, telecommunications networks, although complex, conform to a well-defined topology of network elements.

This paper addresses the challenge of harnessing the latent domain knowledge available in alarm data in order to provide criteria for automatically evaluating the plausibility of mined alarm correlations. Section 2 sets out current approaches in the domain of sequential data-mining addressing the task of event correlation. Section 3 describes the need to exploit topographical attributes of the input data to validate mined sequences and how this has been realised for telecommunications alarm data as the Topographical Proximity (TP) measure. Section 4 describes a set of experiments aimed at providing a qualitative evaluation of the topographical proximity approach for mining telecommunications alarm data. The results are presented and discussed in Section 5.