Proceedings of the 2003 Systems and Information Engineering Design Symposium
Matthew H. Jones, Barbara E. Tawney, and K. Preston White, Jr., eds.

THE USE OF DATA MINING IN THE DESIGN AND IMPLEMENTATION OF AN INCIDENT REPORT RETRIEVAL SYSTEM

Doireann Cassidy, Joe Carthy, Anne Drummond, John Dunnion, and John Sheppard
Department of Computer Science
University College Dublin
Dublin, Ireland

ABSTRACT

Incident reporting is becoming increasingly important in large organizations. Legislation is progressively being introduced to regulate how incident information is handled. One example is the European Directive No. 94/95/EC, which obliges airlines and national bodies to collect and collate reports of incidents. Typically these organizations use manual files and standard databases to store and retrieve incident reports. However, research has established that database technology needs to be enhanced in order to deal with incidents. This paper describes the design and implementation of InRet, an incident report retrieval system that endeavours to find similarities and patterns between incidents by combining the strengths of Case-Based Reasoning and Information Retrieval techniques in an integrated system. Preliminary results from InRet are presented and are encouraging.

1 INTRODUCTION

The use of incident reporting systems is increasingly recognized as an effective way of analyzing incidents, leading to the anticipation and prevention of further incidents and accidents. An incident can be defined as "an unwanted disruption that has unfortunate and untoward consequences" (Perrow, 1999). Groups of incidents may be related in subtle and unremarkable ways. Such relationships may not be detectable using standard database queries, hence the need for advanced incident management systems (Johnson, 2000).

The goal of incident reporting and analysis is to determine the links between incidents in order to prevent recurrences. This requires a technology that can extract useful information from large amounts of accumulated data. The purpose of data mining is to look for hidden patterns in a group of data that can be used to predict future behaviour. It involves sorting through data in order to identify associations and to establish relationships. InRet uses data mining to detect such relationships between incidents, and it does so using Case-Based Reasoning and Information Retrieval techniques.

1.1 Introduction to Case-Based Reasoning

Case-based reasoning (CBR) is an Artificial Intelligence problem-solving technique that solves new problems by reusing existing problem solutions stored in the form of cases in a case-base (Leake, 1996). The CBR process can be described in four main stages:

1. Retrieve: A new problem is compared with the cases in the case-base to retrieve the most similar cases;
2. Reuse: The solution(s) of the retrieved case(s) are reused to provide a solution to the new problem;
3. Revise: Unless the retrieved case is an identical match to the new problem, the reused solution may have to be revised;
4. Retain: The new case, incorporating both problem and solution, is retained in the case-base.

Each case is represented by a set of features, or attributes, which are used in the matching process. Depending on the user's preference and on the query, these features may or may not be weighted. Weighting makes it possible to give certain features more importance than others.

1.2 Introduction to Cosine Similarity

The cosine similarity metric is an Information Retrieval (IR) technique that determines the similarity between documents by measuring the cosine of the angle between their term vectors. Documents and queries can both be represented as vectors of terms. A system can then select documents in response to a query by identifying the documents whose vector representations are most similar to the query vector.
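The four-stage CBR cycle and the optional feature weighting described in Section 1.1 can be sketched in Python. This is a minimal illustration, not InRet's implementation. It assumes that every feature is nominal and that two feature values match only when they are equal. It also assumes that a case's similarity is the weighted fraction of query features it matches, with features absent from the weight table defaulting to weight 1.0. The names `Case`, `weighted_similarity`, `retrieve` and `solve`, and the `revise` callback, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, str]


@dataclass
class Case:
    """A stored case: a problem description plus its solution."""
    features: Features
    solution: str


def weighted_similarity(query: Features, case: Case,
                        weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted fraction of the query's features whose values the case shares.

    Features missing from `weights` get weight 1.0, so passing no weights
    treats every feature as equally important (unweighted matching).
    """
    weights = weights or {}
    total = sum(weights.get(f, 1.0) for f in query)
    if total == 0:
        return 0.0
    matched = sum(weights.get(f, 1.0) for f, v in query.items()
                  if case.features.get(f) == v)
    return matched / total


def retrieve(query: Features, case_base: List[Case],
             weights: Optional[Dict[str, float]] = None,
             k: int = 1) -> List[Tuple[float, Case]]:
    """Stage 1 (Retrieve): return the k most similar cases, best first."""
    scored = [(weighted_similarity(query, c, weights), c) for c in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def solve(query: Features, case_base: List[Case],
          weights: Optional[Dict[str, float]] = None,
          revise: Callable[[Features, str], str] = lambda q, s: s) -> str:
    """Run one pass of the Retrieve-Reuse-Revise-Retain cycle."""
    if not case_base:
        raise ValueError("the case-base is empty")
    best_score, best_case = retrieve(query, case_base, weights)[0]
    # Stage 2 (Reuse): adopt the solution of the most similar case.
    proposed = best_case.solution
    # Stage 3 (Revise): keep it unchanged only if every query feature matched;
    # otherwise hand it to the (hypothetical) revise callback.
    solution = proposed if best_score == 1.0 else revise(query, proposed)
    # Stage 4 (Retain): store the new problem together with its solution.
    case_base.append(Case(dict(query), solution))
    return solution
```

As an invented illustration, suppose the query is {phase: landing, weather: fog, system: engine}. Case A matches phase and weather, and case B matches only system. Unweighted, A scores 2/3 and B scores 1/3. If `system` is given a weight of 3.0, A scores 2/5 and B scores 3/5, so B is retrieved instead. This is how weighting lets the user emphasize the features they consider most important.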
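The cosine measure described in Section 1.2 can be illustrated with a small Python sketch. It assumes a deliberately simple representation: terms are lowercase alphanumeric tokens weighted by raw frequency, with no stemming, stop-word removal or tf-idf weighting. A practical IR system would normally add these refinements. The function names are hypothetical and do not come from InRet.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def term_vector(text: str) -> Counter:
    """Map a text to raw term counts (assumed tokenizer: lowercase alphanumerics)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty."""
    # Counter returns 0 for absent terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query: str, documents: List[str]) -> List[Tuple[float, int]]:
    """Return (score, document index) pairs, most similar to the query first."""
    q = term_vector(query)
    scores = [(cosine_similarity(q, term_vector(d)), i)
              for i, d in enumerate(documents)]
    return sorted(scores, key=lambda s: s[0], reverse=True)
```

For the query "engine fire", the report "engine fire during climb" scores 2 / (sqrt(2) * 2), about 0.71. The report "engine fire warning on landing" scores 2 / (sqrt(2) * sqrt(5)), about 0.63, because its extra terms enlarge its vector norm. A report sharing no terms with the query scores 0.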