Visualization and Structure Analysis of Legislative Acts: A Case Study on the Law of Obligations

Innar Liiv, Anton Vedeshin, Ermo Täks
Department of Informatics
Tallinn University of Technology
15 Raja Street, 12618 Tallinn, Estonia
{innar.liiv,anton.vedeshin,ermo.taks}@ttu.ee

ABSTRACT
The enhancement of search and retrieval systems has been one of the key research areas in AI & Law. Keyword searching systems are generally less than satisfactory because they find documents by word matching, disregarding the meaning of information. After a short literature review, we introduce and discuss several relevance and importance measures that are well known in social network analysis but virtually unknown in the legal domain as a means of measuring the elements of legislative acts (titles, chapters, sub-chapters and sections). Structure analysis methods for discovering hubs, chains and clusters in the references of legislative acts are presented with numerical examples. This paper argues that it is possible to obtain reasonable and interpretable results by analyzing only the inner structure of a legislative act, as constructed by the references within it. Experimental results for the law of obligations are included and analyzed.

Categories and Subject Descriptors
H. [Information Technology and Systems] H.2.8.c Data and knowledge visualization, H.2.8.d Data mining, H.2.8.i Mining methods and algorithms, I.2.1 [Artificial Intelligence]: Applications and Expert Systems – law.

General Terms
Algorithms, Experimentation, Measurement

Keywords
automated extraction of information from legal texts, knowledge discovery in legal databases, text mining.

1. INTRODUCTION
The enhancement of search and retrieval systems has been one of the key research areas in AI & Law. It was already recognized in [12] that keyword retrieval systems are generally less than satisfactory because they select documents by word matching, disregarding the meaning of information.
To narrow the gap, research in the legal domain has mainly concentrated on concept extraction and automatic document classification. Concept extraction involves the identification of concept-referring terms and phrases from the text and - if possible - their generalization into more abstract concepts [2]. Several methods have been applied to the automatic extraction of concepts, document classification and relevance prediction in the legal domain. They can be classified as:

• supervised learning algorithms - classical three-layer back propagation neural networks [13], discriminant analysis with similarity measures (Jaccard coefficient, cosine coefficient and Dice coefficient) [26], inductive algorithms [18], [3], decision tree learning algorithms - ID3 [19], C4.5 [24], support vector machines (SVM) [30], naive Bayes (NB) and maximum entropy (ME) [29];
• unsupervised learning algorithms - self-organizing maps (SOM) [17], [20], [25] and clustering [28], [32].

Learning algorithms extract rules, rather than weighting vectors over the entire vocabulary. In cases where a relevant legal taxonomy is available, together with representative labeled data, automated text classification tools can be applied. In the absence of these resources, document clustering offers an alternative approach to organizing collections, and an adjunct to search [32].

Several enhancements and workarounds have been proposed, including using sentences instead of entire documents as examples [19] and abstracting from names to roles: substituting names by generic roles imports some of the overall case context into the examples, thereby adding information content to them [23]. Such workarounds may be effective only in a very specific sub-domain and need to be re-implemented in every new situation. Working with different languages, especially in cross-lingual legal information retrieval, adds an entirely new dimension of complexity; language problems have been addressed in [28], [31].
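As a brief illustration of the similarity measures named among the supervised methods above (Jaccard, cosine and Dice coefficients), the following minimal sketch computes all three for two documents represented as sets of terms. The set-based (binary) formulation, the function names and the sample terms are assumptions made for illustration only; the cited work [26] may use weighted term vectors instead.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: set, b: set) -> float:
    """Cosine coefficient for binary term sets: |A ∩ B| / sqrt(|A| * |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Hypothetical term sets for two documents (illustrative data only).
doc_a = {"contract", "obligation", "damages", "creditor"}
doc_b = {"contract", "obligation", "lease", "tenant"}

for name, measure in (("Jaccard", jaccard), ("Dice", dice), ("cosine", cosine)):
    print(f"{name}: {measure(doc_a, doc_b):.3f}")
```

All three measures grow with the number of shared terms and differ only in how they normalize by document size, which is why they tend to rank document pairs similarly.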
These language issues are relevant to the current paper as well: the legislative acts parsed and analyzed in this paper are in Estonian, which is an agglutinative language.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICAIL 2007, June 4–6, 2007, Palo Alto, California, USA. Copyright 2007 ACM 1-58113-000-0/00/0004…$5.00.

The main contributions of this paper can be summarized as follows. We introduce and discuss several relevance and importance measures (Section 2) that are well known in social network analysis ([4], [5]) but virtually unknown in the legal domain as a means of measuring the elements of legislative acts (titles, chapters, sub-chapters and sections). We present structure analysis methods (Section 3) for discovering hubs, chains and clusters in the references of legislative acts. This paper argues that it is possible to get reasonable