Legal text processing within the MIREL project

Milagro Teruel*, Cristian Cardellino*, Fernando Cardellino*, Laura Alonso Alemany*, Serena Villata†
* Universidad Nacional de Córdoba, Argentina
† INRIA & Université Côte d’Azur, CNRS, France

Abstract
We present the roadmap and advances in the area of Information Extraction from legal texts within the EU-funded MIREL project (MIning and REasoning with Legal texts). We describe the resources and tools we have developed for Natural Language Processing in the legal domain, i.e., annotated corpora and automated classifiers for Named Entity Recognition and Linking and for Argument Mining. Our final objective is to identify arguments, their content and the relations between them in legal text, with a proof of concept on judgments of the European Court of Human Rights (ECHR), and ultimately to support reasoning tasks over the mined argumentative structures. This representation will arguably be useful for applications such as reading aids, enhanced information retrieval, structured summarization, intelligent search engines or information extraction. All tools and resources are available at https://github.com/PLN-FaMAF/legal-ontology-population and https://github.com/PLN-FaMAF/ArgumentMiningECHR.

Keywords: Argument Mining, Named Entity Recognition, Classification and Linking, Legal Information Extraction

1. Introduction and Motivation
Automated legal text processing is becoming increasingly relevant in legal practice. According to the MIT Technology Review, the U.S. consultancy group McKinsey estimates that 22% of a lawyer’s job and 35% of a law clerk’s job can be automated (Winick, 2017). For example:
“JPMorgan announced earlier this year that it is using software called Contract Intelligence, or COIN, which can in seconds perform document review tasks that took legal aides 360,000 hours.”
“CaseMine, a legal technology company based in India, builds on document discovery software with what it calls its “virtual associate,” CaseIQ.
The system takes an uploaded brief and suggests changes to make it more authoritative, while providing additional documents that can strengthen a lawyer’s arguments.” (Winick, 2017)

Natural Language Processing (NLP) tools can scan huge amounts of legal documents, identify the portions relevant to a given case and even present them in the orderly manner a lawyer needs to craft a case, more quickly and more exhaustively than humans can, given the sheer amount of data to process. In case law, if practitioners are provided with relevant cases while building their arguments for a new case, they are more likely to produce sounder argumentation. Cases can also be expected to be resolved more definitively if compelling jurisprudence is provided, even at an early stage of the judicial process. More and more technological solutions are being developed along these lines, which shows the feasibility and utility of this work.

One of the objectives of the MIREL project 1 is to develop tools for MIning and REasoning with Legal texts, with the aim of translating these legal texts into formal representations that can be used for querying norms, compliance checking, and decision support. Open-source tools and resources are also very important to provide equal access to the law. However, developing such tools is costly. Tools are usually trained on examples that have been manually analyzed and annotated by a domain expert, so we aim to reduce the cost of developing such tools by taking advantage of existing annotated resources.

1 http://mirelproject.eu/

In this paper, we present our roadmap and advances towards developing such tools in two main areas: Named Entity Recognition, Classification and Linking (NERC and NEL) and Argument Mining (Lippi and Torroni, 2016).
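A minimal sketch of how an existing resource can stand in for manual annotation: Wikipedia mentions already link to entities, so if each linked entity's ontology classes are mapped to legal-domain classes (as the LKIF-YAGO mapping described below does), every link becomes a labelled NERC example. The class names, the helpers `label_mentions` and `to_bio`, and the rule of discarding ambiguous entities are illustrative assumptions, not the project's actual code or mapping.

```python
import re

# Illustrative mapping from YAGO-style classes to LKIF-style legal classes.
# These class names are hypothetical placeholders, not the project's mapping.
YAGO_TO_LKIF = {
    "yago:Court": "lkif:Public_Body",
    "yago:Treaty": "lkif:Legal_Document",
}


def label_mentions(links, entity_classes, mapping):
    """Turn entity links (start, end, entity) into (start, end, label) spans.

    A link is kept only if its entity's classes map to exactly one legal
    class; unmapped or ambiguous entities are dropped rather than guessed.
    """
    spans = []
    for start, end, entity in links:
        labels = {mapping[c] for c in entity_classes.get(entity, []) if c in mapping}
        if len(labels) == 1:
            spans.append((start, end, labels.pop()))
    return spans


def to_bio(text, spans):
    """Split text into words/punctuation and assign BIO tags from char spans."""
    tagged = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            if start <= match.start() and match.end() <= end:
                tag = ("B-" if match.start() == start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


# Toy sentence with Wikipedia-style links given as character offsets.
text = "The European Court of Human Rights applied the Convention."
links = [
    (4, 34, "European_Court_of_Human_Rights"),
    (47, 57, "European_Convention_on_Human_Rights"),
]
entity_classes = {
    "European_Court_of_Human_Rights": ["yago:Court", "yago:Organization"],
    "European_Convention_on_Human_Rights": ["yago:Treaty"],
}

spans = label_mentions(links, entity_classes, YAGO_TO_LKIF)
for token, tag in to_bio(text, spans):
    print(token, tag)
```

Dropping entities whose classes map to several legal classes trades some recall for cleaner automatically generated labels; a real pipeline might instead resolve such cases with the ontology hierarchy.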
For both NERC/NEL and Argument Mining, we have built annotated datasets following precise guidelines and experimented with supervised and unsupervised learning methods. More precisely, we have built a tool for NERC and NEL in the legal domain by exploiting Wikipedia as an annotated corpus. To retrieve the relevant portion of Wikipedia, we have established a mapping between an ontology of the legal domain, LKIF (Hoekstra et al., 2007), and an ontology covering Wikipedia knowledge, YAGO (Suchanek et al., 2007). We have also explored the use of different flavors of word embeddings to transfer a Wikipedia-based model to judgments of the ECHR. We present an extensive evaluation of the tools. For Argument Mining, we are manually annotating a corpus of ECHR judgments, focusing on inter-annotator agreement and on the performance of automatic analyzers, in order to strike a balance between descriptive adequacy and analyzer performance.

In the following section, we outline the roadmap of our proposal; we then describe the tools and resources we are developing for NERC and NEL (Section 3) and for Argument Mining (Section 4), comparing them with existing approaches in these domains. Conclusions end the paper.

M. Teruel et al.: Legal text processing within the MIREL project 42
Proceedings of the LREC 2018 “Workshop on Language Resources and Technologies for the Legal Knowledge Graph”, Georg Rehm, Víctor Rodríguez-Doncel, Julián Moreno-Schneider (eds.), 12 May 2018, Miyazaki, Japan

2. Objectives of Information Extraction within MIREL
The final goal of the Information Extraction area of MIREL is to obtain a representation of legal texts that shows their arguments and anchors them semantically. To do that, our main subgoals are: