Ontology learning from Italian legal texts

Alessandro LENCI (a), Simonetta MONTEMAGNI (b), Vito PIRRELLI (b) and Giulia VENTURI (b)
(a) Dipartimento di Linguistica, Università di Pisa (Italy)
(b) Istituto di Linguistica Computazionale (CNR, Pisa, Italy)

Abstract. The paper reports on the methodology and preliminary results of a case study in automatically extracting ontological knowledge from Italian legislative texts. We use a fully-implemented ontology learning system (T2K) that includes a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine language learning. Tools are dynamically integrated to provide an incremental representation of the content of vast repositories of unstructured documents. Evaluated results, however preliminary, show the great potential of NLP-powered incremental systems like T2K for accurate large-scale semi-automatic extraction of legal ontologies.

Keywords. ontology learning, document management, knowledge extraction from texts, Natural Language Processing

Introduction

Ontology building is nowadays a very active research field, as witnessed by the fast-growing literature on the topic and the increasing number of Knowledge Management applications based on automated routines for ontology navigation and update. This task, however, requires harvesting domain-specific knowledge on an unprecedented scale, by tapping and harmonizing knowledge sources of highly heterogeneous conception, format and coverage, ranging from foundational ontologies and structured databases to electronic text documents. As electronic texts still represent the most accessible and natural repositories of specialised information worldwide, there is a reasonable expectation that the growing demand for ontologically-interpreted knowledge can eventually be met by making automatically-interpreted text information increasingly available.
Different methodologies have been proposed to automatically extract information from texts and provide a structured organisation of the extracted knowledge in domains as diverse as bioinformatics, health care, public administration and company document bases. The situation in the legal domain is in line with this general trend.

This paper reports the results of a case study carried out in the legal domain to automatically induce ontological knowledge from texts with an ontology learning system, hereafter referred to as T2K (Text-to-Knowledge), jointly designed and developed by the Institute of Computational Linguistics (CNR) and the Department of Linguistics of the University of Pisa. The system offers a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine language learning,