Automation of Lexicographic Work Using General and Specialized Corpora: Two Case Studies

Iztok Kosem 1, Polona Gantar 2, Nataša Logar 3, Simon Krek 4
1 Trojina, Institute for Applied Slovene Studies
2 Fran Ramovš Institute for the Slovene Language, ZRC SAZU, Ljubljana, Slovenia
3 Faculty of Social Sciences, University of Ljubljana
4 Jožef Stefan Institute, Ljubljana
iztok.kosem@trojina.si, apolonija.gantar@guest.arnes.si, natasa.logar@fdv.uni-lj.si, simon.krek@guest.arnes.si

Abstract

Due to the increasingly large amounts of authentic data to analyse, lexicographers are nowadays looking to language technologies to provide them not only with tools to analyse the data, but also with tools and methods that ease and speed up the analysis. One of the most promising avenues of research has been the automation of the early stages of corpus data analysis, with the aim of summarizing, and consequently reducing, the amount of corpus data that lexicographers need to examine. However, most of this research deals with general lexicography; terminology is yet to extensively test these methods. This paper attempts to address this gap by presenting two separate Slovene research projects, one lexicographic (Slovene Lexical Database) and the other terminological (Termis), that used the same method of automatic extraction of corpus data (presented in Kosem et al. 2013). After describing the projects and the corpora used, similarities and differences in the parameter settings and in the quality of the extracted data in the two projects are presented. We conclude by discussing the further potential of automation in both general and specialised lexicography.

Keywords: data extraction; terminology; general language; collocations; dictionary; GDEX

1 Introduction

In recent years, lexicography has witnessed several projects where automation of different aspects of the lexicographer's work has been successfully implemented, such as detection of new words or meanings (Cook et al.
2013) or initial data extraction (Kosem et al. 2013). This trend of increasing the role of the computer in the dictionary-making process follows Rundell and Kilgarriff's (2011) vision of focussing the lexicographer's tasks on validating and completing the data extracted by a computer. The calls for automation originate mainly from general lexicography, where lexicographers are faced with increasingly large corpora that they need to analyze. But what about using automation in the making of dictionaries, such as terminological dictionaries, where much smaller and more specialized corpora are used? To what extent can automation methods used in general lexicography be trans-