Developing a Question Answering System for the Slovene Language

INES ČEH, MILAN OJSTERŠEK
Laboratory for Heterogeneous Computer Systems
Faculty of Electrical Engineering and Computer Science
Smetanova 17, 2000 Maribor
SLOVENIA
ines.ceh@uni-mb.si, ojstersek@uni-mb.si

Abstract: - Today, most information is sought on the internet, commonly through search engines. However, because a search engine answers a query with a ranked list of results, the search does not end there: the user must still review the results and determine which of them provide the information needed. This process is often time consuming and may not yield the sought-after information. Besides the number of returned results, a limiting factor is often the users’ inability to form a suitable query. Question answering systems offer a solution: the user poses a question in natural language, much as when talking to another person, and receives an exact answer instead of a list of possible results. This paper presents the design of a question answering system for the Slovene language. The system searches for answers within our target domain (the Faculty of Electrical Engineering and Computer Science) using a local database, the databases of the faculty’s information system, MS Excel files, and web service calls. We have developed two separate applications: one for users and one for the administrators of the system. The user application is the system that answers the questions; with the administrator application, the administrators supervise the functioning and use of the entire system.
Key-Words: question answering, Slovenian language, morphological dictionary of Slovenian language, question classification, machine learning, question templates, personalization

1 Introduction
The basic idea of question answering systems is to provide answers to questions written in natural language. The answers can be retrieved from different sources, e.g. web pages, plain texts, knowledge bases, web services, etc. Unlike information retrieval applications such as web search engines, which flood their users with documents or best-matching passages, question answering systems aim to find a specific answer. There are many ways of looking at question answering, and they depend on the approach taken towards various dimensions [1]. These dimensions are: the question, the answer, the technique, the information source, the domain, and the evaluation. The dimensions and the connections between them are shown in Fig. 1.

Currently the most extensive source of information, which is also used by question answering systems, is the World Wide Web. The World Wide Web was designed not merely for communication but also to hold information; hence the idea arose that computers, too, should be capable of collaboration and automatic task management.

Fig. 1: Dimensions of the system and connections between them

However, the majority of information available on the web is suitable only for human use. Even template-based documents, e.g. documents in different databases, whose structure and meaning are defined, do not ease the work of software agents [2]. A change was needed, and the idea of the Semantic Web was born [3], [4].

WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS ISSN: 1790-0832 1533 Issue 9, Volume 6, September 2009