QUESTOR – automatic searching for reports

Andrei Vasilateanu, Nicolae Goga, Tudor Sutu, Marius Nastasescu
Faculty of Engineering in Foreign Languages
University POLITEHNICA of Bucharest
Romania
andrei.vasilateanu@upb.ro, n.goga@rug.nl, tudor.sutu@gmail.com, marius.nastasescu@yahoo.com

Alin Moldoveanu, Victor Asavei
Faculty of Automatic Control and Computer Science
University POLITEHNICA of Bucharest
Romania
{alin.moldoveanu, victor.asavei}@cs.pub.ro

Cristian Taslitchi
Info World
Romania
cristian.taslitchi@infoworld.ro

Abstract— A vast array of reporting tools is in use nowadays, allowing ever-increasing flexibility and control over the displayed results. However, as IT developed and became available to all business domains, the number of reports used by companies has grown to the point where managing them well is difficult. The European-funded project Questor aims to create a revolutionary product that will eliminate the complexities inherent in the report management workflow. The motivation of this project is to replace the time-consuming manual search for information in reports with an automatic one. The final purpose is to make querying the report database as simple as asking a question in natural language. This paper gives an overview of the concept of the Questor project, its software architecture and its preliminary results.

Keywords—information retrieval; reporting tools; business intelligence

I. INTRODUCTION

QUESTOR is a Eureka project whose consortium brings together members from industry and academia in Italy and Romania. Questor aims to create an innovative product that will eliminate the complexities inherent in the report management workflow. One of the main motivations of this project, and of creating the Questor tool, is to replace the current manual search for information in reports with an automatic one.
[1] The final purpose is to make querying the report database as simple as asking a question in natural language, such as “What is the situation of the salaries in the IT department for April 2010?”

The user enters a question in natural language, guided by intelligent contextual suggestions that use search engine optimization techniques [2]. A conversion engine then transforms the question into an intermediate semantic query language, which is used to query a semantic metadata repository containing information about all the reports available in the company. The relevant reports are executed with their parameters filled in from the values detected in the question, keeping human intervention to a minimum. If the question is ambiguous or multiple answers are returned, the application displays all the results, ordered by ranking algorithms such as PageRank [3] or Levenshtein distance, and lets the user choose the most relevant one. The rank of the chosen result is then increased, to favor that result in subsequent queries.

This paper presents the main components of the Questor prototype tool, together with experimental data. The paper is organized as follows: Section II reviews the state of the art; the following sections present the Questor architecture, ontology and initial implementation; experimental data obtained with Questor are then described, and the conclusions are outlined at the end.

II. STATE-OF-THE-ART

Enterprise search solutions are an essential component of business intelligence product frameworks; we review some of them here. Microsoft FAST Search Server 2010 for SharePoint is marketed as “Microsoft's best general productivity search experience and a platform for building search-driven applications.” [4] The product suite uses custom connectors to crawl Microsoft-specific data sources such as Excel and RDL files, and offers a unified graphical query interface in which the results can be filtered by freshness, context, social behavior and other criteria.
However, it is not suited to scenarios that need generic, diverse report solutions.

For non-structured documents such as PDF, HTML, Microsoft Word and OpenDocument files, Lucene, the open source solution from Apache, is widely used [5]. Using proven techniques such as inverted indexing with inverse document frequency weighting and, in distributed deployments, map-reduce processing, Lucene allows indexing of any data that can be converted into textual format. However, because it focuses on non-structured data, it does not take into account the implicit structure of a report and does not index the explicit relations among terms.

2013 19th International Conference on Control Systems and Computer Science 978-0-7695-4980-4/13 $26.00 © 2013 IEEE DOI 10.1109/CSCS.2013.60 263
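As a rough illustration of the inverse-frequency indexing mentioned for Lucene, the following minimal Python sketch builds an inverted index and ranks documents by a simple tf × idf score. It is not Lucene's implementation or API; the function names (`tokenize`, `build_index`, `search`), the smoothing used in the idf term and the three report descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Rare terms get a higher weight (assumed smoothing: log(1 + N/df)).
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical report descriptions, invented for illustration only.
REPORTS = {
    "r1": "Salaries by department for April 2010",
    "r2": "Sales by region for April 2010",
    "r3": "IT department headcount",
}

INDEX = build_index(REPORTS)
RESULTS = search(INDEX, len(REPORTS), "salaries April 2010")

for doc_id, score in RESULTS:
    print(doc_id, round(score, 3))
```

In this sketch "April" and "2010" are matched as independent tokens: the index has no notion that together they denote a reporting period, nor that "salaries" relates to a department. This is the kind of structural information that a purely text-based index leaves out.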