QUESTOR – automatic searching for reports
Andrei Vasilateanu, Nicolae Goga, Tudor Sutu,
Marius Nastasescu
Faculty of Engineering in Foreign Languages
University POLITEHNICA of Bucharest
Romania
andrei.vasilateanu@upb.ro, n.goga@rug.nl,
tudor.sutu@gmail.com, marius.nastasescu@yahoo.com
Alin Moldoveanu, Victor Asavei
Faculty of Automatic Control and Computer Science
University POLITEHNICA of Bucharest
Romania
{alin.moldoveanu, victor.asavei}@cs.pub.ro
Cristian Taslitchi
Info World
Romania
cristian.taslitchi@infoworld.ro
Abstract— A vast array of reporting tools is in use today,
allowing ever-increasing flexibility and control over the displayed
results. However, as IT developed and became available to all
business domains, the number of reports used by companies
grew to the point where managing them well became difficult. The
European-funded project Questor aims to create a revolutionary
product that will eliminate the complexities inherent in the report
management workflow. The motivation of this project is to
replace the time-consuming manual search for information in
reports with an automatic one. The ultimate goal is to make
querying the report database as simple as asking a question
in natural language. This paper gives an overview of the
concept of the Questor project, its software architecture and its
preliminary results.
Keywords—information retrieval; reporting tools; business
intelligence
I. INTRODUCTION
QUESTOR is a Eureka project whose consortium brings together
members from industry and academia in Italy and Romania.
Questor aims to create an innovative product that will
eliminate the complexities inherent in the report management
workflow. One of the main motivations of the project, and of
building the Questor tool, is to replace the current manual search
for information in reports with an automatic one [1]. The ultimate
goal is to make querying the report database as simple as
asking a question in natural language, such as “What is the
situation of the salaries in the IT department for April 2010?”
After the user enters a question in natural language,
guided by intelligent contextual suggestions that use search
engine optimization techniques [2], a conversion engine
transforms the question into an intermediate semantic query
language, which is used to query a semantic metadata
repository containing information about all the reports
available in the company. The relevant reports are then executed
with parameter values detected from the question and
filled in automatically, keeping human intervention to a minimum.
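The flow described above can be illustrated with a minimal Python sketch. It is not the Questor conversion engine: the regular-expression matching stands in for the intermediate semantic query language, and the repository contents, report names and parameter names (`ReportMeta`, `REPOSITORY`, `salary_by_department`, `department`, `month`, `year`) are hypothetical, chosen only to match the example question.

```python
import re
from dataclasses import dataclass

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


@dataclass(frozen=True)
class ReportMeta:
    """Hypothetical metadata record describing one report."""
    name: str
    keywords: frozenset   # terms that make the report relevant
    parameters: tuple     # parameter names the report expects


# Toy stand-in for the semantic metadata repository (assumed contents).
REPOSITORY = [
    ReportMeta("salary_by_department", frozenset({"salary", "salaries"}),
               ("department", "month", "year")),
    ReportMeta("sales_by_region", frozenset({"sales"}),
               ("region", "month", "year")),
]


def extract_parameters(question):
    """Detect parameter values (month, year, department) in the question."""
    text = question.lower()
    params = {}
    month = re.search(r"\b(" + "|".join(MONTHS) + r")\b", text)
    if month:
        params["month"] = MONTHS.index(month.group(1)) + 1
    year = re.search(r"\b(?:19|20)\d{2}\b", text)
    if year:
        params["year"] = int(year.group(0))
    department = re.search(r"\b(\w+) department\b", text)
    if department:
        params["department"] = department.group(1).upper()
    return params


def find_reports(question):
    """Return (report name, bound parameters) for every matching report."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    params = extract_parameters(question)
    matches = []
    for report in REPOSITORY:
        if report.keywords & words:
            # Parameters not found in the question stay None; in a real
            # system the user would be asked to supply them.
            bound = {name: params.get(name) for name in report.parameters}
            matches.append((report.name, bound))
    return matches
```

Calling `find_reports` with the example question from the introduction selects the hypothetical salary report and binds all three of its parameters, so no further user input would be needed in this toy setting.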
If the question is ambiguous or multiple answers are
returned, the application displays all the results, ordered by
ranking methods such as PageRank [3] or
Levenshtein distance, allowing the user to choose the most
relevant result. The rank of the chosen result is then increased to
favor that result in subsequent queries. This paper presents the
main components of the Questor prototype tool, together with
experimental data.
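As an illustration of the ranking step, the following Python sketch orders candidate report titles by Levenshtein distance to the query and promotes results the user has chosen before. It is only a sketch under stated assumptions: the `FeedbackRanker` class, its additive boost of one per selection, and the use of report titles as the compared strings are hypothetical and do not describe the scoring actually used in Questor.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class FeedbackRanker:
    """Hypothetical ranker: smaller distance first, minus a learned boost."""

    def __init__(self):
        self.boost = {}  # title -> number of times the user chose it

    def rank(self, query, titles):
        def score(title):
            distance = levenshtein(query.lower(), title.lower())
            return distance - self.boost.get(title, 0)
        return sorted(titles, key=score)

    def choose(self, title):
        """Record that the user picked this result, favoring it later."""
        self.boost[title] = self.boost.get(title, 0) + 1
```

A design note: subtracting the boost from the distance is the simplest way to combine the two signals; a production system would more likely normalize both and weight them.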
The paper is organized as follows: the next section
reviews the state of the art; the following sections present the
Questor architecture, ontology and initial implementation; then
experimental data obtained with Questor are described, and the
conclusions are outlined at the end.
II. STATE-OF-THE-ART
Enterprise search solutions are an essential component of
business intelligence product frameworks; we briefly review
some of them here.
Microsoft FAST Search Server 2010 for SharePoint is
marketed as “Microsoft's best general productivity search
experience and a platform for building search-driven
applications.” [4] The product suite uses custom connectors to
crawl Microsoft-specific data sources such as Excel and RDL
files and offers a unified graphical query interface in which
the results can be filtered by freshness, context, social
behavior and other criteria. However, it is not suited to scenarios
that need generic, diverse report solutions.
For unstructured documents such as PDF, HTML,
Microsoft Word and OpenDocument files, the open source
solution from Apache, Lucene, is widely used [5]. Using
proven techniques such as inverted indexing, inverse document
frequency weighting and
distributed processing by map-reduce functions, Lucene
allows indexing of any data that can be converted into textual
format. However, because it focuses on unstructured data, it
does not take into account the implicit structure of
a report and does not index the explicit relations among
terms.
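The limitation noted above can be seen in a toy inverted index written in plain Python. This is a sketch of the general technique, not Lucene's implementation or API: the function names (`tokenize`, `build_index`, `search`) and the simple term frequency times inverse document frequency score are assumptions made for illustration. Every document is reduced to a bag of terms, so nothing in the index records that a word was a column header, a parameter or a value in a report.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, n_docs, query):
    """Return document ids ordered by a simple TF-IDF score."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Rarer terms get a higher weight (inverse document frequency).
        idf = math.log(n_docs / len(postings)) + 1.0
        for doc_id, term_frequency in postings.items():
            scores[doc_id] += term_frequency * idf
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, with three short documents, a query for "salary report" ranks the document containing both terms ahead of one that contains only "report", but the index cannot say which department or period the salary figures refer to.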
2013 19th International Conference on Control Systems and Computer Science
978-0-7695-4980-4/13 $26.00 © 2013 IEEE
DOI 10.1109/CSCS.2013.60
263