Intelligenza Artificiale 6 (2012) 121–133
DOI 10.3233/IA-120034
IOS Press
Natural language interaction with the web
of data by mining its textual side
Elena Cabrio (a,*), Julien Cojan (a), Alessio Palmero Aprosio (b,c) and Fabien Gandon (a)
(a) INRIA Sophia Antipolis, Sophia Antipolis, France
(b) FBK, Povo-Trento, Italy
(c) Università degli Studi di Milano, Milano, Italy
Abstract. The Semantic Web is an extension of the classical web. The data and schemas it adds coexist with the documents that
were already linked and available. This not only allows interoperability, reusability and potentially unforeseen applications of
open data, but it also creates a unique situation in which huge collections of the same pieces of information are available on the web
at the same time as text and as structured data. An interesting example is the Wikipedia-DBpedia pair: exploiting these
interlinked structured and unstructured data sources in parallel can offer a great potential for both Natural Language Processing
and Semantic Web applications. Starting from these observations, this paper addresses the problem of enhancing interactions
between non-expert users and data available on the Web. In particular, we present QAKiS, a system for open-domain Question
Answering over linked data, which addresses the problem of question interpretation as a relation-based match: fragments of
the question are matched to binary relations of the triple store, using automatically collected relational textual patterns. In the
current version, the relational patterns are automatically extracted from Wikipedia, while DBpedia is the data set to be queried
using a natural language interface.
Keywords: Question answering, linked data, Wikipedia, DBpedia
1. Web of documents and web of data: Jointly
exploiting two facets of the web
In the early days of the web this child of hypertext
systems and network applications was often presented
using the metaphor of a huge online library of interlinked
documents. This documentary facet of the web
is the oldest and most persistent perception of
it, and sometimes still hides the true nature of the
web, which is in fact a huge network of computational
resources, algorithms and data. Among the side effects
of this biased perception of the web, for a long time the
Natural Language Processing (NLP) community only
saw the web as a huge corpus (e.g. [20, 19]), where data
are stored in standard HTML documents, and presented
to the users in human-readable representations rendered
through the web technical stack and architecture. But in
fact one could argue that there are no documents on the
web and there never were. We only have ephemeral
representations transferred through networks and produced
on demand by distributed programs using distributed
data.
* Corresponding author. Elena Cabrio, INRIA Sophia Antipolis,
2004 Route des Lucioles BP93, 06902 Sophia Antipolis, France.
E-mails: elena.cabrio@inria.fr (Elena Cabrio); julien.cojan@inria.fr
(Julien Cojan); fabien.gandon@inria.fr (Fabien Gandon); aprosio@fbk.eu
(Alessio Palmero Aprosio).
What is now called the web of data can be seen as
the first wave of the deployment of the Semantic Web
(SW), and makes very visible the fact that the web is
not only publishing linked documents, but also linked
data and linked applications. The standardization of the
frameworks of the Semantic Web aims at transforming
the access to information by adding machine-readable
1724-8035/12/$27.50 © 2012 – IOS Press and the authors. All rights reserved