Provenance in Web Applications
JANUARY/FEBRUARY 2011 1089-7801/11/$26.00 © 2011 IEEE Published by the IEEE Computer Society 31
The recent W3C Linking Open Data initiative boosts the publication and interlinkage of massive amounts of datasets on the Semantic Web as Resource Description Framework (RDF) data queried with the SPARQL query language (see www.linkeddata.org and www.w3.org/tr/rdf-sparql-query). Together with other Web 2.0 technologies (such as mashups), this initiative has essentially transformed the Web from a publishing-only environment into a vibrant place for information dissemination in which data is exchanged, integrated, and materialized in distributed repositories behind SPARQL endpoints.
In this open environment, where Semantic Web data is represented by incomplete or replicated sets of RDF triples, it’s crucial to be able to assert the trustworthiness, reputation, and reliability of published information. This functionality essentially calls for representing and reasoning with the provenance of Semantic Web data manipulated by SPARQL queries. For instance, in the case of trust assessment¹ (one of the key applications recognized by the W3C Provenance Incubator Group), the trustworthiness of query results is determined based on the trustworthiness of the data sources from which they’re derived. For simple Boolean trust assessment, we need to determine only which output data should be trusted. For ranked trust assessment, we need to choose the
Capturing trustworthiness, reputation, and reliability of Semantic Web data manipulated by SPARQL requires researchers to represent adequate provenance information, usually modeled as source data annotations and propagated to query results along with query evaluation. Alternatively, abstract provenance models can capture the relationship between query results and source data by taking into account the employed query operators. The authors argue the benefits of the latter for settings in which query results are materialized in several repositories and analyzed by multiple users. They also investigate how relational provenance models can be leveraged for SPARQL queries, and advocate for new provenance models.
Yannis Theoharis, Irini Fundulaki, Grigoris Karvounarakis, and Vassilis Christophides
FORTH-ICS
On Provenance of Queries on Semantic Web Data