Knowl Inf Syst
DOI 10.1007/s10115-014-0736-0
REGULAR PAPER
Finding top- kr -cliques for keyword search from graphs
in polynomial delay
Mehdi Kargar · Aijun An
Received: 11 April 2013 / Revised: 13 December 2013 / Accepted: 17 January 2014
© Springer-Verlag London 2014
Abstract Keyword search over structured data offers an alternative method to explore and
query databases for users that are not familiar with the structure of the data and/or a query
language. Structured data are usually modeled as graphs. In this context, an answer is a
substructure of the graph that contains all or some of the query keywords. In most of the
previous works, a minimal tree that covers all the query keywords are found as the answer.
Some recent works offer to find subgraphs rather than minimal trees and show that subgraphs
might be more informative for the users. However, current methods suffer from the following
problems. Although some of the content nodes (i.e., nodes that contain input keywords) are
close to each other in an answer, others might be far from each other. While searching for
the best answer, current methods explore the whole graph rather than only the content nodes.
This might increase the run time and leads to poor performance. To address these problems,
we propose to find top-kr -cliques as the answers to the graph keyword search problem.
An r -clique is a set of content nodes that cover all the input keywords, and the distance
between each pair of nodes is less than or equal to r . We propose a new weight function
that is the sum of distances between each pair of content nodes. We prove that minimizing
the new weight function is NP-hard and propose an approximation algorithm that produces
r -cliques with 2-approximation ratio in polynomial delay. We further improve the run time of
the approximation algorithm with the cost of increasing the approximation ratio. Extensive
performance studies using three large real datasets confirm the efficiency and accuracy of
finding r -cliques in graphs.
Keywords Keyword search · Graph data · Polynomial delay · Approximation algorithm
M. Kargar (B ) · A. An
Department of Computer Science and Engineering,
York University, Toronto, Canada
e-mail: kargar@cse.yorku.ca
A. An
e-mail: aan@cse.yorku.ca
123