Research Paper

Journal of Information Science, 1–15
© The Author(s) 2018
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0165551518799637
journals.sagepub.com/home/jis

A model-based method to improve the quality of ranking in keyword search systems using pseudo-relevance feedback

Asieh Ghanbarpour
School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Hassan Naderi
School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract
Keyword search is known as an attractive way of searching databases. One of the challenges in keyword search is to rank the answers (subgraphs) of a keyword query in order of their relevance to the query. Most previous studies in this area propose highly heuristic ranking functions based on a limited analysis of the characteristics of answers. These functions usually suffer from serious effectiveness problems because they fail to recognise the conceptual differences between relatively similar answers. In this article, we propose a novel model-based method to improve the ranking accuracy of answers to a keyword query using pseudo-relevance feedback. The proposed method is built upon a carefully designed model of an answer, called the Structure-Aware Relevance Model, which is estimated from the textual and structural characteristics of the answer. We also study how to effectively select, from the feedback answers, those words that are focused on the query topic, based on the placement and importance of words in the nodes of the feedback answers. Extensive experiments conducted on a standard evaluation framework with three real-world datasets confirm the effectiveness of the proposed method.

Keywords
Keyword search; top-k query processing; relevance models; pseudo-relevance feedback

1.
Introduction

Keyword search is a user-friendly alternative to structured query languages (such as SQL, SPARQL and XQuery) for retrieving information from graph-structured databases. It is an attractive way of searching for both common users, who have no knowledge of the internal schema of the data, and specialists, who are confused by the large volume of data and its complex relationships.

A keyword query over a database modelled as a graph $G_D$ is simply expressed as a set of keywords $Q = \{q_1, q_2, \ldots, q_{|Q|}\}$. An answer to query $Q$ is a subgraph of $G_D$ that satisfies two conditions. First, for each keyword of the query, there is at least one node in the answer covering it. Second, the answer does not contain any proper subgraph covering all the queried keywords.

Corresponding author: Hassan Naderi, School of Computer Engineering, Iran University of Science and Technology, Tehran 16846-13114, Iran. Email: naderi@iust.ac.ir

For example, suppose Figure 1(a) shows part of a database graph. The nodes of the graph are associated with two types of entities: film and actor. Assume that a user poses the keyword query $Q = \{\text{John, Cazale, King, Margana}\}$ on the graph to search for the relationships among the query's keywords. Let the keywords of the query be represented by $a$, $b$, $c$ and $d$, respectively. Figure 1(b) shows an abbreviated form of Figure 1(a) in which the textual relevance of words to their corresponding nodes is also specified. Some of the relevant answers to the query