Geo-Textual Relevance Ranking to Improve a Text-Based Retrieval for Geographic Queries

José M. Perea-Ortega, Miguel A. García-Cumbreras, L. Alfonso Ureña-López, and Manuel García-Vega

SINAI research group. Computer Science Department. University of Jaén. Spain
{jmperea,laurena,mgarcia,magc}@ujaen.es

Abstract. Geographic Information Retrieval is an active and growing research area that focuses on retrieving textual documents according to a geographic criterion of relevance. In this work, we propose a reranking function for these systems that combines the retrieval status value calculated by the information retrieval engine with the geographical similarity between the document and the query. The obtained results show that the proposed ranking function always outperforms text-based baseline approaches.

1 Introduction

In the field of Geographical Information Retrieval (GIR), a geographic query is structured as a triplet <theme><spatial relationship><location>. GIR is concerned with improving the quality of geographically specific information retrieval, with a focus on access to unstructured documents [2]. Thus, a search for “castles in Spain” should return not only documents that contain the word “castle”, but also those documents that mention some geographical entity within Spain.

One of the open research questions in GIR systems is how best to combine the textual and geographical similarities between the query and a relevant document [4]. For this reason, in this work we propose a reranking function based on both of these similarities. Our experimental results show that the proposed ranking function can outperform text-based baseline approaches.
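The reranking function is described here only at a high level: it combines the retrieval status value (RSV) returned by the IR engine with a geographical similarity score. The sketch below is not the authors' formula, which this section does not give. It assumes a weighted linear combination controlled by a hypothetical weight `alpha`, min-max normalisation of the RSV over the retrieved list, and a geographic similarity that has already been computed and scaled to [0, 1]. The class `Retrieved`, the function `rerank`, and the sample documents are illustrative names and data only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Retrieved:
    doc_id: str
    rsv: float      # retrieval status value returned by the IR engine
    geo_sim: float  # geographic similarity to the query, ASSUMED to lie in [0, 1]


def rerank(results: List[Retrieved], alpha: float = 0.5) -> List[Retrieved]:
    """Reorder results by an ASSUMED linear mix of normalised RSV and geo_sim.

    alpha is a hypothetical weight: 1.0 keeps the purely textual ranking,
    0.0 ranks by geographic similarity alone.
    """
    if not results:
        return []
    lo = min(r.rsv for r in results)
    hi = max(r.rsv for r in results)
    span = hi - lo

    def combined(r: Retrieved) -> float:
        # Min-max normalise the RSV so it shares the [0, 1] scale of geo_sim;
        # if all RSVs are equal, treat them as equally (maximally) relevant.
        norm_rsv = (r.rsv - lo) / span if span > 0 else 1.0
        return alpha * norm_rsv + (1 - alpha) * r.geo_sim

    # sorted() is stable even with reverse=True, so ties keep the engine's order.
    return sorted(results, key=combined, reverse=True)


results = [
    Retrieved("d1", rsv=12.0, geo_sim=0.1),  # strong text match, weak geographic match
    Retrieved("d2", rsv=10.0, geo_sim=0.9),  # good text match inside the query location
    Retrieved("d3", rsv=8.0, geo_sim=0.5),
]
print([r.doc_id for r in rerank(results)])  # → ['d2', 'd1', 'd3']
```

With `alpha = 0.5`, the normalised RSVs are 1.0, 0.5 and 0.0, so the combined scores are 0.55 (d1), 0.70 (d2) and 0.25 (d3): the geographically relevant d2 overtakes the text-only winner d1. Setting `alpha = 1.0` reproduces the text-based baseline order, which makes the weight a convenient knob for comparing against that baseline.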
2 GIR System Overview

GIR systems are usually composed of three main stages: preprocessing of the document collection and queries; textual-geographical indexing and searching; and, finally, reranking of the retrieved results using a particular relevance formula that combines the textual and geographical similarity between the query and each retrieved document. The GIR system presented in this work follows the same approach, as can be seen in Figure 1. The preprocessing carried out on the queries was mainly based on detecting their geographical scope. This involves specifying the triplet <theme><spatial relationship><location>, which will be used later during the reranking process.

R. Muñoz et al. (Eds.): NLDB 2011, LNCS 6716, pp. 278–281, 2011.
© Springer-Verlag Berlin Heidelberg 2011
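The query preprocessing step maps each query to the triplet <theme><spatial relationship><location>. This section does not say how the geographical scope is detected (a fuller approach would likely rely on geographic entity recognition and a gazetteer), so the following is only a hypothetical, pattern-based sketch. The list `SPATIAL_RELATIONS`, the type `GeoQuery`, and the function `parse_geo_query` are illustrative assumptions, not the authors' implementation.

```python
import re
from typing import NamedTuple, Optional


class GeoQuery(NamedTuple):
    theme: str
    relation: str
    location: str


# HYPOTHETICAL, non-exhaustive list of spatial relationship phrases.
SPATIAL_RELATIONS = ["in", "at", "near", "within", "north of", "south of", "east of", "west of"]

# Longer phrases first so that, at a given position, "north of" is tried before shorter ones.
_RELATION_RE = re.compile(
    r"\s("
    + "|".join(re.escape(r) for r in sorted(SPATIAL_RELATIONS, key=len, reverse=True))
    + r")\s",
    re.IGNORECASE,
)


def parse_geo_query(query: str) -> Optional[GeoQuery]:
    """Split a query at its leftmost spatial relation (a naive ASSUMED heuristic)."""
    text = " ".join(query.split())  # collapse runs of whitespace to single spaces
    m = _RELATION_RE.search(text)
    if m is None:
        return None  # no recognised spatial relationship
    theme = text[: m.start()].strip()
    location = text[m.end():].strip()
    if not theme or not location:
        return None
    return GeoQuery(theme, m.group(1).lower(), location)


print(parse_geo_query("castles in Spain"))
# → GeoQuery(theme='castles', relation='in', location='Spain')
print(parse_geo_query("Hotels north of Madrid"))
# → GeoQuery(theme='Hotels', relation='north of', location='Madrid')
```

Splitting at the leftmost relation phrase is a deliberate simplification. A query such as “wine regions in the south of France” yields the location “the south of France”, which a gazetteer lookup would still have to resolve before the location can be compared with the geographical entities found in each document during reranking.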