Experiments with Geographic Evidence Extracted from Documents

Nuno Cardoso, Patrícia Sousa and Mário J. Silva

Faculty of Sciences, University of Lisbon, LASIGE
{ncardoso,csousa,mjs}@xldb.di.fc.ul.pt

Abstract. For the 2008 participation at GeoCLEF, we focused on improving the extraction of geographic signatures from documents and optimising their use for GIR. The results show that detecting explicit geographic named entities and including their terms in a tuned weighted index field significantly improves retrieval performance when compared to classic text retrieval.

1 Introduction

This paper presents the participation of the XLDB Team from the University of Lisbon at the 2008 GeoCLEF task. Following a thorough analysis of the results achieved in the 2007 participation [1], we identified the following improvement points:

Experiment with new ways to handle thematic and geographic criteria. Our previous methodology was moulded on the assumption that the thematic and geographic facets of documents and queries were complementary and non-redundant [2]. Previous GIR prototypes handled thematic and geographic subspaces in separate pipelines. As the evaluation results did not show significant improvements compared to classic IR, an alternative GIR methodology should be tested.

Capture more geographic evidence from documents. The text mining module, based on shallow pattern matching of placenames, often failed to extract geographic evidence that was essential for geo-referencing many relevant documents [1]. We therefore considered reformulating our text annotation tools, making them capable of capturing more geographic evidence from the documents. As people describe sought places in ways other than providing explicit placenames (e.g., “Big Apple”, “Kremlin” or “UE Headquarters”), these named entities (NE) can be captured and grounded to their locations, and play an important role in defining the geographic area of interest of documents.

Smooth query expansion. Query expansion (QE) is known to improve IR performance in most queries, but often at the cost of degrading the performance of other queries. Our previous approach did not assign weights to query terms, so the expanded terms had the same weight as the initial query terms. This means that we did not control the impact of QE in some topics, which caused query drifting [3]. This year, we wanted to use QE with automatic re-weighting of text and geographic terms, to soften the undesired effect of query drifting.
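
The re-weighting idea in the last point can be illustrated with a minimal sketch. It is not the weighting scheme used in our system: the function name `expand_query`, the `damping` parameter and the frequency-based scoring are assumptions chosen only to show how expanded terms can be given lower weights than the initial query terms. The sketch covers text terms only.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, n_expansion=3, damping=0.5):
    """Return a {term: weight} mapping for a hypothetical weighted query.

    Initial query terms keep weight 1.0. Expansion terms are taken from
    the feedback documents and weighted in (0, damping], so they can
    never outweigh the initial terms (an illustrative assumption).
    """
    # Deduplicate the initial terms while keeping their order.
    original = list(dict.fromkeys(query_terms))

    # Count candidate expansion terms across the feedback documents.
    counts = Counter(
        term
        for doc in feedback_docs
        for term in doc
        if term not in original
    )

    weights = {term: 1.0 for term in original}
    if not counts:
        return weights

    # Scale each expansion term by its frequency relative to the most
    # frequent candidate, then apply the damping factor.
    top = counts.most_common(n_expansion)
    max_count = top[0][1]
    for term, count in top:
        weights[term] = damping * count / max_count
    return weights


if __name__ == "__main__":
    # Hypothetical tokenised feedback documents.
    feedback = [
        ["wine", "vineyard", "douro"],
        ["vineyard", "harvest", "wine"],
        ["douro", "vineyard", "port"],
    ]
    print(expand_query(["wine", "regions"], feedback, n_expansion=2))
```

Lowering `damping` reduces the influence of the expanded terms, which is one simple way to limit query drifting; setting it to 1.0 allows the strongest expansion term to weigh as much as an initial term, approximating the unweighted behaviour described above.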