Smoothing Methods and Cross-Language Document Re-ranking

Dong Zhou and Vincent Wade

Centre for Next Generation Localisation
Knowledge and Data Engineering Group
Trinity College Dublin, Dublin 2, Ireland
dongzhou1979@hotmail.com, vincent.wade@cs.tcd.ie

Abstract. This paper reports on our participation in the CLEF 2009 monolingual and bilingual ad hoc TEL@CLEF task involving three languages: English, French and German. Language modeling was adopted as the underlying information retrieval model. Because the data collection is extremely sparse, smoothing is particularly important when estimating a language model. The main purpose of the monolingual tasks is to compare different smoothing strategies and investigate the effectiveness of each alternative. This retrieval model was then used, for both the monolingual and bilingual tasks, alongside a document re-ranking method based on Latent Dirichlet Allocation (LDA), which exploits the implicit structure of the documents with respect to the original queries. Experimental results demonstrate that the three smoothing strategies behave differently across the test languages, while the LDA-based document re-ranking method requires further refinement in order to bring significant improvement over the baseline language modeling systems in the cross-language setting.

1 Introduction

This year's participation in the CLEF 2009 ad hoc monolingual and bilingual track was motivated by a desire to compare different smoothing strategies applied to language modeling for library data retrieval, as well as to test and extend a newly developed document re-ranking method. Language modeling has been successfully applied to the problem of ad hoc retrieval [1,3]. It provides an attractive retrieval model due to its theoretical foundations.
The basic idea behind this approach is extremely simple: estimate a language model for each document and/or query, and rank documents either by the likelihood of the query (with respect to the document language model) or by the distance between the two models. The main objective of smoothing is to adjust the maximum likelihood estimator of a language model so that it is more accurate [3]. However, previous success on news collection data does not necessarily mean the approach will be effective on library data. Firstly, the data is actually multilingual:

C. Peters et al. (Eds.): CLEF 2009 Workshop, Part I, LNCS 6241, pp. 62–69, 2010.
© Springer-Verlag Berlin Heidelberg 2010
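To make the query-likelihood ranking concrete, the following is a minimal sketch of scoring a document against a query under a Dirichlet-smoothed document language model, one of the standard smoothing strategies studied in [3]. The function name, the tokenized inputs, and the default prior mu=2000 are illustrative assumptions, not the paper's actual implementation.

```python
import math
from collections import Counter


def dirichlet_score(query_tokens, doc_tokens, collection_freqs,
                    collection_len, mu=2000.0):
    """Log query likelihood under a Dirichlet-smoothed document model.

    The smoothed term probability is
        p(w|d) = (c(w;d) + mu * p(w|C)) / (|d| + mu)
    where p(w|C) is the background collection model. Smoothing assigns
    non-zero probability to query terms absent from the document, which
    is what the maximum likelihood estimator alone cannot do.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for w in query_tokens:
        p_wc = collection_freqs.get(w, 0) / collection_len  # p(w|C)
        p_wd = (doc_counts.get(w, 0) + mu * p_wc) / (doc_len + mu)
        if p_wd > 0:
            score += math.log(p_wd)
        else:
            # term unseen even in the collection: zero likelihood
            return float("-inf")
    return score
```

Ranking a collection then amounts to sorting documents by this score for a given query; a document containing a query term more often than the background rate scores higher than one relying entirely on the smoothed collection estimate.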