Parameterized Neural Network Language Models for Information Retrieval

N. Despres * nicolas.despres@gmail.com
S. Lamprier * sylvain.lamprier@lip6.fr
B. Piwowarski * benjamin@bpiwowar.net

Abstract

Information Retrieval (IR) models need to deal with two difficult issues: vocabulary mismatch and term dependencies. Vocabulary mismatch corresponds to the difficulty of retrieving relevant documents that do not contain exact query terms but semantically related terms. Term dependencies refer to the need to consider the relationships between the words of a query when estimating the relevance of a document. A multitude of solutions has been proposed to address each of these two problems, but no principled model solves both. In parallel, over the last few years, language models based on neural networks have been used to cope with complex natural language processing tasks such as emotion and paraphrase detection. Although they are well suited to handling both term dependencies and vocabulary mismatch, thanks to the distributed representation of words they are built upon, such models cannot be used readily in IR, where the estimation of one language model per document (or query) is required: this is both computationally unfeasible and prone to over-fitting. Building on recent work that proposed to learn a generic language model that can be modified through a set of document-specific parameters, we explore the use of new neural network models adapted to ad-hoc IR tasks. Within the language model IR framework, we propose and study the use of a generic language model as well as a document-specific language model. Both can be used as a smoothing component, but the latter is more adapted to the document at hand and has the potential of being used as a full document language model. We experiment with such models and analyze their results on the TREC-1 to TREC-8 datasets.
1 Introduction

To improve search effectiveness, Information Retrieval (IR) research has long sought to properly take term dependencies into account and to tackle term mismatch issues. Both problems have been addressed by various models, ranging from empirical to more principled approaches, but no principled approach handling both has been proposed so far. This paper proposes an approach based on recent developments in neural network language models.

Taking dependent query terms (such as compound words) into account when estimating document relevance usually increases the precision of the search process. This amounts to developing approaches for identifying term dependencies and considering the spatial proximity of such linked terms in documents. Among the first proposals to cope with term dependency issues, Fagan et al. [7] considered pairs of successive terms (bi-grams) in vector space models. The same principle can be found in language models, such as in [22], which mixes uni- and bi-gram models. Other works have sought to combine the scores of models accounting for different co-occurrence patterns, such as [14], which proposes a Markov random field model to capture the term dependencies of queries and documents. In each case, the problem comes down to computing accurate estimates of n-gram language models (or variants thereof), i.e. language models in which the probability distribution of a term depends on a finite sequence of previous terms.

* Sorbonne Universités, UPMC Univ Paris 06, CNRS, LIP6 UMR 7606, 4 place Jussieu 75005 Paris

arXiv:1510.01562v1 [cs.IR] 6 Oct 2015
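As an illustration of the mixture approach mentioned above (in the spirit of [22], not a reproduction of any cited model), the following Python sketch estimates maximum-likelihood unigram and bigram probabilities from a token sequence and linearly interpolates them; the function names and the weight `lam` are illustrative choices, not part of the paper.

```python
from collections import Counter

def train_lm(tokens):
    """Collect unigram and bigram counts from a token sequence."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    return uni, bi

def interp_prob(term, prev, uni, bi, total, lam=0.5):
    """Interpolated estimate of P(term | prev):
    lam * P_bigram(term | prev) + (1 - lam) * P_unigram(term),
    both terms being maximum-likelihood estimates."""
    p_uni = uni[term] / total
    p_bi = bi[(prev, term)] / uni[prev] if uni[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

# Toy "document" used as training data for the mixture model.
doc = "information retrieval models need language models".split()
uni, bi = train_lm(doc)
total = sum(uni.values())
p = interp_prob("retrieval", "information", uni, bi, total)
# Here P_bigram = 1/1 and P_unigram = 1/6, so p = 0.5 + 0.5/6.
```

The interpolation lets the unigram component smooth the bigram estimate, which is exactly why such mixtures are more robust than a pure bigram model on sparse document statistics.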