A Tri-Partite Neural Document Language Model for Semantic Information Retrieval

Gia-Hung Nguyen 1(✉), Lynda Tamine 1, Laure Soulier 2, and Nathalie Souf 1

1 Université de Toulouse, UPS-IRIT, 118 route de Narbonne, 31062 Toulouse, France
gia-hung.nguyen@irit.fr
2 Sorbonne Université, CNRS - LIP6 UMR 7606, 75005 Paris, France

Abstract. Previous work in information retrieval has shown that using evidence, such as concepts and relations, from external knowledge sources can enhance retrieval performance. Recently, deep neural approaches have emerged as state-of-the-art models for capturing word semantics. This paper presents a new tri-partite neural document language framework that leverages explicit knowledge to jointly constrain word, concept, and document learning representations to tackle a number of issues including polysemy and granularity mismatch. We show the effectiveness of the framework in various IR tasks.

Keywords: Semantic information retrieval · Knowledge source · Deep learning

1 Introduction

The semantic gap is a long-standing research topic in information retrieval (IR) that refers to the difference between the low-level description of document and/or query content (generally bags of words) and their high-level meanings [30]. The semantic gap inherently hinders query-document matching, which is the crucial step for selecting candidate relevant documents in response to a user's query.
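To make the effect on matching concrete, the following is a minimal sketch of exact-term matching over bags of words. The whitespace tokenizer, the `overlap_score` function, and the toy query and documents are illustrative assumptions, not part of the proposed framework or of any particular retrieval system.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased, whitespace-tokenized bag of words (a deliberate simplification)."""
    return Counter(text.lower().split())


def overlap_score(query: str, document: str) -> int:
    """Count the word occurrences shared by the query and the document."""
    q, d = bag_of_words(query), bag_of_words(document)
    # Counter intersection keeps, for each word, the minimum of the two counts.
    return sum((q & d).values())


# Hypothetical query and documents, chosen only for illustration.
query = "car repair"
doc_exact = "car repair manual"             # shares the query's surface forms
doc_synonym = "motorcar maintenance guide"  # similar meaning, different words

print(overlap_score(query, doc_exact))
print(overlap_score(query, doc_synonym))
```

Under these assumptions, the second document receives a score of zero even though it is topically close to the query, because exact-term matching sees only surface forms and not meanings.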
The semantic gap commonly originates from the following:
(1) Vocabulary mismatch, also called lexical gap, which means that words with different shapes share the same accepted meaning (sense) (e.g., car is a synonym of motorcar);
(2) Granularity mismatch, which means that words with different shapes and senses belong to the same general concept (e.g., air bag and wheel are both parts of a car);
(3) Polysemy, which means that a word can cover different senses depending on the surrounding words in the text that represent its context (e.g., bass could mean a type of fish or the lowest part of harmony).

© Springer International Publishing AG, part of Springer Nature 2018. A. Gangemi et al. (Eds.): ESWC 2018, LNCS 10843, pp. 445–461, 2018. https://doi.org/10.1007/978-3-319-93417-4_29

To close these gaps, the prominent approaches employed in IR focus on improving query and/or document representations using explicit knowledge provided by external knowledge sources or implicit knowledge inferred from text corpora. A first line of work is based on the use of linguistic sources (e.g., WordNet) or knowledge graphs (e.g., DBpedia). The key idea of these approaches