Sentence Retrieval with LSI and Topic Identification

David Parapar and Álvaro Barreiro
IR Lab, Department of Computer Science, University of A Coruña, A Coruña, Spain
dparapar@udc.es, barreiro@udc.es

Abstract. This paper presents two sentence retrieval methods. We adopt the task definition given in the TREC Novelty Track: sentence retrieval consists of extracting the sentences relevant to a query from a set of documents relevant to that query. We have compared the performance of the Latent Semantic Indexing (LSI) retrieval model with that of a topic identification method, which is also based on Singular Value Decomposition (SVD) but uses a different sentence selection method. We used the TREC Novelty Track collections from 2002 and 2003 for the evaluation. The results of our experiments show that these techniques, particularly sentence retrieval based on topic identification, are valid alternatives to other, more ad hoc methods devised for this task.

1 Introduction and motivation

In this work we understand the task of sentence retrieval as it is defined in the TREC Novelty Track. The Novelty Track was introduced at the TREC 2002 conference [1] and is composed of two main tasks. The first one is sentence retrieval: starting from a set of documents relevant to a query (a topic, in TREC terminology), the system must extract from those documents the sentences relevant to that topic, discarding those that do not contain significant information or that are related to different topics. The second task starts from the sentences retrieved in the first task or from the relevant sentences selected by human assessors. Taking this set into account, the system must retrieve only the novel sentences, i.e., sentences that contain new information with respect to the previous sentences in the set. In this paper we have focused only on the first task.
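To make the LSI side of the comparison more concrete, the following is a minimal sketch of generic LSI sentence ranking, not the authors' implementation: this section does not specify the term weighting, the number of latent dimensions k, or how sentences are finally selected. The sketch assumes raw term frequencies, whitespace tokenization, and cosine similarity between the query folded into the space of the top-k left singular vectors (U_k^T q) and each sentence's projection (U_k^T a_j = Sigma_k v_j). The helper names `term_sentence_matrix`, `top_k_singular` and `lsi_rank` are hypothetical, and the SVD is a toy power iteration rather than a library routine.

```python
import math
import random


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _norm(x):
    return math.sqrt(_dot(x, x))


def term_sentence_matrix(sentences, vocab):
    """Raw term-frequency matrix A: one row per term, one column per sentence."""
    index = {t: i for i, t in enumerate(vocab)}
    A = [[0.0] * len(sentences) for _ in vocab]
    for j, sentence in enumerate(sentences):
        for tok in sentence.lower().split():
            if tok in index:
                A[index[tok]][j] += 1.0
    return A


def top_k_singular(A, k, iters=500, seed=0):
    """Top-k right singular vectors and singular values of A.

    Toy method (assumption): power iteration on the Gram matrix G = A^T A,
    re-orthogonalising against vectors already found. Only for tiny inputs.
    """
    m, n = len(A), len(A[0])
    G = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    vecs, sigmas = [], []
    for _ in range(min(k, n)):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _step in range(iters):
            for prev in vecs:  # stay orthogonal to earlier singular vectors
                c = _dot(v, prev)
                v = [a - c * b for a, b in zip(v, prev)]
            w = [_dot(row, v) for row in G]  # w = G v
            nw = _norm(w)
            if nw < 1e-12:  # remaining spectrum is zero: rank(A) < k
                return vecs, sigmas
            v = [x / nw for x in w]
        lam = _dot(v, [_dot(row, v) for row in G])  # Rayleigh quotient = sigma^2
        vecs.append(v)
        sigmas.append(math.sqrt(max(lam, 0.0)))
    return vecs, sigmas


def lsi_rank(sentences, query, k=2):
    """Rank sentences against a query in a k-dimensional LSI space (sketch)."""
    vocab = sorted({t for s in sentences for t in s.lower().split()})
    A = term_sentence_matrix(sentences, vocab)
    q = [row[0] for row in term_sentence_matrix([query], vocab)]
    vecs, sigmas = top_k_singular(A, k)
    # Left singular vectors u_i = A v_i / sigma_i span the latent space.
    us = [[_dot(row, v) / s for row in A] for v, s in zip(vecs, sigmas)]
    q_hat = [_dot(u, q) for u in us]  # fold the query in: U_k^T q
    scores = []
    for j in range(len(sentences)):
        s_hat = [s * v[j] for v, s in zip(vecs, sigmas)]  # U_k^T a_j = Sigma_k v_j
        denom = _norm(q_hat) * _norm(s_hat)
        scores.append(_dot(q_hat, s_hat) / denom if denom else 0.0)
    order = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    sentences = [
        "oil spill gulf coast",
        "oil spill cleanup",
        "stock market crash",
        "stock market rally",
    ]
    order, scores = lsi_rank(sentences, "oil spill", k=2)
    print(sorted(order[:2]))  # → [0, 1]
```

For real collections a library SVD (for example `numpy.linalg.svd`, or a sparse truncated solver) would replace `top_k_singular`; the pure-Python version only keeps the sketch dependency-free. Turning the ranking into the set of extracted sentences would still need a selection rule, such as a score threshold or a fixed number of top sentences; which rule to use is left open here as an assumption.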
Among the applications of sentence retrieval we find query-biased text summarization and the presentation to users of the most relevant sentences of the documents retrieved in a result list [2]. Furthermore, the novelty task would remove the redundant information from the extracted sentences. Another application could be the construction of question answering systems, because query-relevant sentences can be useful for obtaining the answer to the user's query.

The research done for the Novelty Track can be divided into two groups. Some systems try to adapt classical document retrieval techniques to sentence retrieval with a different definition of the parameters of interest. For example,