Sentence Retrieval with LSI and Topic Identification

David Parapar and Álvaro Barreiro

IR Lab, Department of Computer Science, University of A Coruña, A Coruña, Spain
dparapar@udc.es, barreiro@udc.es

Abstract. This paper presents two sentence retrieval methods. We adopt the task definition of the TREC Novelty Track: sentence retrieval consists of extracting the sentences relevant to a query from a set of documents relevant to that query. We have compared the performance of the Latent Semantic Indexing (LSI) retrieval model against the performance of a topic identification method, also based on Singular Value Decomposition (SVD) but with a different sentence selection method. We used the TREC Novelty Track collections from 2002 and 2003 for the evaluation. The results of our experiments show that these techniques, particularly sentence retrieval based on topic identification, are valid alternatives to other, more ad hoc methods devised for this task.

1 Introduction and motivation

In this work we understand the task of sentence retrieval as defined in the TREC Novelty Track. The Novelty Track was first introduced at the TREC 2002 conference [1] and is composed of two main tasks. The first one is sentence retrieval: starting with a set of relevant documents for a query (a topic, in TREC terminology), the system must extract from those documents the sentences relevant to that topic, removing the ones that do not contain significant information or that are related to different topics. The second task starts from the sentences retrieved in the first task or from the relevant sentences selected by human assessors. Taking this set into account, the system must retrieve only the novel sentences, i.e., sentences that contain new information with respect to the previous sentences in the set. In this paper we have focused only on the first task.

Among the applications of sentence retrieval we find query-biased text summarization and the presentation to users of the most relevant sentences of the documents retrieved in a results list [2]. Furthermore, the novelty task would remove redundant information from the extracted sentences. Another application could be the construction of question answering systems, because query-relevant sentences can be useful for answering the user's query.

The research done for the Novelty Track can be divided into two groups. Some systems try to adapt classical document retrieval techniques to sentence retrieval with a different definition of the parameters of interest. For example,