Information Systems 31 (2006) 569–572

Editorial

Advances in information retrieval: An introduction to the special issue

The papers included in this Special Issue are extended and revised versions of papers published in the Proceedings of the 11th Symposium on String Processing and Information Retrieval, held in Padua, Italy, in October 2004 and edited by Alberto Apostolico and Massimo Melucci. Ricardo Baeza-Yates joined the issue representing the editorial committee of Information Systems. The detailed content follows.

Information retrieval (IR) is concerned with the design, implementation and evaluation of systems which index and retrieve documents. The documents managed by an IR system can be texts, images, video, sound or any combination thereof. By means of queries, end users interact with an IR system to find relevant documents. Relevant documents are those storing information relevant to the end users' information needs.

Indexing is the process that transforms an unstructured set of tokens, such as words or terms, into a data structure called an index. An index is a representation of the content of a collection of documents and of each document. It is by means of indexing that tokens found in documents, individually or in groups, become key words, index terms or, in general, descriptors, thereby assuming the representational power needed to identify potentially relevant documents. Given an input query, an IR system accesses the indexes and selects the documents that are probably relevant to the end user's information need represented by that query.

Due to the complexity of the retrieval process, a system governs retrieval through a model, which in turn provides users with a paradigm or some other useful abstraction of the system. Many models have been proposed over the years; however, three of them have prevailed in the literature. The Boolean Model describes an index term as a set of documents and retrieval as a set of operations on those sets.
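As an illustration of indexing and of the Boolean Model just described, the following minimal sketch builds an index that maps each term to the set of documents containing it and answers queries with set operations. It is only a sketch under stated assumptions: the function names (`build_index`, `boolean_and`, `boolean_or`, `boolean_not`), the toy collection and the whitespace tokenization are hypothetical choices made for this example and are not taken from any paper in the issue.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: tokens are whitespace-separated words; real systems
        # normally add stemming, stop-word removal and similar steps.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term: intersection of the terms' document sets."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one term: union of the terms' document sets."""
    return set().union(*(index.get(term, set()) for term in terms))


def boolean_not(index, all_ids, term):
    """Documents that do not contain the term: complement within the collection."""
    return set(all_ids) - index.get(term, set())


# Toy collection, invented for this example.
docs = {
    1: "string processing and information retrieval",
    2: "information systems journal",
    3: "string matching algorithms",
}
index = build_index(docs)

# Query: string AND NOT information
result = index.get("string", set()) & boolean_not(index, docs, "information")
```

In this sketch a document either satisfies the query or it does not; no ranking is produced, which is one way to see how the Boolean Model differs from the distance-based and probabilistic models.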
The Vector Space Model describes index terms, documents and queries as vectors, and retrieval as the computation of distances among vectors. The Probabilistic Model describes documents and index terms as random variables, and retrieval as the computation of the likelihood of the hypothesis that a document is relevant to an information need against the hypothesis that the document is not relevant.

Additional paradigms have been proposed as alternatives to the classical models. With Inference Networks, information needs, queries, documents and index terms are described as random variables connected by arcs that represent beliefs. Retrieval is described as the computation of the degree of belief that an information need is supported by the documents. Another effective approach, called Language Modeling, has recently been proposed and draws inspiration from speech recognition: documents are described as generators of queries, and the probability of relevance is estimated by the probability that a document has generated a given query.

www.elsevier.com/locate/infosys
0306-4379/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.is.2005.11.005

The central problem of IR is the retrieval of all the relevant documents in connection with any conceivable information need without retrieving any non-relevant document. No system is capable of solving this problem. The reason is that indexes and queries are mere approximations of the semantic content of documents and information needs. This makes the retrieval outcome imprecise, and explains why models essentially hinge on theories that deal with distances or probabilities. Correspondingly, measures are needed to assess how efficiently a particular retrieval system works and to compare