PaMS: A component-based service for finding the missing full text of articles cataloged in a digital library

Rodrygo L.T. Santos*, Alberto H.F. Laender, Marcos André Gonçalves, Allan J.C. Silva, Hugo S. Santos

Department of Computer Science, Federal University of Minas Gerais, 31270-901 Belo Horizonte, MG, Brazil

Keywords: Digital libraries; Meta-search; Component-based software development

Abstract

Providing access to the full text of cataloged articles is a highly desirable feature for a digital library. However, in many such systems, not all metadata records have (a direct pointer to) a corresponding full-text document. In this article, we present PaMS: a new service for finding the missing full text of articles cataloged in a digital library. This service is implemented as a software component so that it can be readily deployed to existing systems. It works as a parameterized meta-search engine and allows digital library administrators to easily set up a search strategy, i.e., a list of existing search engines to be queried for the missing full text, as well as the filtering and ranking policies to be applied to the results retrieved by each search engine. We evaluate the effectiveness and efficiency of our service with collections from two distinct fields: computer science, and biomedical and life sciences. Our results attest to the effectiveness of PaMS for finding missing full-text documents as well as other relevant material while keeping its overall execution time at a reasonable level.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

On-line access to the full text of cataloged items is an important requirement for satisfying the needs and expectations of the users of a digital library (DL) of scientific articles [10]. However, in many such DLs, mainly those built by aggregating metadata from heterogeneous sources, not all (metadata) records have a direct pointer (e.g., a URL) to the corresponding full text.
As examples of DLs in the computer science field that suffer from this problem, we can cite the DBLP Computer Science Bibliography¹ and the Brazilian Digital Library of Computing (BDBComp).²

Even a direct pointer to the full text may be useless to the user in some cases. For example, the content of interest may be accessible only by payment, and the user may not want to complete the transaction. Also, the link to the full text may be broken: it was valid at cataloging time but, due to the dynamics of the Web, it no longer resolves.

An alternative for users who wish to obtain the full text of articles for which they already have some metadata is to use these metadata to search for the desired items on the Web with the aid of existing search engines (whether specialized or not), and then to examine the returned results to check whether they correspond to the full texts wanted.

In this article, we propose and evaluate a service, called PaperMetaSearch (PaMS), which automates this process, thus reducing the user's effort, while trying to improve the results retrieved by different search engines through customizable search strategies. We also describe the architecture of a software component that implements this service and that can be deployed and reused in several digital libraries.

* Corresponding author. Tel.: +55 31 3409 5860.
E-mail addresses: rodrygo@dcc.ufmg.br (R.L.T. Santos), laender@dcc.ufmg.br (A.H.F. Laender), mgoncalv@dcc.ufmg.br (M.A. Gonçalves), allan@dcc.ufmg.br (A.J.C. Silva), hugocomp@dcc.ufmg.br (H.S. Santos).

¹ http://www.informatik.uni-trier.de/~ley/db/
² http://www.lbd.dcc.ufmg.br/bdbcomp/

Information Systems 35 (2010) 544–556. doi:10.1016/j.is.2009.04.004