Preservation and Transition of NCSTRL Using an OAI-Based Architecture

H. Anan, X. Liu, K. Maly, M. Nelson, M. Zubair
Old Dominion University
Norfolk, Virginia USA
{anan,liu_x,maly,mln,zubair}@cs.odu.edu

J. C. French
University of Virginia
Charlottesville, Virginia USA
french@cs.virginia.edu

E. Fox, P. Shivakumar
Virginia Tech
Blacksburg, Virginia USA
{fox,pshivaku}@vt.edu

ABSTRACT
NCSTRL (Networked Computer Science Technical Reference Library) is a federation of digital libraries providing computer science materials. The architecture of the original NCSTRL was based largely on the Dienst software. It was implemented and maintained by the digital library group at Cornell University until September 2001. At that time, we had an immediate goal of preserving the existing NCSTRL collection and a long-term goal of providing a framework where participating organizations could continue to disseminate technical publications. Moreover, we wanted the new NCSTRL to be based on OAI (Open Archives Initiative) principles that provide a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience in moving towards an OAI-based NCSTRL.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval.

General Terms: Design, Management.

Keywords: Digital Libraries, Open Archive Initiative (OAI).

1. INTRODUCTION
NCSTRL (http://www.ncstrl.org), organized and supported at Cornell University, has been a successful digital library (DL) in operation from 1994-2001 with over 100 international participants and over 20,000 digital objects [1]. However, recent changes in the publication paradigm for scientific material and realignments of Cornell's DL research interests have caused Cornell to cease coordinating operations of NCSTRL. This fact, along with the widening acceptance of OAI [2], motivated us to look at an alternative architecture to preserve and sustain NCSTRL.
Besides the immediate goal of preserving the old NCSTRL collection, we had a long-term goal to support existing NCSTRL collections by making them OAI compliant, possibly with new large collections at the department/organization level based on e-prints software (www.eprints.org), and individual publishers using Kepler software (http://kepler.cs.odu.edu; [3]) to create small OAI compliant repositories (Figure 1).

2. PRESERVING EXISTING COLLECTIONS
We first extracted both the metadata and data from the existing Dienst servers and ftp sites. This process, including cleaning of metadata, was automated by writing scripts. Next we provided an OAI wrapper around the extracted metadata enabling it to be harvested by the new NCSTRL search service. The extracted documents and their metadata are currently being kept at Virginia Tech while the NCSTRL search/browse service is being hosted at Old Dominion University.

[Figure 1. OAI based NCSTRL vision. Box labels: New NCSTRL Collections at Large Organization/University (Eprints Software), OAI Compliant Repository; Old NCSTRL Collections with OAI Layer; Manual Registration Service for NCSTRL OAI Compliant; Automated LDAP Based Registration Service for Kepler Archivelets; NCSTRL Search Service (Arc like Service); Individual Publisher OAI Compliant Repository (Kepler Archivelet).]

2.1 Search Service
We implemented the NCSTRL search service based on the architecture of the Java servlet-based Arc (http://arc.cs.odu.edu; [4]) with an Oracle database in the backend. The architecture is platform independent and can work with any web server. Moreover, minimal changes are required to work with different relational databases such as MySQL. The search service provides means to retrieve documents by their metadata. It supports both simple and advanced search as well as result sorting by archive or by discovery date. Simple search allows users to search free text across archive contents. Advanced search allows users to search in specific metadata fields.
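
The difference between simple and advanced search, and the two sort orders, can be illustrated with a minimal sketch. This is not the Arc implementation: it uses Python with an in-memory SQLite database in place of the Java servlets and Oracle backend, and the `records` table, its column names, the sample rows, and the function names are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical columns for illustration; not the actual Arc/NCSTRL schema.
SEARCHABLE_FIELDS = ("title", "creator", "description")
SORT_KEYS = ("archive", "discovery_date")


def build_demo_db():
    """Create an in-memory database holding three made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (identifier TEXT PRIMARY KEY, archive TEXT, "
        "title TEXT, creator TEXT, description TEXT, discovery_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("oai:odu:1", "odu", "Kepler archivelets", "Maly",
             "Personal OAI repositories", "2002-01-15"),
            ("oai:vt:7", "vt", "Digital library federation", "Fox",
             "Harvesting metadata with Kepler tools", "2001-11-02"),
            ("oai:uva:3", "uva", "Distributed searching", "French",
             "Query routing", "2001-12-20"),
        ],
    )
    return conn


def _run(conn, where, params, sort_by):
    # Column names are checked against a whitelist before being placed
    # in the SQL text; user-supplied terms are always bound parameters.
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {sort_by}")
    sql = (f"SELECT identifier FROM records WHERE {where} "
           f"ORDER BY {sort_by}, identifier")
    return [row[0] for row in conn.execute(sql, params)]


def simple_search(conn, text, sort_by="archive"):
    """Free-text search: the term may match in any searchable field."""
    where = " OR ".join(f"{field} LIKE ?" for field in SEARCHABLE_FIELDS)
    params = [f"%{text}%"] * len(SEARCHABLE_FIELDS)
    return _run(conn, where, params, sort_by)


def advanced_search(conn, criteria, sort_by="archive"):
    """Fielded search: each named field must match its own term."""
    fields = sorted(criteria)
    if not fields or any(f not in SEARCHABLE_FIELDS for f in fields):
        raise ValueError("criteria must name at least one searchable field")
    where = " AND ".join(f"{field} LIKE ?" for field in fields)
    params = [f"%{criteria[field]}%" for field in fields]
    return _run(conn, where, params, sort_by)


if __name__ == "__main__":
    db = build_demo_db()
    print(simple_search(db, "kepler"))
    print(simple_search(db, "kepler", sort_by="discovery_date"))
    print(advanced_search(db, {"title": "kepler"}))
```

In this sketch a simple search for "kepler" matches a record whether the term occurs in its title or its description, while an advanced search restricted to the title field returns only the record whose title contains it. SQLite's `LIKE` is case-insensitive for ASCII text by default; other databases may need an explicit case-folding step.
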
Users also can search/browse specific archives and/or archive partitions in case they are familiar with specific

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
JCDL’02, July 13-17, 2002, Portland, Oregon, USA.
Copyright 2002 ACM 1-58113-513-0/02/0007…$5.00.