Implementing a Search Engine using an OODB

Andrea Garratt, Mike Jackson, Peter Burden, Jon Wallis
School of Computing & IT, University of Wolverhampton, 35-49 Lichfield Street, Wolverhampton WV1 1EL, UK
{in5969, m.s.jackson, jphb, j.wallis}@wlv.ac.uk

Abstract

This paper describes the experience gained from using a commercially available object-oriented database (OODB) to provide persistence for an experimental Web search engine called WWLib-TNG. The paper describes the design and evaluation of the search engine's builder and searcher components, both implemented in Java. The builder constructs a catalogue of World Wide Web pages, and the searcher allows that catalogue to be searched. The builder was evaluated for the time taken to construct a catalogue, efficiency of disk-space use, and scalability. The searcher was evaluated for response time and scalability. The experiments showed that, for this application, the object-oriented database had scalability problems and was expensive in terms of disk space.

1 Introduction

The Wolverhampton Web Library - The Next Generation (WWLib-TNG) is an experimental World Wide Web (Web) search engine currently under development at the School of Computing and Information Technology, University of Wolverhampton (Wallis and Burden, 1995; Burden and Jackson, 1999). The aim of WWLib-TNG is to combine the advantages of classified directories with those of automation by providing both automatic classification (Jenkins et al., 1998a; 1998b) and automatic resource discovery.

Classified directories have many advantages over search engines; in particular, classification enables more of the Web pages relevant to a user's query to be retrieved. Unfortunately, compared with most search engines, classified directories have a smaller corpus, which contains out-of-date information and many dead links.
Search engines, by contrast, use a spider (or robot) to gather Web pages automatically and, as a result, generally have a large corpus. WWLib-TNG gathers and classifies Web pages from sites in the UK. The Dewey Decimal Classification (DDC) system is used to classify Web pages because it has been used by UK libraries for many years and is therefore a well-understood system. Figure 1.1 shows the architecture of the search engine.
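The abstract divides WWLib-TNG into a builder, which constructs the catalogue, and a searcher, which queries it. As a rough illustration of that split, the following Java sketch keeps a keyword (inverted) index and tags each page with a DDC class number, so that results can be narrowed by class-number prefix. It is a minimal sketch under stated assumptions, not the WWLib-TNG design: the class and method names (`CatalogueSketch`, `add`, `search`), the inverted-index layout and the prefix filter are all hypothetical, the DDC numbers and URLs are illustrative only, and the catalogue lives in ordinary Java collections rather than in the object-oriented database the paper evaluates.

```java
import java.util.*;

/**
 * Hypothetical sketch of a catalogue shared by a "builder" (which adds pages)
 * and a "searcher" (which answers keyword queries). Assumption: an in-memory
 * inverted index stands in for the persistent OODB-backed catalogue.
 */
public class CatalogueSketch {

    /** A catalogued Web page with an (illustrative) DDC class number. */
    record Page(String url, String ddcClass) {}

    // Inverted index: lower-cased term -> set of pages containing that term.
    private final Map<String, Set<Page>> index = new HashMap<>();

    /** Builder side: tokenise the page text on non-word characters and index each term. */
    public void add(String url, String ddcClass, String text) {
        Page page = new Page(url, ddcClass);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue; // split can yield a leading empty token
            index.computeIfAbsent(term, t -> new HashSet<>()).add(page);
        }
    }

    /**
     * Searcher side: return the sorted URLs of pages containing every query term,
     * optionally restricted to DDC class numbers starting with ddcPrefix
     * (pass null for no restriction).
     */
    public List<String> search(String query, String ddcPrefix) {
        Set<Page> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            Set<Page> hits = index.getOrDefault(term, Set.of());
            if (result == null) {
                result = new HashSet<>(hits); // first term seeds the result set
            } else {
                result.retainAll(hits);       // later terms intersect (boolean AND)
            }
        }
        if (result == null) return List.of(); // query contained no terms

        List<String> urls = new ArrayList<>();
        for (Page p : result) {
            if (ddcPrefix == null || p.ddcClass().startsWith(ddcPrefix)) {
                urls.add(p.url());
            }
        }
        Collections.sort(urls);
        return urls;
    }

    public static void main(String[] args) {
        CatalogueSketch cat = new CatalogueSketch();
        cat.add("http://example.org/db", "005.75", "Object-oriented database design");
        cat.add("http://example.org/web", "004.678", "World Wide Web search engines");
        cat.add("http://example.org/lib", "020", "Library classification and search");
        System.out.println(cat.search("search", null)); // all pages mentioning "search"
        System.out.println(cat.search("search", "00")); // only class numbers 000-009
    }
}
```

Running `main` prints the two pages that contain "search", then only the one whose class number falls in the 000-009 range. Because DDC notation is hierarchical digit by digit, a simple string-prefix test is enough to restrict a query to a class in this sketch; in the real system the index would also have to be made persistent, which is the role the paper gives the OODB.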