Indexing and Access for Digital Libraries and the
Internet: Human, Database, and Domain Factors
Marcia J. Bates
Department of Information Studies, University of California, Los Angeles, Los Angeles, California 90095-1520
E-mail: mjbates@ucla.edu
Discussion in the research community and among the
general public regarding content indexing (especially
subject indexing) and access to digital resources, espe-
cially on the Internet, has underutilized research on a
variety of factors that are important in the design of such
access mechanisms. Some of these factors and issues
are reviewed and implications drawn for information
system design in the era of electronic access. Specifically,
the following are discussed: Human factors: Sub-
ject searching vs. indexing, multiple terms of access,
folk classification, basic-level terms, and folk access;
Database factors: Bradford’s Law, vocabulary scalabil-
ity, the Resnikoff-Dolby 30:1 Rule; Domain factors: Role
of domain in indexing.
Introduction
Objectives
In the current era of digital resources and the Internet,
system design for information retrieval has burst onto the
stage of the public consciousness in a way never seen
before. Almost overnight, people who had never thought
about information retrieval are now musing on how to get
the best results from their query on a World Wide Web
search engine, or from a remote library catalog or digital
library resource. At the same time, and under the same
stimulus, experts in a variety of fields cognate to informa-
tion science—such as cognitive science, computational
linguistics, and artificial intelligence—are taking up informa-
tion retrieval questions from their perspectives.
In the meantime, those of us in information science,
where information retrieval from recorded sources (as dis-
tinct from mental information retrieval) has long been a
core, if not the core, concern, are faced with a unique mix
of challenges. Information science has long been a field
understaffed with researchers. Where some fields have 10
researchers working on any given question, we have often
had one researcher for 10 questions. A promising research
result from an information science study may languish for
years before a second person takes up the question and
builds on the earlier work. (This is not universal in the field;
some questions are well studied.) As a consequence of this
understaffing, we know a lot about many elements of infor-
mation retrieval, but often in a fragmented and underdevel-
oped way.
At the same time, what we do know is of great value.
Years of experience, not only with research but also with
application in dozens of information-industry companies,
have given information scientists a deep understanding
about information retrieval that is missing in the larger
world of science and information technology.
So at this particular historical juncture in the devel-
opment of information retrieval research and practice, I
believe there are a number of both research results and
experience-based bits of knowledge that information sci-
entists need to be reminded of, and non-information
scientists need to be informed of—information scientists
because the fragmentation and understaffing in our field
have made it difficult to see and build on all the relevant
elements at any one time, and people outside of informa-
tion science because these results are unknown to them,
or at least unknown to them in their information retrieval
implications.
My purpose here is to draw attention to that learning and
those research results associated with indexing and access to
information, which have been relatively underutilized by
those both inside and outside of information science. These
are results that seem to me to have great potential and/or
importance in information system design, and for further
research.
Making such a selection is, of course, a matter of judg-
ment. I believe that the material below offers the possibility
of enriching design and, when studied further, enriching our
Received January 31, 1997; revised August 28, 1997; accepted November
7, 1997.
© 1998 John Wiley & Sons, Inc.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 49(13):1185–1205, 1998 CCC 0002-8231/98/131185-21