An Intelligent System for Document
Retrieval in Distributed Office
Environments *
Uttam Mukhopadhyay,† Larry M. Stephens, Michael N. Huhns,‡ and
Ronald D. Bonnell
Center for Machine Intelligence, University of South Carolina,
Columbia, SC 29208
MINDS (Multiple Intelligent Node Document Servers) is a
distributed system of knowledge-based query engines
for efficiently retrieving multimedia documents in an of-
fice environment of distributed workstations. By learning
document distribution patterns, as well as user interests
and preferences during system usage, it customizes
document retrievals for each user. A two-layer learning
system has been implemented for MINDS. The knowledge
base used by the query engine is learned at the
lower level with the help of heuristics for assigning credit
and recommending adjustments; these heuristics are in-
crementally refined at the upper level.
1. Introduction
Documents are used in computerized office environ-
ments to store a variety of information. This information
is often difficult to utilize, especially in large offices with
distributed workstations, because users do not have per-
fect knowledge of the documents in the system or of the or-
ganization for their storage. The goal of the MINDS project
is to develop a distributed system of intelligent servers that
(1) learn dynamically about document storage patterns
throughout the system, and (2) learn interests and prefer-
ences of users so that searches are efficient and produce
relevant documents [1,2]. This paper presents the strategy adopted
for evaluating a set of learning heuristics applicable to this goal.
In particular, it describes
the heuristic evaluation testbed, distance measures for
metaknowledge, document migration heuristics, evidence
assimilation techniques, and results of a system simulation.
*This research was supported in part by NCR Corporation.
†New address: Computer Science Department, General Motors
Research Laboratories, Warren, MI 48090.
‡New address: Artificial Intelligence Department, Microelectronics
and Computer Technology Corporation, 9430 Research Boulevard,
Austin, TX 78759.
Received June 17, 1985; accepted August 30, 1985.
© 1986 by John Wiley & Sons, Inc.
2. Distributed Workstation Environment
A. Organization of Documents
Queries regarding documents are frequently based on
the contents of the documents. Automatic text-under-
standing systems could conceivably process these queries
by reading the documents, but would be expensive to de-
velop and use. The names of documents provide clues to
their contents, but names are not descriptive enough for
reliable processing of content-based queries. However, a
set of keywords may be used to describe document con-
tents: the retrieval of documents can then be predicated
on these keywords as well as on other document attri-
butes, such as author, creation date, and location. Com-
plex qualifiers, which are conjunctions or disjunctions of
predicates on these attributes, may also be used. Each
document is thus represented by a surrogate containing its
attributes. The document and its surrogate are subse-
quently updated or deleted as dictated by system usage.
Surrogates occupy only a fraction of the storage space re-
quired by the documents, but usually contain enough in-
formation for users to determine whether a document is
useful.
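The surrogate and the complex qualifier described above can be illustrated with a minimal sketch. It is not the MINDS implementation: the class name Surrogate, its field names, the helper functions (has_keyword, attr_equals, all_of, any_of, retrieve), and the sample documents are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Surrogate:
    """Compact stand-in for a document (field names are hypothetical)."""
    name: str
    author: str
    created: date
    location: str              # workstation currently holding the document
    keywords: FrozenSet[str]   # keywords describing the contents


# A predicate is any test on a single surrogate.
Predicate = Callable[[Surrogate], bool]


def has_keyword(word: str) -> Predicate:
    """Predicate: the surrogate lists `word` among its keywords."""
    return lambda s: word in s.keywords


def attr_equals(attr: str, value: object) -> Predicate:
    """Predicate: the named attribute equals `value`."""
    return lambda s: getattr(s, attr) == value


def all_of(*preds: Predicate) -> Predicate:
    """Conjunction of predicates."""
    return lambda s: all(p(s) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Disjunction of predicates."""
    return lambda s: any(p(s) for p in preds)


def retrieve(surrogates: Iterable[Surrogate], qualifier: Predicate) -> List[Surrogate]:
    """Return the surrogates that satisfy the qualifier, in input order."""
    return [s for s in surrogates if qualifier(s)]


# Invented sample data, for illustration only.
SURROGATES = [
    Surrogate("budget-1985", "smith", date(1985, 3, 1), "ws1",
              frozenset({"budget", "finance"})),
    Surrogate("trip-report", "jones", date(1985, 5, 12), "ws2",
              frozenset({"travel", "conference"})),
    Surrogate("forecast", "jones", date(1985, 6, 2), "ws2",
              frozenset({"budget", "planning"})),
]

# Complex qualifier: keyword "budget" AND (author is jones OR stored at ws1).
QUALIFIER = all_of(
    has_keyword("budget"),
    any_of(attr_equals("author", "jones"), attr_equals("location", "ws1")),
)

MATCHES = [s.name for s in retrieve(SURROGATES, QUALIFIER)]
print(MATCHES)
```

Because the qualifier is evaluated against surrogates only, the documents themselves need not be read to answer the query, which is the point of keeping surrogates small.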
The presumed office environment consists of a network
of single-user workstations. Each user may query the sys-
tem about his own locally-stored documents or about
those stored at other workstations. These documents are
not permanently located but may migrate to other work-
stations. Multiple copies of documents are allowed, but
documents stored at one location must have unique
names.
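The storage rules just stated (documents may migrate, copies may exist at several workstations, and names must be unique within one location) can be sketched as follows. This is an illustration under stated assumptions, not the MINDS design: the Workstation class, its method names, and the choice to raise an error on a name collision are all hypothetical, since the text does not say how a collision is handled.

```python
from typing import Dict


class Workstation:
    """Single-user workstation holding documents by name (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.documents: Dict[str, str] = {}   # document name -> contents

    def store(self, doc_name: str, contents: str) -> None:
        """Store a document; names must be unique at this location.

        Raising ValueError on a collision is an assumption of this sketch.
        """
        if doc_name in self.documents:
            raise ValueError(f"{doc_name!r} already exists at {self.name}")
        self.documents[doc_name] = contents

    def copy_to(self, doc_name: str, other: "Workstation") -> None:
        """Place a second copy at another workstation; the original stays."""
        other.store(doc_name, self.documents[doc_name])

    def migrate_to(self, doc_name: str, other: "Workstation") -> None:
        """Move a document: store it remotely first, then drop the local copy."""
        other.store(doc_name, self.documents[doc_name])
        del self.documents[doc_name]
```

Storing at the destination before deleting locally means a rejected migration (for example, a name collision at the destination) leaves the document where it was.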
B. The User's Perspective
In typical distributed document management systems,
document directories are either centralized or distributed,
with or without redundancy [3]. However, the directory
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 37(3): 123-135, 1986 CCC 0002-8231/86/030123-13$04.00