Journal of Management Information Systems / Spring 2003, Vol. 19, No. 4, pp. 191–212.
© 2003 M.E. Sharpe, Inc.
0742–1222 / 2003 $9.50 + 0.00.
Generating and Browsing Multiple
Taxonomies Over a Document
Collection
SCOTT SPANGLER, JEFFREY T. KREULEN, AND
JUSTIN LESSLER
SCOTT SPANGLER has been doing knowledge base and data mining research for the
past 15 years—recently at the IBM Almaden Research Center and previously at the
General Motors Technical Center. Since coming to IBM in 1996, he has developed
software components for data visualization and text mining, which are available through
the Lotus Discovery Server product and IBM Alphaworks. Mr. Spangler has published
papers at ACM-SIGKDD, Machine Learning, IAAI, ACM Hypertext, and
HICSS. He holds five patents and has several more patents pending. Scott Spangler
holds a B.S. in Math from MIT and an M.A. in Computer Science from the University
of Texas.
JEFFREY T. KREULEN is a manager at the IBM Almaden Research Center. He holds a
B.S. in applied mathematics (computer science) from Carnegie Mellon University,
and an M.S. in electrical engineering and a Ph.D. in computer engineering from the
Pennsylvania State University. Since joining IBM in 1992, he has worked on multi-
processor systems design and verification, operating systems, systems management,
Web-based service delivery, and integrated text and data analysis.
JUSTIN LESSLER started his career with IBM after graduating from the University of North
Carolina at Chapel Hill in 1996 and has been with the IBM Almaden Research Center
since 1999.
ABSTRACT: We present a novel system and methodology for generating and then
browsing multiple taxonomies over a document collection. Taxonomies are generated
using a broad set of capabilities, including meta data, key word queries, and
automated clustering techniques that serve as a seed taxonomy. The taxonomy editor,
eClassifier, provides powerful tools to visualize and edit each taxonomy to make it
reflective of the desired theme. Cluster validation tools allow the editor to verify that
documents received in the future can be automatically classified into each taxonomy
with sufficiently high accuracy.
In general, those seeking knowledge from a document collection may have only a
vague notion of exactly what they are attempting to understand, and would like to
explore related topics and concepts rather than simply being given a set of documents.
For this purpose, we have developed MindMap, an interface utilizing multiple
taxonomies and the ability to interact with a document collection.
KEY WORDS AND PHRASES: data mining, document classification, document clustering
techniques, knowledge management, navigation, taxonomy, text mining, visualization.