Semantics-Empowered Text Exploration for Knowledge Discovery

Delroy Cameron
Kno.e.sis Center, Wright State University
3640 Colonel Glenn Highway, Dayton, OH 45435 USA
delroy@knoesis.org

Pablo N. Mendes
Kno.e.sis Center, Wright State University
3640 Colonel Glenn Highway, Dayton, OH 45435 USA
pablo@knoesis.org

Amit P. Sheth
Kno.e.sis Center, Wright State University
3640 Colonel Glenn Highway, Dayton, OH 45435 USA
amit@knoesis.org

Victor Chan
Air Force Research Lab
Wright-Patterson AFB, Dayton, OH 45433-5707 USA
victor.chan@wpafb.af.mil

ABSTRACT
The interaction paradigm offered by most contemporary Web Information Systems is a search-and-sift paradigm, in which users manually seek information using hyperlinked documents. This paradigm derives from a document-centric model that gives users minimal support for scanning through high volumes of text. We present a novel information exploration paradigm based on a data-centric view of corpora, along with a prototype implementation that demonstrates the value of content-driven navigation. We leverage semantic metadata to link data in documents by exploiting named relationships between entities. We also present utilities for gathering user-generated navigation trails, which are critical for knowledge discovery. We discuss the impact of our approach in the context of knowledge exploration.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval Models; I.2.4 [Knowledge Representation Formalisms and Methods]: Semantic Networks - Ontologies

General Terms
Design, Human Factors

Keywords
Navigation, Knowledge Exploration, Semantic Metadata, Semantic Browsing, Exploratory Search, Annotation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ACMSE ’10, April 15-17, 2010, Oxford, MS, USA.
Copyright © 2010 ACM 978-1-4503-0064-3/10/04 ...$10.00.

1. INTRODUCTION
The prevailing paradigm for information retrieval and exploration on the Web is based on keyword search and document browsing. Under this paradigm, a user typically performs the following operations:
• First, assemble a set of keywords deemed likely to retrieve “good hits.”
• Next, select documents based on the title link and summary shown for each hit on the Search Engine Results Page (SERP).
• Then, manually inspect each selected document to verify its relevance, judged by the overlap between the document content and the information need.
• Finally, optionally aggregate and organize the results, commonly through bookmarking, saving, or publishing.

This interaction sequence suffers from several limitations. First, since query reformulation is the only recourse when no relevant results are found, multiple queries may need to be constructed before satisfactory results are obtained. Second, the ability to navigate to surrounding and related contexts is restricted to anchors pre-established by page creators. For example, if an exploratory-minded user begins with the search phrase “Father of the Web,” he may be unable to examine a related context such as the “World Wide Web Consortium” (the organization led by that same “father”), unless hyperlinks exist a priori from documents in the SERP to documents in the corpus containing the term “World Wide Web Consortium.” This dependency of information reachability on hyperlinks could become even more problematic in text collections that lack hyperlinks altogether, such as Medline¹. A further limitation, which makes this paradigm somewhat impractical, arises when a user’s information need is not well defined to begin with.
For example, a user interested in

¹ Medline comprises more than 19 million citations for biomedical articles with links to full-text articles