An Adaptive Ontology based Hierarchical Browsing System for CiteSeerX

Nanhong Ye, CSCE Department, University of Arkansas, Fayetteville, USA, Email: nye@uark.edu
Susan Gauch, CSCE Department, University of Arkansas, Fayetteville, USA, Email: sgauch@uark.edu
Qiang Wang, CSCE Department, University of Arkansas, Fayetteville, USA, Email: qxw002@uark.edu
Hiep Luong, CSCE Department, University of Arkansas, Fayetteville, USA, Email: hluong@uark.edu

Abstract: As an indispensable technique in the field of Information Retrieval, the ontology-based retrieval system (or browsing hierarchy) has been well studied and developed in both academia and industry. However, most current systems suffer from the following problems: (1) Constructing the mappings between documents and concepts in an ontology requires training robust hierarchical classifiers, and it is difficult to build such classifiers for a large-scale document corpus because of time-efficiency and precision issues. (2) Traditional hierarchical browsing systems ignore the distribution of documents over concepts, which is unrealistic when a large number of documents are concentrated on a few concepts. Browsing the documents under such concepts becomes time-consuming and impractical for users. Therefore, further splitting these concepts into sub-categories is necessary and critical for organizing documents in the browsing system. Aiming to build the hierarchical browsing system more realistically and accurately, we propose in this paper an adaptive Hierarchical Browsing System framework, designed to build a browsing hierarchy for CiteSeerX. In this framework, we first investigate supervised learning approaches for classifying documents into the existing predefined concepts of the ontology and compare their performance on different CiteSeerX datasets. Then, we give an empirical analysis of unsupervised learning methods for adding new clusters to the existing browsing hierarchy.
Experimental analysis on the CiteSeerX corpus shows the effectiveness and efficiency of our method.

Keywords: Ontology; Browsing System; Unsupervised Learning

I. INTRODUCTION

With the exponential growth of information generated on the World Wide Web, Information Retrieval techniques such as browsing systems have become more and more important. Unlike ad hoc information retrieval, searching for information by browsing offers another perspective on retrieval. Typically, a browsing system is associated with an ontology, a hierarchical structure of concepts that represents a domain of knowledge. In practice, building an ontology for an intelligent system requires domain experts to manually identify a set of representational primitives and integrate them iteratively into the ontology. For instance, several ontology-based search and browse systems [1,4] were created in this manner, and are constructed and maintained by a vast community of volunteer editors. Growing and maintaining an ontology is a challenging problem because ontology engineers must keep up to date with extensive domain-specific knowledge and comply with the existing ontology. Furthermore, general cross-domain ontologies such as the Open Directory Project [4] and Wikipedia are difficult to keep logically consistent because different groups of domain experts conceptualize knowledge with heterogeneous structures.

Another issue in ontology engineering is the technique for automatically mapping documents to concepts. An ontology, in essence, is concerned with the classification and categorization of real objects, not only with the concepts themselves.
Today, with the exponential growth of information available on the World Wide Web, ontology engineers have difficulty meeting the efficiency and effectiveness demanded by users searching for relevant information under specific concepts. For example, network protocols (C.2.2), a research category in the ACM Computing Classification System [1], has been studied extensively and has given rise to more than 70 protocols in the different layers of the OSI model. As the category keeps growing, finding documents relevant to a particular topic becomes laborious and time-consuming for users.

To overcome these difficulties, we propose in this paper a new framework for building an ontology-based browsing and search system. First, by integrating our previous work KeyConcept [2], we construct the mappings between documents and the existing ontology. Then we investigate several unsupervised clustering methods and use them to further split heavily loaded categories. The experimental results on the CiteSeerX corpus show that our method scales to large document collections and provides a more robust way to construct a browsing system.

The remainder of this paper is organized as follows. In Section 2, we provide an overview of several major approaches to ontology extension and some related work. Section 3 presents the algorithms of our model in detail. The experimental results and evaluation are presented in Section 4, followed by the conclusion and future work in Section 5.

2010 Second International Conference on Knowledge and Systems Engineering 978-0-7695-4213-3/10 $25.00 © 2010 IEEE DOI 10.1109/KSE.2010.32
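As a rough illustration of the second stage outlined in the introduction (finding heavily loaded categories and splitting them by clustering), the following minimal Python sketch may help. It is not the method evaluated in this paper: the function names, the `max_docs` threshold, the whitespace tokenization, and the choice of a plain k-means over raw term counts with cosine similarity are all assumptions made for illustration only.

```python
from collections import Counter
import math


def overloaded_concepts(doc_to_concept, max_docs):
    """Return concepts that hold more than max_docs documents (hypothetical threshold)."""
    counts = Counter(doc_to_concept.values())
    return sorted(c for c, n in counts.items() if n > max_docs)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Average a non-empty list of sparse term-count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: s / len(vectors) for t, s in total.items()}


def kmeans_split(docs, k=2, iterations=10):
    """Split the documents of one concept into k clusters.

    docs maps doc_id -> text. Seeding is deterministic (the first k ids in
    sorted order), an assumption made only to keep the sketch reproducible.
    """
    ids = sorted(docs)
    if not ids:
        return {}
    k = min(k, len(ids))
    vecs = {d: Counter(docs[d].lower().split()) for d in ids}
    centroids = [dict(vecs[d]) for d in ids[:k]]
    labels = {}
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        new_labels = {
            d: max(range(k), key=lambda c: cosine(vecs[d], centroids[c]))
            for d in ids
        }
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [vecs[d] for d in ids if labels[d] == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


if __name__ == "__main__":
    mapping = {"d1": "C.2.2", "d2": "C.2.2", "d3": "C.2.2", "d4": "C.2.2", "d5": "H.3.3"}
    texts = {
        "d1": "tcp congestion control window",
        "d2": "routing protocol ospf link state",
        "d3": "tcp window congestion avoidance",
        "d4": "ospf routing link cost",
    }
    for concept in overloaded_concepts(mapping, max_docs=3):
        print(concept, kmeans_split(texts, k=2))
```

In a real system the resulting clusters would still need labels and a place in the hierarchy, and the number of clusters would have to be chosen per concept. The sketch leaves both open.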