Terminology and the construction of ontology

Lee Gillam, Mariam Tariq and Khurshid Ahmad
Department of Computing, School of Electronics and Physical Sciences, University of Surrey, Guildford, GU2 7XH, United Kingdom
{l.gillam, k.ahmad, m.tariq}@surrey.ac.uk

Abstract. This paper discusses a method for extracting conceptual hierarchies from arbitrary domain-specific collections of text. These hierarchies can form a basis for a concept-oriented terminology collection, which may in turn be used to develop knowledge-based systems via ontology editors. This reference to ontology is explored in the context of collections of terms. The method presented uses both statistical and linguistic techniques. The results of such an extraction may be useful in information retrieval, knowledge management, or the discipline of terminology science itself.

Keywords. terminology extraction, conceptual hierarchies, knowledge-based systems, ontology

1. Introduction

Organization of information is important for all scientific activities. For science to be explored, its phenomena should be both observable and repeatable. The publication of landmark scientific texts, including the first scientific journal, Philosophical Transactions (first published by the Royal Society in 1665), and works such as Newton’s Opticks and Darwin’s Origin of Species, has provided “in-text” organization of scientific information. The increasing number of such publications, and the need to organize information better, are doubtless among the reasons why we now have grandly named systems for classifying information, such as the Universal Decimal Classification (UDC) and the Lenoch Universal Classification (LUC). The pioneers of these classifications used terms such as “universal” to refer to all that exists. Both of these systems provide a hierarchical structure for organizing information, although their structural bases differ.
These large-scale general classification systems have a particular, identifiable problem: they are not easily modified, and so do not keep pace with developments in specific subject fields. Though never named as such, these classification systems represent earlier manifestations of ontology. The Encyclopaedia Britannica (EB) defines ontology as “the theory or study of being as such; i.e., of the basic characteristics of all reality”. The use of the word “universal” suggests that these systems provide some theory or study of being, which promotes their ontological status. More recently, “ontology” has been used to describe the (computer-readable) representation of information about the world in a form that can be reasoned over. This still provides a theory or study of being; however, the focus is now utilitarian: the term ontology is used extensively in the literature on information extraction, knowledge representation and the Semantic Web to denote a thing to be used, rather than a study of things. The use of ontologies as representations of the world has led to their consideration as tools for solving problems of translation (Navigli, Velardi and Gangemi 2003), information retrieval (Oard 1997; Guarino, Masolo and Vetere 1999), knowledge management (Maedche et al. 2003) and other knowledge-based activities (Alani et al. 2003).

The creation of any conceptual system, whether a subject classification system or a modern-day ontology, still requires significant human effort. Subject experts, information retrieval professionals and artificial intelligence researchers specify and design such systems largely by hand, bringing to bear their experience, documentation and knowledge. Much of this expert knowledge is already documented to a discernible extent.
These systems, whether classifications, terminologies or ontologies, may subsequently be standardized, for example the British Standard (BS 1000) for UDC,