IDENTIFYING ONTOLOGY COMPONENTS FROM DIGITAL ARCHIVES FOR THE SEMANTIC WEB

JRG Pulido, R Herrera, M Aréchiga, A Block §, R Acosta, S Legrand

ABSTRACT

This paper describes an approach for identifying ontology components by using Self-Organizing Maps (SOM). Our system represents the knowledge contained in a particular domain, which may be any kind of digital archive, by assembling and displaying its ontology components. This novel approach provides a solution to the problem of semi-automatic ontology construction, supports mechanisms that explore domains, and allows knowledge components to be displayed in a browsable manner. Further processing may be carried out on the extracted knowledge so that it can be embedded in the semantic web for software agents to use.

KEY WORDS

Semantic web, ontology learning, self-organizing maps.

1 Introduction

It is known that the web contains several billion static pages connected by hyperlinks [26, 29]. Reaching them is a gigantic challenge, given that current search engines cover only a small percentage of the documents on the web. Furthermore, this small fraction of reachable documents is unstructured, meaning that software agents understand nothing of their actual content. In other words, these documents can be read but not understood [3]. It would be useful to develop representations of the information contained in digital archives and to create intelligent systems that support interactive searching. In this paper we describe an approach that helps in the semi-automatic construction of ontologies for such web sites.

The remainder of this paper is organized as follows. In section 2 some related work is introduced. Our approach is outlined in section 3. Results are presented in section 4, and conclusions and further work in section 5.
Faculty of Telematics, University of Colima, México, jrgp@ucol.mx
SIABUC Dept, University of Colima, México, rherrera@ucol.mx
Faculty of Telematics, University of Colima, México, mandrad@ucol.mx
§ Faculty of Telematics, University of Colima, México, arted@ucol.mx
Faculty of Telematics, University of Colima, México, acosta@ucol.mx
Computer Science School, University of Jyväskylä, Finland, steveleg@gmail.com

2 Related Work

One of the most important challenges that the semantic web poses in dealing with large amounts of on-line knowledge is the mapping of unstructured information, suitable for humans, to a formal representation of knowledge [5]. In the next subsections we take a brief look at some work done on ontologies as well as semantic maps.

2.1 Constructing Ontologies

A representation that brings order and structure to a web site can be referred to as an ontology. Representing knowledge about a domain as an ontology is a challenging process that is difficult to carry out in a consistent and rigorous way. It is easy to lose consistency and to introduce ambiguity and confusion [4]. An important observation in this context is that there is a significant manual effort involved in translating ontologies [27]. Nevertheless, ontologies are a useful form of knowledge representation which may be used to support the design and development of intelligent software applications and expert systems. Web ontologies can take rather different forms from traditional ones. New approaches, including advanced ontology languages such as OIL, DAML, and OWL, have been proposed [2, 15, 10, 14, 8]. In [13] the use of the so-called Simple HTML Ontology Extension (SHOE) in a real-world internet application is described. This approach allows authors to add semantic content to web pages, relating the context to common ontologies that provide contextual information about the domain. A similar approach is presented in [1].
Most tag-annotated web pages tend to categorize concepts; therefore, there is no need for complex inference rules to perform automatic classification. One of the most exciting uses of an ontology, in the context of the semantic web, is to support the development of agent-based systems for web searching [9, 21].

2.2 Semantic Map Systems

An interesting project is presented in [18], where the results of applying WEBSOM2, a document organization, searching and browsing system, to a set of about 7 million electronic patent abstracts are described. In this case, a document map is presented as