A PROTOTYPE FOR KNOWLEDGE EXTRACTION FROM SEMANTIC WEB BASED ON ONTOLOGICAL COMPONENTS CONSTRUCTION

Nesrine Ben Mustapha, Hajer Baazaoui-Zghal
Riadi Laboratory, ENSI Campus Universitaire de la Manouba, 2010 Tunis, Tunisie

Marie-Aude Aufaure
Supelec, Computer Science Department, Plateau du Moulon, 91 192 Gif sur Yvette, France

Keywords: Semantic web, ontology construction, knowledge extraction, OntoCoSemWeb (Ontology Construction for the Semantic Web) prototype.

Abstract: Adding a semantic dimension to web pages is a response to some problems of the present web and is known as the semantic web. Many methods and methodologies for ontology construction can be found in the literature. Generally, they are dedicated to particular data types such as text, semi-structured data or relational data. This paper presents a prototype for knowledge extraction from web pages based on the construction of ontological components. We first review the state of the art in methodologies for learning ontologies from text. We then define an architecture of ontological components for the semantic web. Finally, an implementation of the proposed architecture and experiments with it are presented.

1 INTRODUCTION

The volume of available information on the web is growing exponentially. Consequently, the integration of heterogeneous data sources and information retrieval have become more and more complex. Adding a semantic dimension to web pages is a response to this problem and is known as the semantic web (Berners-Lee, 2001). Ontologies can be seen as a fundamental part of the semantic web. They can be defined as an explicit, formal specification of a shared conceptualization (Gruber, 1993). However, building an ontology manually is a long and tedious task. We are therefore interested in learning ontologies from text. Section 2 presents the semantic web, ontological components and our approach to building a domain ontology. Sections 3 and 4 present the implementation and the experimentation.
Section 5 analyses the results. Finally, we conclude and give some perspectives for this work.

2 SEMANTIC WEB AND ONTOLOGICAL COMPONENTS

Starting from the state of the art, we propose a hybrid approach to building a domain ontology; our objective is to increase the capability of this ontology to specify and extract web knowledge in order to contribute to the semantic web. Analyzing web content is a difficult task because of problems of relevance, redundancy and incoherence in web structures and information. For these reasons, a fully automatic approach to ontology building still remains utopian. Our approach is based on the cyclic relation between web mining, the semantic web and ontology building, as stated in (Berendt et al., 2002). Our proposal is based on the following requirements: (1) ensure that the ontology is useful for specifying and extracting knowledge from the web, (2) link the semantic content to the structure of web documents, and (3) combine linguistic and learning techniques taking into account the scalability and the evolution of the

Ben Mustapha N., Baazaoui-Zghal H. and Aufaure M. (2007). A PROTOTYPE FOR KNOWLEDGE EXTRACTION FROM SEMANTIC WEB BASED ON ONTOLOGICAL COMPONENTS CONSTRUCTION. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Web Interfaces and Applications, pages 451-454. DOI: 10.5220/0001285304510454. Copyright © SciTePress