Text analysis module of a System for Automatic eXtraction of lEarning object Features (SAXEF)

Marco Alfano
Anghelos Centre on Communication Studies
Via Pirandello 40, 90144 Palermo, Italy
Tel. +39091341791
marco.alfano@anghelos.org

Biagio Lenzitti
Computer Science, University of Palermo
Via Archirafi 34, 90123 Palermo, Italy
Tel. +390916040427
lenzitti@math.unipa.it

Natalina Visalli
SISSIS, University of Palermo
Viale delle Scienze, 90100 Palermo, Italy
Tel. +390916040427
vnatalin@neomedia.it

ABSTRACT
New on-line courses are often created by using existing learning objects found on the net. However, those learning objects cannot easily be reused for the creation of a new didactic work because they are usually offered without information on their aims and on the type of users for whom they are intended. Moreover, their contents are not clearly summarized, so the whole object must often be read to understand its relevance to the new course. To facilitate this task, we have created a system called SAXEF (System for Automatic eXtraction of lEarning object Features), which automatically extracts the basic indicators of any learning object (a sort of DNA) found on the Internet. It provides valuable help to a teacher who is creating a new on-line course, because he/she can easily choose the most appropriate learning objects from the net just by looking at their basic indicators. SAXEF has a modular structure; we have already developed some modules and are implementing the rest of the system. This paper presents the main architecture of SAXEF and the details of the text analysis module, which extracts the main and secondary topics of a learning object.

Categories and Subject Descriptors
J [Computer Applications]

General Terms
Design, Experimentation.

Keywords
On line Education, Learning Objects, Internet, Metadata Extraction, Text Analysis.

1. INTRODUCTION
Many on-line “learning objects” (LO) are nowadays available on the net. They satisfy various educational needs: academic (mainly courses aimed at university and postgraduate education), professional (basic training or refresher courses) and cultural (courses given by public and private institutes). These offerings differ in the type of content presentation (text, multimedia, etc.), in length and level of detail (from the single monothematic lesson to the whole multidisciplinary course), and in degree of interactivity (depending upon the level of interaction available to the user). Moreover, some assume that the student works alone along his/her learning path, while others assume an interaction with a tutor, either synchronous or asynchronous [1], [2].

Searching the Internet for a specific topic returns a large amount of information, and much of this information has a didactic structure [3], [4], [5]. This suggests the possibility of reusing it for the creation of a new didactic work [6], [7]. However, the learning objects that are found cannot easily be reused, because they are usually offered without information on their aims and on the type of users for whom they are intended. Moreover, their contents are not clearly summarized, so the whole object must often be analyzed to understand its relevance to the new course [8], [9].

The development of on-line courses could be helped by automatically extracting the main features of existing learning objects, so that they can be reused more easily in a new on-line course [10]. The learning objects should be characterized by their contents, communication methodology and required pre-existing knowledge. Moreover, in accordance with the hypertext nature of the Internet, they should be linked to each other, so that other objects can be retrieved for the full comprehension of the treated subject and its deeper analysis [11], [12].
For example, it would be important to recognize which context a learning object belongs to, to evaluate whether its content is theoretical or practical, synthetic or analytical, and to understand what its main and secondary topics, its level of complexity and its hyper/multimedia structure are. Such characteristics would also make it possible to connect those objects to other learning objects [13], [14].

Considering already existing objects, we have studied how to extract characteristics from a complex structure such as that of a learning object without additional involvement of the author, who could otherwise characterize it through, for example, metadata [15], [16]. This is a fundamental step, because we assume that the information obtained by analyzing the LO components and studying their relationships allows us to characterize a learning object through a map, a sort of DNA, that contains the generic and specific elements and fully describes the object. Starting from this hypothesis, we have developed the architecture of a system called SAXEF [17], [18] (System for Automatic eXtraction of lEarning object Features), which automatically extracts the basic indicators of a learning object (the sort of DNA discussed above) and provides valuable help to a teacher who has to create a new on-line course.

3rd E-Learning Conference, Coimbra, Portugal, 7 – 8 September 2006
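To make the idea of a feature map more concrete, the indicators listed in the introduction (context, theoretical or practical content, synthetic or analytical treatment, main and secondary topics, level of complexity, hyper/multimedia structure, links to other objects) could be held in a simple record. The following Python sketch is only an illustration: the class name, field names, value conventions and the `shares_topic_with` helper are assumptions made here and do not describe the actual SAXEF data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LearningObjectFeatures:
    """Hypothetical feature map ("DNA") of one learning object.

    All field names and value conventions are illustrative assumptions,
    not the schema used by SAXEF.
    """
    url: str
    context: Optional[str] = None           # subject area the object belongs to
    orientation: Optional[str] = None       # assumed values: "theoretical" / "practical"
    style: Optional[str] = None             # assumed values: "synthetic" / "analytical"
    main_topics: List[str] = field(default_factory=list)
    secondary_topics: List[str] = field(default_factory=list)
    complexity_level: Optional[int] = None  # assumed ordinal scale, e.g. 1 (basic) upward
    media_structure: Optional[str] = None   # e.g. "text", "hypertext", "multimedia"
    related_objects: List[str] = field(default_factory=list)  # URLs of linked objects

    def shares_topic_with(self, other: "LearningObjectFeatures") -> bool:
        """Return True if the two objects have at least one topic in common
        (case-insensitive), a crude hint that they could be linked."""
        mine = {t.lower() for t in self.main_topics + self.secondary_topics}
        theirs = {t.lower() for t in other.main_topics + other.secondary_topics}
        return bool(mine & theirs)


# Usage with made-up objects and topics
lesson = LearningObjectFeatures(
    url="http://example.org/sorting-lesson",
    orientation="practical",
    main_topics=["Sorting"],
    secondary_topics=["Recursion"],
)
course = LearningObjectFeatures(
    url="http://example.org/algorithms-course",
    main_topics=["recursion"],
)
print(lesson.shares_topic_with(course))
```

A flat record of this kind is enough for a teacher-facing summary; a real system would presumably need richer structures to express the relationships between object components mentioned above.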