A Framework to Provide Customized Reuse of Open Corpus Content for Adaptive Systems

Mostafa Bayomi
CNGL Centre for Global Intelligent Content, Knowledge and Data Engineering Group
School of Computer Science and Statistics, Trinity College Dublin, Ireland
bayomim@scss.tcd.ie

ABSTRACT
One of the main services that Adaptive Systems offer their users is the provision of content tailored to an individual user’s needs. Some Adaptive Systems use closed corpus content that has been prepared for them a priori; hence, they accept only a narrow field of content. Furthermore, the content is tightly coupled with other parts of the system, which also hinders its reusability. To address these limitations, recent systems have started to make use of open Web content to provide a wider variety of content. Previous approaches have attempted to harness the information available on the Web by providing adaptive systems with customizable information objects. Since adaptive systems are evolving towards the Semantic Web and the use of ontologies, existing systems are limited in that they service these documents solely through keyword-based queries. In this research we propose a novel framework that extends an existing content provision system, Slicepedia. Our framework uses the conceptual representation of content to segment it in a semantic manner. The framework removes unnecessary content from web pages, such as navigation bars, and then semantically reveals the structural representation of the text to build a tree-like hierarchy. This tree can be traversed to obtain different levels of content granularity that facilitate content discoverability and adaptivity.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering; Retrieval Models; Selection Process; H.5.4 [Hypertext/Hypermedia]: Architectures; User Issues;

Keywords
Open Corpus Content; Semantic Web; Content Semantic Slicing;

1.
INTRODUCTION AND MOTIVATION
The amount of content on the World Wide Web is continuously growing. Several research fields have emerged that focus on the challenges associated with this growing body of global content. These challenges include: how to identify, handle and retrieve content from different sources; how to search for information in multiple languages; and how to deliver this content in a personalized form that is most suitable for the user.

Various systems [6,9,16] have tried to address the challenge of producing adaptive compositions from open information sources in order to deliver content in the form most suitable to an individual user. Adaptive systems focus on providing such compositions based on a variety of user dimensions, such as user interests, prior knowledge, preferences or context. At the heart of these systems is the adaptive engine, which deals with multiple loosely coupled models that are integrated as desired. Content, however, is still very tightly coupled to these engines, which strongly impedes its reusability.

Consider an E-Learning portal as an adaptive system that provides users with learning materials about a specific subject. The portal has a user model that is responsible for personalizing content according to different dimensions, such as the user’s interests, prior knowledge, preferences or preferred style of content (concise or detailed). Since these dimensions are not the same for all users, the portal should: i) have various content resources, ii) be able to provide different levels of content granularity, iii) provide content at low production cost, and iv) provide content that is amenable to reuse.

Early adaptive systems relied on a closed document corpus with content specifically authored for their use [7]; hence they accept only a narrow field of content. As a result, open corpus content is increasingly seen as providing a solution to these issues [3].
However, most systems incorporating open corpus content have mainly focused on linking such content with their internal content as a path for further content exploration [9]. Moreover, adaptive systems must adhere strictly to specific content structures in order to make use of such content, which limits its amenability to reuse. The one-size-fits-all nature of Web content calls for automatic approaches that can tailor the content in a way that facilitates its reuse, whether in part or in full. This tailoring must be performed based upon various aspects, such as granularity, content format and associated metadata [11].

Various systems have been proposed to address the challenge of producing adaptive content from open corpus content. These systems focused on separating the content from the other models in adaptive systems (such as domain and user models). Slicepedia [11], for example, was introduced as a service that processes open corpus resources and extracts content for reuse by right-fitting it to the specific content requirements of individual adaptive systems. Since adaptive systems are moving towards the Semantic Web and the use of ontologies, Slicepedia and other systems are limited in that they service these documents solely through keyword-based queries. This means they provide only limited capabilities to capture the conceptualizations associated with adaptive system needs and content.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
© 2015 ACM.
ISBN 978-1-4503-3395-5/15/09 $15.00
DOI: http://dx.doi.org/10.1145/2700171.2804450
HT '15, September 1–4, 2015, Guzelyurt, Northern Cyprus.