A Framework to Provide Customized Reuse of Open
Corpus Content for Adaptive Systems
Mostafa Bayomi
CNGL Centre for Global Intelligent Content, Knowledge and Data Engineering Group
School of Computer Science and Statistics, Trinity College Dublin, Ireland
bayomim@scss.tcd.ie
ABSTRACT
One of the main services that Adaptive Systems offer to their
users is the provision of content that is tailored to the individual
user’s needs. Some Adaptive Systems use closed corpus content
that has been prepared for them a priori; hence, they accept only a
narrow field of content. Furthermore, the content is tightly
coupled with other parts of the system, which also hinders its
re-usability. To address these limitations, recent systems have started to
make use of open Web content to provide a wider variety of
content. Previous approaches have attempted to harness the
information available on the web by providing adaptive systems
with customizable information objects. However, as adaptive systems
evolve towards the Semantic Web and the use of ontologies,
existing systems are limited because they can serve such content
only through keyword-based queries. In this research
we propose a novel framework that extends an existing content
provision system, Slicepedia. Our framework uses the conceptual
representation of content to segment it in a semantic manner. The
framework removes unnecessary content from web pages, such as
navigation bars, and then semantically reveals the structural
representation of text to build a tree-like hierarchy. This tree can
be traversed to obtain different levels of content granularity that
facilitate content discoverability and adaptivity.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering;
Retrieval Models; Selection Process;
H.5.4 [Hypertext/Hypermedia]: Architectures; User Issues;
Keywords
Open Corpus Content; Semantic Web; Content Semantic Slicing;
1. INTRODUCTION AND MOTIVATION
The amount of content on the World Wide Web is continuously
growing. Several research fields have emerged that particularly
focus on the challenges associated with this growing body of
global content. These challenges include: how to identify, handle
and retrieve content from different sources; how to search for
information in multiple languages; and how to deliver this content
in a personalized form that is most suitable for the user.
Various systems [6,9,16] have tried to address the challenge of
producing adaptive compositions from open information sources
in order to deliver content in a form that is most suitable to an
individual user. Adaptive systems focus on providing such
compositions based on a variety of user dimensions, such as user
interests, prior knowledge, preferences or context. At the heart of
these systems is the adaptive engine which deals with multiple
loosely coupled models that are integrated as desired. Content,
however, is still very tightly coupled to these engines, and this
coupling strongly impedes its re-usability.
Consider an E-Learning portal as an adaptive system that provides
users with learning materials about a specific subject. The portal
has a user model that is responsible for personalizing content
according to different dimensions, such as the user’s interests,
prior knowledge, preferences or preferred style of content
(concise or detailed content). Since the dimensions are not the
same for all users, the portal should: i) have various content
resources, ii) be able to provide different levels of content
granularity, iii) provide content at low production cost, and
iv) provide content that is amenable to reuse.
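As an illustration of requirement ii), the sketch below models a page as a tree of content segments and returns a coarser or finer slice depending on how deep the tree is traversed. It is a minimal sketch only: the names Segment and slice_at_depth, and the sample page, are hypothetical and are not taken from Slicepedia or from the proposed framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """Hypothetical node in a content tree: a title plus child segments."""
    title: str
    children: List["Segment"] = field(default_factory=list)


def slice_at_depth(node: Segment, max_depth: int, depth: int = 0) -> List[str]:
    """Return indented titles down to max_depth; deeper levels are omitted."""
    titles = ["  " * depth + node.title]
    if depth < max_depth:
        for child in node.children:
            titles.extend(slice_at_depth(child, max_depth, depth + 1))
    return titles


# Illustrative page structure (assumed, not from the paper).
page = Segment("Sorting algorithms", children=[
    Segment("Comparison sorts", children=[
        Segment("Quicksort"),
        Segment("Merge sort"),
    ]),
    Segment("Non-comparison sorts", children=[
        Segment("Radix sort"),
    ]),
])

concise = slice_at_depth(page, max_depth=1)   # coarse view for a "concise" preference
detailed = slice_at_depth(page, max_depth=2)  # finer view for a "detailed" preference
```

A user model that prefers concise content could request the shallow slice, while one that prefers detail could request the deeper one; the same tree serves both without re-authoring the content.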
Early adaptive systems relied on a closed document
corpus with content specifically authored for their use [7];
hence they accept only a narrow field of content. As a result,
open corpus content is increasingly seen as providing a solution
to these issues [3]. However, most systems incorporating open
corpus content have mainly focused on linking such content with
the internal content as a path for further content exploration [9].
Moreover, adaptive systems must adhere to specific content
structures to be able to make use of this content, which limits
how readily it can be reused.
The one-size-fits-all nature of Web content calls for automatic
approaches that can tailor the content in a way that facilitates its
reuse – whether in part or in full. This tailoring must be
performed based upon various aspects, such as: granularity,
content format and associated metadata [11]. Various systems
have been proposed to address the challenge of producing
adaptive content from open corpus content. These systems
focused on separating the content from other models in the
adaptive systems (such as domain and user models). Slicepedia
[11], for example, was introduced as a service to process open
corpus resources and extract content for reuse by right-fitting it to
the specific content requirements of individual adaptive systems.
However, as adaptive systems move towards the Semantic Web
and the use of ontologies, Slicepedia and similar systems are
limited because they can serve such content only through
keyword-based queries. This means they provide only limited
capabilities to capture the conceptualizations associated with
adaptive system needs and content.
© 2015 ACM. ISBN 978-1-4503-3395-5/15/09$15.00
DOI: http://dx.doi.org/10.1145/2700171.2804450
HT '15, September 1–4, 2015, Guzelyurt, Northern Cyprus.