Multi-Modal Ontology that Enhances Visual
Content Retrieval System
Stefan Poslad
School of Electrical and Electronic Engineering and Computer Science,
Queen Mary University of London, E1 4NS, United Kingdom
Email: stefan@eecs.qmul.ac.uk
Kraisak Kesorn
Computer Science and Information Technology Department,
Science Faculty, Naresuan University, Phitsanulok, 65000, Thailand
Email: kraisakk@nu.ac.th
Abstract—This paper presents an ontology-based system for
multi-modal image annotation. A novel technique is
proposed to represent the semantics of visual content by
restructuring visual word vectors into an ontology model,
computing the distance between the visual word features and
concept features. A second index relies on the textual
description, which is processed to extract and recognise
concepts, properties, or instances in an ontology. The two
indexes are unified into a single indexing model used to
enhance image retrieval efficiency. As a result, it is
possible to retrieve images with a query using words that do
not appear in the caption. The constructed knowledge base (KB)
was evaluated on how well it fits the knowledge domain with
respect to its relevance for the application. The results show
that the metadata in the presented KB can be exploited
efficiently and thus enhances retrieval performance.
Index Terms—multimodal information, knowledge base,
ontology model, image retrieval system
I. INTRODUCTION
An ontology provides a useful way of formalising the
semantics of represented information. In principle, an
ontology can serve as the semantic representation for
an information system in a concrete and useful manner
[1]. For an image retrieval (IMR) system, ontologies are
used to reduce the semantic gap, the gap between the
user's perception and the low-level feature abstraction of
the visual content, by storing knowledge structures for
summarising, discovering, classifying, browsing,
retrieving, and annotating images. Ontology-based
frameworks have been proposed for IMR over numerous
collections [2]-[4]. These frameworks have validated the
assumption that ontologies can help improve
information retrieval effectiveness by making it possible
to find relevant documents that are not syntactically
similar to the query terms.
Manuscript received July 25, 2013; revised September 10, 2013.
Existing works on IMR have been based only on
single-modality information, either textual information or
visual features. Consequently, those works suffer from
several limitations. For example, such a system is not able to
describe the high-level semantics of images from
distinctive low-level visual features alone when text
descriptions of the images are not supplied. This is because
the extracted visual features by themselves cannot
represent the content of images effectively. Text and
images are two distinct types of information from different
modalities, as they represent 'things' in different ways.
However, there are some invariant and implicit
connections between textual and visual information [5].
As such, single-modality information is not
adequate to enhance the interpretation power of IMR.
Multimodal information should be utilised to facilitate
image interpretation, classification, and retrieval.
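The dual-index idea outlined in the abstract can be sketched as follows: an image's visual-word histogram is mapped to the nearest ontology concept by feature distance, and that concept is merged with concepts recognised in the caption into one index entry. The concept names, feature vectors, and distance choice (Euclidean) below are illustrative assumptions, not the paper's actual data or metric:

```python
# Minimal sketch (hypothetical data): map an image's visual-word histogram to
# the nearest ontology concept by Euclidean distance, then unify that concept
# with concepts recognised in the image's textual description.
import math

# Hypothetical concept feature vectors (e.g. mean visual-word histograms).
CONCEPT_FEATURES = {
    "athlete": [0.6, 0.3, 0.1],
    "stadium": [0.1, 0.2, 0.7],
}

def nearest_concept(visual_words, concepts):
    """Return the concept whose feature vector is closest to the
    visual-word vector (Euclidean distance)."""
    return min(concepts, key=lambda c: math.dist(concepts[c], visual_words))

def unified_index(visual_words, text_concepts, concepts):
    """One index entry merging visually inferred and text-derived concepts."""
    return set(text_concepts) | {nearest_concept(visual_words, concepts)}

# An image whose visual words resemble 'stadium', captioned only with 'athlete':
entry = unified_index([0.15, 0.25, 0.6], ["athlete"], CONCEPT_FEATURES)
```

Because the entry now contains a concept never mentioned in the caption, a query for that concept can still retrieve the image, which is the retrieval benefit claimed above.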
The combination of textual information with image
feature information has been proposed to improve image
search results. Wang et al. [6] supported multi-modal
information, text and visual, in the canine domain. A binary
histogram is used to represent each of the image features
and is transformed into an ontology model using hierarchical
SVM classification [7], which is then incorporated with the
aforementioned textual description ontology. The
proposed method is able to increase classification
accuracy and retrieval performance. Khalid et al. [8]
proposed a multimodal ontology framework for the sports
domain. Textual descriptions and surrounding text are
extracted and then manually mapped to concepts in a
domain knowledge base. For visual content, low-level
features, e.g. colour layout, dominant colour, and edge
histogram descriptors, are extracted. These visual features
are then classified into categories using an SVM
classification technique and a framework, the LabelMe
Annotation Toolbox [9]. Nonetheless, global
features such as colour and edge information cannot
represent the semantics of images effectively. For example,
copies of the same image may differ in brightness, size, and
camera angle, the so-called visual heterogeneity problem,
which is well recognised among researchers in the
image-processing area.
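As a generic illustration of the kind of pipeline described above (not the exact method of [8]; the descriptors, categories, and values are invented for this sketch), SVM-based categorisation of global image features can be written with scikit-learn:

```python
# Generic sketch of SVM classification of global image features into semantic
# categories, in the spirit of [8]; the descriptors and labels below are
# invented for illustration, not taken from the cited work.
from sklearn import svm

# Toy 4-D global descriptors (e.g. colour-layout + edge-histogram bins).
features = [
    [0.9, 0.1, 0.2, 0.1],
    [0.8, 0.2, 0.1, 0.2],
    [0.1, 0.9, 0.8, 0.7],
    [0.2, 0.8, 0.9, 0.8],
]
labels = ["football", "football", "swimming", "swimming"]

classifier = svm.SVC(kernel="linear").fit(features, labels)
category = classifier.predict([[0.85, 0.15, 0.15, 0.15]])[0]
```

Because such global descriptors shift with brightness, scale, and camera angle, two photographs of the same scene can land on opposite sides of the decision boundary, which is exactly the visual heterogeneity problem noted above.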
From the analysis of the existing solutions, some
limitations remain, as they are unable to handle visual
International Journal of Electrical Energy, Vol. 1, No. 4, December 2013
©2013 Engineering and Technology Publishing
doi: 10.12720/ijoee.1.4.284-290