Multi-Modal Ontology that Enhances Visual
Content Retrieval System
Stefan Poslad
School of Electrical and Electronic Engineering and Computer Science,
Queen Mary University of London, E1 4NS, United Kingdom
Email: stefan@eecs.qmul.ac.uk
Kraisak Kesorn
Computer Science and Information Technology Department,
Science Faculty, Naresuan University, Phitsanulok, 65000, Thailand
Email: kraisakk@nu.ac.th
Abstract—This paper presents an ontology-based system for
multi-modal image annotation. A novel technique is
proposed to represent the semantics of visual content by
restructuring visual word vectors into an ontology model,
computing the distance between the visual word features and
concept features. A second index relies on the textual
description, which is processed to extract and recognise
concepts, properties, or instances in an ontology. The two
indexes are unified into a single indexing model used to
enhance image retrieval efficiency. As a result, it is
possible to retrieve images with a query using words that do
not appear in the caption. The constructed knowledge base (KB)
was evaluated on how well it fits the knowledge domain with
respect to its relevance for the application. The results show
that the metadata in the presented KB can be exploited
efficiently and thus enhances retrieval performance.
Index Terms—multimodal information, knowledge base,
ontology model, image retrieval system
I. INTRODUCTION
An ontology provides a useful way of formalising the
semantics of represented information. In principle, an
ontology can serve as the semantic representation for
an information system in a concrete and useful manner
[1]. For an image retrieval (IMR) system, ontologies are
used to reduce the semantic gap, the gap between the
user's perception and the low-level feature abstraction of
the visual content, by storing knowledge structures for
summarising, discovering, classifying, browsing,
retrieving, and annotating images. Ontology-based
frameworks have been proposed for IMR over numerous
collections [2]-[4]. These frameworks have validated the
assumption that ontologies can help improve
information retrieval effectiveness by making it possible
to find relevant documents that are not syntactically
similar to the query terms.
Manuscript received July 25, 2013; revised September 10, 2013.
Existing works on IMR have been based only on
single-modality information, either textual information or
visual features. Consequently, those works suffer from
several limitations. For example, such a system is not able to
describe the high-level semantics of images from
distinctive low-level visual features alone when text
descriptions of the images are not supplied. This is because
the extracted visual features by themselves cannot
represent the content of images effectively. Text and
images are two distinct types of information from different
modalities, as they represent 'things' in different ways.
However, there are some invariant and implicit
connections between textual and visual information [5].
As such, single-modality information is not
adequate to enhance the interpretation power of IMR.
Multimodal information should be utilised to facilitate
image interpretation, classification, and retrieval.
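The dual-index idea outlined in the abstract can be sketched as follows: an image's visual-word histogram is mapped to the nearest ontology concept by feature distance, and that concept is merged with concepts recognised in the caption into one index entry. The concept names, feature vectors, and distance choice (Euclidean) below are illustrative assumptions, not the paper's actual data or metric:

```python
# Minimal sketch (hypothetical data): map an image's visual-word histogram to
# the nearest ontology concept by Euclidean distance, then unify that concept
# with concepts recognised in the image's textual description.
import math

# Hypothetical concept feature vectors (e.g. mean visual-word histograms).
CONCEPT_FEATURES = {
    "athlete": [0.6, 0.3, 0.1],
    "stadium": [0.1, 0.2, 0.7],
}

def nearest_concept(visual_words, concepts):
    """Return the concept whose feature vector is closest to the
    visual-word vector (Euclidean distance)."""
    return min(concepts, key=lambda c: math.dist(concepts[c], visual_words))

def unified_index(visual_words, text_concepts, concepts):
    """One index entry merging visually inferred and text-derived concepts."""
    return set(text_concepts) | {nearest_concept(visual_words, concepts)}

# An image whose visual words resemble 'stadium', captioned only with 'athlete':
entry = unified_index([0.15, 0.25, 0.6], ["athlete"], CONCEPT_FEATURES)
```

Because the entry now contains a concept never mentioned in the caption, a query for that concept can still retrieve the image, which is the retrieval benefit claimed above.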
The combination of textual information with image
feature information has been proposed to improve image
search results. Wang et al. [6] supported multi-modal
information, text and visual, in the canine domain. A binary
histogram is used to represent each of the image features
and is transformed into an ontology model using hierarchical
SVM classification [7], which is then incorporated with the
aforementioned textual description ontology. The
proposed method is able to increase classification
accuracy and retrieval performance. Khalid et al. [8]
proposed a multimodal ontology framework for the sports
domain. Textual descriptions and surrounding text are
extracted and then manually mapped to concepts in a
domain knowledge base. For visual content, low-level
features, e.g. colour layout, dominant colour, and edge
histogram descriptors, are extracted. These visual features
are then classified into categories using an SVM
classification technique and a framework, the LabelMe
Annotation Toolbox [9]. Nonetheless, global
features such as colour and edge information cannot
represent the semantics of images effectively. For example,
copies of the same image may differ in brightness, size, and
camera angle, the so-called visual heterogeneity problem,
which is well recognised among researchers in the
image-processing area.
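As a generic illustration of the kind of pipeline described above (not the exact method of [8]; the descriptors, categories, and values are invented for this sketch), SVM-based categorisation of global image features can be written with scikit-learn:

```python
# Generic sketch of SVM classification of global image features into semantic
# categories, in the spirit of [8]; the descriptors and labels below are
# invented for illustration, not taken from the cited work.
from sklearn import svm

# Toy 4-D global descriptors (e.g. colour-layout + edge-histogram bins).
features = [
    [0.9, 0.1, 0.2, 0.1],
    [0.8, 0.2, 0.1, 0.2],
    [0.1, 0.9, 0.8, 0.7],
    [0.2, 0.8, 0.9, 0.8],
]
labels = ["football", "football", "swimming", "swimming"]

classifier = svm.SVC(kernel="linear").fit(features, labels)
category = classifier.predict([[0.85, 0.15, 0.15, 0.15]])[0]
```

Because such global descriptors shift with brightness, scale, and camera angle, two photographs of the same scene can land on opposite sides of the decision boundary, which is exactly the visual heterogeneity problem noted above.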
From the analysis of the existing solutions, some
limitations remain, as they are unable to handle visual
International Journal of Electrical Energy, Vol. 1, No. 4, December 2013
©2013 Engineering and Technology Publishing
doi: 10.12720/ijoee.1.4.284-290