Abstract—A major problem in consuming multimedia content from the Web today is the extremely large and rapidly growing volume of that content in its various forms. A further problem is that part of this content is not annotated at all, which makes it very hard to find and reuse. The rest is described manually, so its annotations may be subjective or inaccurate and may lack formal semantics. This creates a need for efficient semantic annotation, so that computers and applications can easily process the metadata for reuse and retrieval of multimedia content. This paper presents ontologies in general as part of the Semantic Web, as well as specific ontologies used for multimedia annotation. It compares the most commonly used multimedia ontologies and their main features. These multimedia ontologies can be used to create high-quality, semantically rich multimedia annotations.

Keywords—metadata, multimedia ontologies, ontology, Ontology Design Patterns, OWL, semantic annotation, Semantic Web

I. INTRODUCTION

MULTIMEDIA content in all its forms takes up a growing share of the content available on the Web. The most common types of multimedia content on the Web are images and video, but it can also take the form of 3D graphics, audio and audiovisual files. Besides the consumption of multimedia content on the Web, there is also a steadily increasing trend in amateur and professional production, which includes publishing that content on various User Generated Content (UGC) web sites such as Picasa, Flickr and YouTube [1]. Those sites do not require their users to define metadata or to classify their multimedia content when uploading it.
T. Sjekavica is with the Department of Electrical Engineering and Computing, University of Dubrovnik, Cira Carica 4, 20000 Dubrovnik, CROATIA (phone: +385 (0) 20 445 793; e-mail: tomo.sjekavica@unidu.hr).
I. Obradović is with the Department of Electrical Engineering and Computing, University of Dubrovnik, Cira Carica 4, 20000 Dubrovnik, CROATIA (e-mail: ines.obradovic@unidu.hr).
G. Gledec is with the University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, CROATIA (e-mail: gordan.gledec@fer.hr).

With this large expansion of multimedia content on the Web, a need has arisen to index and annotate that content so that it can be used, reused and retrieved efficiently. Multimedia content is annotated with metadata, which adds value to that content. The first type of multimedia metadata was plain text, usually entered manually, which is a time-consuming and costly process. Such metadata is easy for humans to read, but computers can hardly process it because it lacks formal semantics. Today many different multimedia metadata standards and formats exist, such as Exif, Dublin Core, VRA Core, DIG-35 and MPEG-7, but they are not mutually compatible. MPEG-7 [2] is an international ISO/IEC multimedia content description standard that supports some degree of interpretation of the meaning of information, so that the information can be processed by applications and computers rather than just presented to people. It is used for metadata of audiovisual content in the form of still pictures, graphics, 3D models, speech, audio or video.

To enable better retrieval, discovery and exploitation of multimedia content on the Web by web services and applications, semantic annotation of multimedia content is needed. Achieving semantically rich annotations requires the use of the Semantic Web [3].
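The difference between plain-text metadata and metadata that a program can process can be illustrated with a minimal sketch. It is not taken from the paper: the image URL, the person name and the annotation values are invented for illustration, and the helper functions `values` and `to_ntriples` are hypothetical names. The only external fact assumed is the Dublin Core element namespace and four of its elements (`title`, `creator`, `format`, `subject`). The annotation is written as subject, predicate, object statements and printed in an N-Triples-like form.

```python
# Hypothetical example data: the image URL and all annotation values are invented.
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
IMAGE = "http://example.org/photos/dubrovnik-walls.jpg"

# Plain-text annotation: readable for humans, but a program cannot tell
# which part is the title, the creator or the subject.
free_text = "Old city walls of Dubrovnik, photo taken by Ana in summer"

# Structured annotation: each statement is a (subject, predicate, object) triple
# whose predicate comes from a shared vocabulary.
triples = [
    (IMAGE, DC + "title", "Old city walls of Dubrovnik"),
    (IMAGE, DC + "creator", "Ana"),
    (IMAGE, DC + "format", "image/jpeg"),
    (IMAGE, DC + "subject", "city walls"),
]


def values(subject, predicate, graph):
    """Return all objects stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def to_ntriples(graph):
    """Serialize triples with plain literal objects, one statement per line.

    Only backslashes and double quotes are escaped; this is a simplification.
    """
    lines = []
    for s, p, o in graph:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    # A program can answer "who created this image?" without parsing prose.
    print(values(IMAGE, DC + "creator", triples))
    print(to_ntriples(triples))
```

With the structured form, a question such as "who created this image?" becomes a simple lookup, whereas the free-text caption would first have to be interpreted. A shared vocabulary on its own still says little about how terms relate to each other, which is the gap that ontologies are meant to fill.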
The Semantic Web is an extension of the World Wide Web in which information is given well-defined meaning, enabling computers and humans to cooperate better [4]. Semantic Web technologies such as XML, RDF and ontologies can be used for semantic annotation of multimedia content. Ontologies define a common vocabulary that represents shared knowledge within a specific domain, using a finite list of terms and concepts [5]. For humans, ontologies provide better access to the information defined in them. For applications and computers, the definitions of terms and concepts, together with the relationships between them, should enable better processing. Although several vocabularies that can be used for semantic annotation of multimedia exist, they are not rich enough or not suitable for describing multimedia content for use on the Semantic Web. Thus there is a need to develop extended, multimedia-enriched ontologies, also known as multimedia ontologies.

This paper is organized as follows. The next section deals with ontologies in general and ontologies as part of the Semantic Web. An overview of ontology languages on the Web is provided in the third section. The main Ontology Design Patterns are described in the fourth section. The multimedia ontologies most commonly used for semantic annotation are presented in the fifth section. These selected multimedia ontologies are then

Semantic Annotation and Retrieval using Multimedia Ontologies
Tomo Sjekavica, Ines Obradović, and Gordan Gledec
INTERNATIONAL JOURNAL OF COMPUTERS AND COMMUNICATIONS, Volume 8, 2014, ISSN: 2074-1294