Keywords—IPTV system, multimodality, IMS, speech technologies, Smart TV, intelligent ambience, ECA.

Abstract—Several systems with multimodal interfaces are already available, and they allow a more natural and more advanced exchange of information between humans and machines. Nevertheless, the television domain is still undergoing an innovation and development phase in which standard linear television is being enhanced with several novel technologies. In this way, it is already being transformed into a fully interactive entertainment environment that can be customized with various applications and services. Moreover, the TV set is one of the most common household devices and can therefore also serve as a common platform for the smart-home environment. However, the current levels of personalization and interactivity are still quite limited, especially in terms of context-awareness, recommendation, and support for multiple user-control devices (e.g. smartphones, tablets, game-pads, keyboards, mice, etc.). Therefore, fusing evolving IPTV services with natural modalities can be an effective solution for users who would like to access these services and IPTV content in a more natural way. In this paper, a novel IMS-based UMB-SmartTV system is proposed. It fuses traditional IPTV services with multimodal services, including a text-to-speech synthesis engine, a speech recognition engine, and embodied conversational agents, and makes them available to several users, even remotely. The platform enables flexible migration from the often closed and purpose-oriented nature of multimodal systems to the wider scope that the IPTV environment can offer. It is designed to overcome the interoperability, compatibility, and integration problems that often accompany migrations to multiservice (and resource-limited) networks. The UMB-SmartTV architecture is built on the IMS core and the distributed DATA architecture.
In this way, it flexibly merges IPTV and non-IPTV services into a uniform and highly modular solution that provides entertainment, ambience control, and many other services to users operating different devices and using speech.

M. I. Author is with Panevropa d.o.o., Maribor 2000, Slovenia (e-mail: izidor@panevropa.com). Z. D. Author is with the Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor 2000, Slovenia (e-mail: danilo.zimsek@uni-mb.si). K. Z. Author is with the Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor 2000, Slovenia (e-mail: kacic@uni-mb.si). R. M. Author is with the Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor 2000, Slovenia (e-mail: matej.rojc@uni-mb.si).

I. INTRODUCTION

Internet Protocol TV (IPTV) systems have evolved from a revolution in digital broadcasting over the Internet Protocol (IP) (linear television) into highly advanced, user-centric, service-oriented interactive platforms [1]. Nowadays, an IPTV system may be described as a collection of modern technologies from Information and Communication Technologies (ICT) and other domains. These technologies converge to deliver a rich set of services and high-quality multimedia content (TV, VOD) over IP [2]. IPTV systems therefore already provide advanced, customized, and personalized services, and interactivity is considered the major difference from traditional media [3]. These services may be accessed and controlled using different devices, ranging from traditional TV remote controls to advanced mobile devices (smartphones, tablets, etc.). However, as additional applications are integrated into the IPTV core, personalization and natural ways of control are becoming key issues. In most current IPTV solutions, personalization is limited to context-aware personalization through recommendation.
For instance, [4] proposes an algorithm that recommends users' preferred VOD programs available in the IPTV environment. In [1], an advanced personalization model for IPTV services is proposed for context-aware content recommendation. There, for example, RFID tags are used for user identification. Each user device is connected to an RFID reader that indicates the identity of the device and its association with a physical location. Similarly, [5] presents a context-aware content recommendation system that provides a personalized EPG using a client-server approach. Combining these technologies into an IPTV system, however, requires a complex convergence network. The evolution of telecommunications, combined with the work of ETSI/TISPAN [6], provides the Next Generation Network (NGN) architecture for integrating communication and interactive IPTV services into a single system. The IP Multimedia Subsystem (IMS) [7] is now a recognized standard for developing IPTV platforms, because IMS can perform tasks related to virtualization, interoperability, subscription, billing, roaming, security, etc. Deploying an IPTV system based on the IMS architecture is therefore a compelling alternative to proprietary commercial implementations [8]. Furthermore, the IMS architecture allows the implementation of services that have great potential to enhance the IPTV experience and to extend its capabilities into a smart-home platform.

A Novel IMS based UMB-SmartTV system for Integrating Multimodal Technologies. Izidor Mlakar, Danilo Zimšek, Zdravko Kačič, Matej Rojc. INTERNATIONAL JOURNAL OF COMPUTERS AND COMMUNICATIONS, Volume 8, 2014, ISSN: 2074-1294.

With the development of NGN architecture and ubiquitous