On the Energy Efficiency of Parallel Multi-core vs Hardware Accelerated HD Video Decoding

Yahia Benmoussa
Univ. Bretagne Occidentale, UMR6285, Lab-STICC, France
Univ. M’hamed Bougara, LIMOSE, Algeria
yahia.benmoussa@univ-brest.fr

Jalil Boukhobza
Univ. Bretagne Occidentale, UMR6285, Lab-STICC, F29200 Brest, France
jalil.boukhobza@univ-brest.fr

Eric Senn
Univ. Bretagne Sud, UMR6285, Lab-STICC, F56100 Lorient, France
eric.senn@univ-ubs.fr

Djamel Benazzouz
Univ. M’hamed Bougara, LMSS, Boumerdes, Algeria
dbenazzouz@yahoo.fr

ABSTRACT
Hardware video accelerators are used on mobile devices to support energy-efficient, real-time high-definition (HD) video decoding. Recently, the rise of multi-core architectures on these devices has increased their performance and made real-time HD video decoding possible using parallel processing on the general-purpose processor (GPP) cores alone. This raises the question of how energy efficient such multi-core GPPs can be compared to hardware video accelerators. In this paper, we propose an experimental evaluation of the energy efficiency of the two video decoding approaches. Accurate energy measurements were performed on a recent low-power 40 nm mobile SoC containing a quad-core ARM processor and a hardware video accelerator. The results show that parallel multi-core HD decoding improves both performance and energy efficiency compared to single-core decoding. However, hardware-accelerated decoding is about three times more energy efficient. Based on these experimental observations, we point out some challenges for improving the energy efficiency of parallel multi-core video decoding.
Categories and Subject Descriptors
H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems; D.4.8 [Operating Systems]: Performance; C.3 [Special Purpose and Application Based Systems]: Real-time and embedded systems

Keywords
Parallel video decoding, Energy efficiency, multi-core SoC

EWiLi’14, 13-14 November 2014, Lisbon, Portugal. Copyright retained by the authors.

1. INTRODUCTION
Video decoding is both a processing-intensive and a real-time application. To meet these constraints, the processors in mobile devices may need to run at ever higher frequencies, especially given the increasing demand for HD video. However, because of the thermal and power issues faced in the design of modern microprocessors, it is no longer possible to keep increasing the clock frequency. Using high frequencies leads to a drastic increase in thermal dissipation and energy consumption: dynamic power grows linearly with the clock frequency and quadratically with the supply voltage, and higher frequencies require higher supply voltages. This is even more critical for energy-constrained mobile devices such as smartphones and tablets. To overcome this issue, modern embedded processor architectures use parallelism to increase performance without increasing the frequency [9].

In the field of video decoding, parallelism can improve energy efficiency on energy-constrained devices. It can be implemented in specialized hardware video accelerators, whose energy efficiency is well established [18]. However, hardware accelerators are proprietary solutions and lack flexibility: they are not open, and their use depends on vendor-provided APIs. Moreover, implementing a new video standard in hardware circuits may take a long time, unlike software-based solutions running on GPPs. For example, the latest mobile devices still lack hardware acceleration for the new HEVC standard.
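To make the argument about parallelism and frequency concrete, the sketch below uses the standard CMOS relation P_dyn = C_eff · V² · f in a toy model. It assumes, for illustration only, that the supply voltage scales linearly with frequency down to a floor v_min, that the work splits perfectly across cores, and that static power is ignored. The class Core, the function dynamic_energy, and all numeric values are hypothetical and are not measurements from this paper.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical core model; all parameters are illustrative assumptions."""
    c_eff: float     # effective switched capacitance (F)
    v_per_hz: float  # assumed linear voltage/frequency slope (V/Hz)
    v_min: float     # assumed minimum supply voltage (V)

    def voltage(self, f_hz: float) -> float:
        # Voltage rises with frequency but cannot drop below v_min.
        return max(self.v_min, self.v_per_hz * f_hz)

    def dynamic_power(self, f_hz: float) -> float:
        # P_dyn = C_eff * V^2 * f (static/leakage power ignored).
        v = self.voltage(f_hz)
        return self.c_eff * v * v * f_hz


def dynamic_energy(core: Core, cycles: float, n_cores: int, f_target: float) -> float:
    """Dynamic energy to run `cycles` of perfectly parallel work on n_cores.

    Each core runs at f_target / n_cores, so the wall-clock time equals
    that of a single core running all the cycles at f_target.
    """
    f = f_target / n_cores
    t = (cycles / n_cores) / f
    return n_cores * core.dynamic_power(f) * t


if __name__ == "__main__":
    ideal = Core(c_eff=1e-9, v_per_hz=1e-9, v_min=0.0)
    floored = Core(c_eff=1e-9, v_per_hz=1e-9, v_min=0.9)
    for name, core in (("ideal", ideal), ("voltage floor", floored)):
        e1 = dynamic_energy(core, 1.2e9, 1, 1.2e9)
        e4 = dynamic_energy(core, 1.2e9, 4, 1.2e9)
        print(f"{name}: 1 core {e1:.3f} J, 4 cores {e4:.3f} J")
```

With these made-up parameters, four cores at a quarter of the frequency use 16 times less dynamic energy than one core in the ideal case, but only about 1.8 times less once the voltage floor binds. This hints at why real multi-core gains are smaller than the idealized quadratic-voltage argument suggests.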
Recently, the SoCs in new mobile devices have included an increasing number of GPP cores. For example, the latest ARM big.LITTLE architecture contains four Cortex-A7 and four Cortex-A15 processors [3]. It is therefore worth investigating what level of performance and energy efficiency this kind of multi-core GPP can achieve compared to hardware video accelerators. The objective is to provide a video decoding solution that reconciles energy efficiency and flexibility.

In this study, we investigate the performance and energy efficiency of parallel multi-core video decoding compared to the hardware-accelerator-based approach. For this purpose, we propose an experimental methodology based on