Adaptive High Efficiency Video Coding Based on Camera Activity Classification

Gangadharan Esakki, Venkatesh Jatla, and Marios S. Pattichis
image and video Processing and Communications Lab (ivPCL)
Department of Electrical and Computer Engineering
The University of New Mexico, United States
{gesakki, venkatesh369, pattichi}@unm.edu

Abstract

We present a framework for adaptive video encoding based on video content. The basic idea is to analyze the video to determine camera activity (tracking, stationary, or zooming) and then associate each activity with adaptive video quality constraints. We demonstrate our approach on the UT LIVE video quality assessment database. We show that effective camera activity detection and classification is possible based on the motion vectors and the number of prediction units used in the HEVC standard. In our results, applying leave-one-out validation, we obtain a 79% correct classification rate. We also present two examples of real-time, high-quality video encoding that achieve bitrate savings of 35% and 51.5%.

1 Introduction

The current paper considers an adaptive encoding framework for effective video communications. Our goal is to automatically detect different video activities and associate quality constraints with each activity based on a specific task. Thus, we effectively compress the video for specific tasks that can be adjusted by the users or the owners of the video content. To begin with, we note that video quality assessment is an area of active research, as discussed in [1–4]. In our case, we consider a simple and fast method for assessing image quality based on SSIM, as discussed in [5]. Furthermore, our approach is motivated by the well-known fact that visual attention is task dependent, as documented in early research reported in [6] and, more recently, in [7].
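To give a rough sense of how camera activity can be inferred from motion vectors, the sketch below classifies a single frame's motion-vector field with simple geometric heuristics: near-zero magnitudes suggest a stationary camera, vectors aligned with radial directions from the frame center suggest zooming, and the remaining cases default to tracking. The function name, thresholds, and block-grid layout are illustrative assumptions; the paper's actual classifier also uses HEVC prediction-unit counts and is trained with leave-one-out validation.

```python
import numpy as np

def classify_camera_activity(mv_field, mag_thresh=0.5, radial_thresh=0.8):
    """Heuristic camera-activity label for one frame.

    mv_field: (H, W, 2) array of per-block motion vectors (dx, dy).
    Returns "stationary", "zooming", or "tracking".
    Thresholds are illustrative, not values from the paper.
    """
    h, w, _ = mv_field.shape
    mags = np.linalg.norm(mv_field, axis=2)
    # Stationary camera: motion vectors are (close to) zero everywhere.
    if mags.mean() < mag_thresh:
        return "stationary"
    # Zooming: vectors point toward/away from the frame center, so the
    # radial component dominates. Build unit radial directions per block.
    ys, xs = np.mgrid[0:h, 0:w]
    radial = np.stack([xs - (w - 1) / 2.0, ys - (h - 1) / 2.0], axis=2)
    radial = radial / np.maximum(np.linalg.norm(radial, axis=2,
                                                keepdims=True), 1e-9)
    radial_frac = np.abs((mv_field * radial).sum(axis=2)) / np.maximum(mags, 1e-9)
    if radial_frac.mean() > radial_thresh:
        return "zooming"
    # Otherwise assume the camera is panning/tracking a subject.
    return "tracking"
```

A uniform field (all blocks sharing one direction, as produced by a pan) falls through to "tracking", since its mean radial fraction over a centered grid stays well below the zoom threshold.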
While viewers can have very different tasks that they are interested in, it is often possible to identify the goal of the photographer by analyzing the video content itself. In our approach, we identify video segments where the camera is moving, zooming, or held stationary, and we adaptively encode the video based on the content of each segment. For example, we interpret a camera zooming operation as an obvious attempt by the photographer to draw attention to his or her subject. As a result, we associate camera zooming with the need to encode the video at a higher quality level. On the other hand, camera motions can be more difficult to interpret. If we interpret camera motion as a search operation for obvious targets, then the video quality can be lower than the level used during zooming. On the other hand, if the camera motion is used to draw attention to the activity, we would expect higher video quality to visualize what is happening (e.g., in sports events). Thus, our focus is to provide