Dynamic Eye Gaze Prediction for Foveated Rendering and Retinal Blur

Kushagr Gupta, Suleman Kazi, Terry Kong

Abstract— This project deals with tracking eye gaze location and predicting future eye movement sequences for users wearing head-mounted displays (HMDs), using classical machine learning and Hidden Markov Models, respectively. Eye gaze location can enhance the virtual reality experience inside an HMD in two ways: foveated rendering and retinal blur. Both techniques require accurate eye gaze location predictions to prepare the next frame of a video. Prediction is accomplished by first tracking and estimating the present eye gaze location and then using the past locations to predict future ones. This is a joint project satisfying the course project requirements of both CS229 and CS221. This report explains different approaches that use Hidden Markov Models to predict future eye gaze locations once the present location has been estimated via our tracking approach. The tracking approach constitutes the CS229 portion of this project and is summarized briefly at the beginning to connect the two projects; for a much more in-depth analysis of the tracking approach, we refer the reader to the CS229 analog of this report.

I. INTRODUCTION

In this project we perform eye gaze tracking using standard machine learning techniques (regularized linear regression, SVR) and then predict the future eye gaze location using techniques such as Hidden Markov Models. The ability to predict eye gaze accurately can have a large impact on virtual reality (VR) headsets, since it gives developers the flexibility to enable foveated rendering or retinal blur, enhancing content and making the VR experience more immersive. Eye gaze prediction also answers the question of what content to adjust while the user is blinking.
Both retinal blur and foveated rendering require high accuracy (so that the scene is rendered and blurred at the right locations) and low latency (so that the content is updated perceptually in real time). The current state of the art is capable of high accuracy, but latency remains a major bottleneck, especially for foveated rendering. Our approach to eye gaze tracking and prediction differs from traditional signal-processing-intensive methods, such as those that use Kalman filters.

II. PROBLEM SETUP

It has been studied that eye motion can be modeled with three primary states: fixation, pursuit, and saccades, as described in [1]. In fixation (F), a person looks at a stationary object for a prolonged time interval, so the eye gaze velocity is almost negligible. In pursuit (P), the eyes track an object slowly across a scene with some velocity in a deterministic manner. In a saccade (S), the motion of the eye is erratic and often along a straight line, but the velocities are extremely large and there is no content consumption, i.e., the brain does not process visual information during a saccade. For the tracking stage, we rely on the image of the eye and do not need to know the eye movement. But when we want to predict the future eye gaze location, which is the scope of this project, it becomes essential to model the eye motion. The approach followed in [1] uses an HMM to predict the future eye gaze state, with the hidden variable being F, P, or S, and then uses linear prediction to estimate the eye gaze location. The approach taken in this project is different: we keep the inherent eye motion model but predict the eye gaze itself using Hidden Markov Models, without the linear predictor, with the expectation that this approach is more generic. We use two different approaches to model our hidden variable and compare the results with a Kalman filter implementation and finite difference models.
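To make the three-state eye motion model concrete, the sketch below decodes a sequence of discretized gaze speeds (low/medium/high) into the hidden states F, P, and S with a hand-rolled Viterbi pass. The transition, emission, and prior probabilities are illustrative assumptions chosen only to reflect the qualitative description above (fixation and pursuit are sticky, saccades are brief and fast); they are not the parameters used in this project.

```python
import numpy as np

# Hidden states of the eye motion model: fixation, pursuit, saccade.
STATES = ["F", "P", "S"]

# Illustrative parameters (NOT fitted to data): F and P are sticky,
# S is short-lived.
trans = np.array([
    [0.90, 0.08, 0.02],   # from F
    [0.10, 0.85, 0.05],   # from P
    [0.40, 0.40, 0.20],   # from S
])
# Observations: discretized gaze speed -> 0 = low, 1 = medium, 2 = high.
emit = np.array([
    [0.85, 0.13, 0.02],   # F emits mostly low speeds
    [0.15, 0.75, 0.10],   # P emits mostly medium speeds
    [0.05, 0.15, 0.80],   # S emits mostly high speeds
])
prior = np.array([0.6, 0.3, 0.1])

def viterbi(obs):
    """Most likely hidden state sequence (log-space Viterbi)."""
    T, K = len(obs), len(STATES)
    logp = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    logp[0] = np.log(prior) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        scores = logp[t - 1][:, None] + np.log(trans)  # K x K
        back[t] = scores.argmax(axis=0)
        logp[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    # Backtrack from the best final state.
    path = [int(logp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]

# A burst of high speeds in the middle decodes as a saccade.
print(viterbi([0, 0, 2, 2, 0, 0]))  # -> ['F', 'F', 'S', 'S', 'F', 'F']
```

In the full prediction problem the observations would be continuous gaze velocities and the parameters would be learned from data; this discrete version only illustrates how the F/P/S hidden variable interacts with observed eye speed.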
III. TRACKING APPROACH

The eye gaze can be characterized by an x and a y location. We use various techniques, ranging from linear regression to support vector regression to neural networks, to track the eye gaze; they are outlined only briefly here because they are part of the machine learning project. The output of this pipeline is the sequence of eye gaze locations, which forms the input to the AI project.

We needed to generate a dataset for the tracking approach because we could not find a dataset that contained labeled images of the eye from a side view. However, this dataset was not used to evaluate the prediction approach, because the images of the eye were taken seconds apart and so no motion of the eye was captured. Thus, we used a popular dataset for attention prediction [2] to evaluate our prediction models.

While there is an oracle for the dataset of the tracking approach, there is none for the dataset of the prediction approach. This is because it is not possible to collect ground truth eye motion: any collection method uses an eye tracker (or sensor) that has inherent noise. Hence, we treat the data from [2] as the oracle when we evaluate our error metrics. We make this assumption because we believe their eye tracker, with an error of 0.5° of viewing angle, is sufficiently accurate.

A. LEAST SQUARES WITH REGULARIZATION

Least squares was used as a baseline for eye gaze tracking. In least squares, the x location is estimated independently of the y location using the model p = Xw, where p is the vector of eye gaze locations, X is the design matrix, and w is the weight vector. For regularized linear regression, the objective function is the squared L2 norm of the residual plus an L2 penalty on the weights:

J(w) = ||p - Xw||_2^2 + µ||w||_2^2
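Minimizing J(w) above gives the standard ridge-regression closed form w = (XᵀX + µI)⁻¹Xᵀp. The sketch below demonstrates it on synthetic data standing in for the real eye-image features; the feature dimensions, sample count, and µ value are arbitrary choices for illustration, not the project's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: rows of X are per-frame eye
# features, p holds the gaze x-coordinates (y is fit the same way,
# independently).
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
p = X @ w_true + 0.01 * rng.normal(size=200)

mu = 0.1  # regularization strength (µ in the objective)

# Minimizing J(w) = ||p - Xw||_2^2 + µ||w||_2^2 yields the closed form
# w = (X^T X + µ I)^{-1} X^T p; solve() avoids an explicit inverse.
w_hat = np.linalg.solve(X.T @ X + mu * np.eye(X.shape[1]), X.T @ p)

# At the minimizer, the gradient condition X^T (p - Xw) = µ w holds.
print(np.max(np.abs(X.T @ (p - X @ w_hat) - mu * w_hat)))
```

Solving the normal equations with `np.linalg.solve` rather than `np.linalg.inv` is the usual numerically preferable choice; the final check confirms the first-order optimality condition of the objective.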