A Multi-Gesture Interaction System Using a 3-D Iris Disk Model for Gaze Estimation and an Active Appearance Model for 3-D Hand Pointing

Michael J. Reale, Student Member, IEEE, Shaun Canavan, Student Member, IEEE, Lijun Yin, Senior Member, IEEE, Kaoning Hu, and Terry Hung

Abstract—In this paper, we present a vision-based human–computer interaction system, which integrates control components using multiple gestures, including eye gaze, head pose, hand pointing, and mouth motions. To track head, eye, and mouth movements, we present a two-camera system that detects the face from a fixed, wide-angle camera, estimates a rough location for the eye region using an eye detector based on topographic features, and directs another active pan-tilt-zoom camera to focus in on this eye region. We also propose a novel eye gaze estimation approach for point-of-regard (POR) tracking on a viewing screen. To allow for greater head pose freedom, we developed a new calibration approach to find the 3-D eyeball location, eyeball radius, and fovea position. Moreover, in order to get the optical axis, we create a 3-D iris disk by mapping both the iris center and iris contour points to the eyeball sphere. We then rotate the fovea accordingly and compute the final, visual-axis gaze direction. This part of the system permits natural, non-intrusive, pose-invariant POR estimation from a distance without resorting to infrared or complex hardware setups. We also propose and integrate a two-camera hand pointing estimation algorithm for hand gesture tracking in 3-D from a distance. The algorithms of gaze pointing and hand finger pointing are evaluated individually, and the feasibility of the entire system is validated through two interactive information visualization applications.

Index Terms—Gaze estimation, hand tracking, human–computer interaction (HCI).

Manuscript received November 01, 2010; accepted January 24, 2011. Date of publication February 28, 2011; date of current version May 18, 2011. This work was supported in part by the Air Force Research Laboratory (FA8750-08-1-0096), in part by the National Science Foundation (IIS-1051103), and in part by the New York State Science and Technology Office (C050040). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Zhengyou Zhang.

M. J. Reale, S. Canavan, L. Yin, and K. Hu are with the Department of Computer Science, State University of New York at Binghamton, Binghamton, NY 13902 USA (e-mail: lijun@cs.binghamton.edu). T. Hung is with Corning Corp., Taichung City 40763, Taiwan.

This paper has supplementary downloadable material available at http://ieeexplore.ieee.org. Part 1 of the video demo is an evaluation of both the eye gaze tracking and hand pointing tracking. The total size is 11.08 MB. Part 2 of the video demo shows two applications of our multi-gesture interaction system in action. The total size is 19.97 MB. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2011.2120600
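To make the gaze pipeline summarized in the abstract concrete, the fragment below illustrates its geometric core: back-project the detected 2-D iris center onto the calibrated eyeball sphere, derive the optical axis, rotate the calibrated fovea offset by the eyeball rotation, and take the visual axis from the fovea through the 3-D iris center. This is a minimal sketch under stated assumptions, not the authors' implementation; in particular, the paper also maps the iris contour points to build a full 3-D iris disk, which is omitted here, and all function and variable names are hypothetical.

```python
import numpy as np

def ray_sphere(origin, d, center, r):
    """Nearest intersection of the ray origin + t*d with a sphere (center, r)."""
    d = d / np.linalg.norm(d)
    oc = origin - center
    b = np.dot(oc, d)
    disc = b * b - (np.dot(oc, oc) - r * r)
    if disc < 0:
        return None                          # ray misses the eyeball sphere
    return origin + (-b - np.sqrt(disc)) * d  # nearer of the two hits

def rotation_between(a, b):
    """Rotation matrix taking unit vector a to unit vector b (Rodrigues form).
    Assumes a and b are not antiparallel, which holds for small eye rotations."""
    v, c = np.cross(a, b), np.dot(a, b)
    if np.isclose(c, 1.0):
        return np.eye(3)
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def visual_axis(eye_center, r, iris_ray, a0, fovea_offset):
    """Gaze ray along the visual axis.
    eye_center, r : 3-D eyeball center and radius (from the calibration step)
    iris_ray      : back-projected camera ray through the detected 2-D iris center
    a0            : reference optical axis recorded at calibration
    fovea_offset  : fovea position relative to eye_center under axis a0
    """
    iris3d = ray_sphere(np.zeros(3), iris_ray, eye_center, r)
    if iris3d is None:
        return None                          # detection inconsistent with sphere
    optical = (iris3d - eye_center) / r      # current optical axis
    R = rotation_between(a0, optical)        # eyeball rotation since calibration
    fovea = eye_center + R @ fovea_offset    # fovea rotates with the eyeball
    g = iris3d - fovea                       # visual axis: fovea toward pupil
    return fovea, g / np.linalg.norm(g)
```

Intersecting the returned ray with the calibrated screen plane then yields the on-screen point of regard.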
I. INTRODUCTION

THE ideal human–computer interaction (HCI) system should function robustly with as few constraints as human-to-human interaction; moreover, it should map human gestures to application control in the most natural and intuitive way possible. Knowledge of eye gaze and point-of-regard offers insight into the user’s intention and mental focus, and consequently this information is vital for the next generation of user interfaces [1], [2]. Applications in this vein can be found in the fields of HCI, security, advertising, psychology, education, and many others [1], [3]. While eye gaze gives us a more precise yet perhaps more transient idea of the user’s focus, head pose gives a coarser but more committed approximation of the user’s region of interest. As such, head pose has been leveraged directly for coarse gaze estimation [2], video game control [4] and navigation [5], and screen magnification [6]. Finally, hand pointing, one of the most common hand gestures, shows us the region that the user wishes another entity to focus on, irrespective of whether the user is actually looking at the point himself/herself. In an HCI context, this naturally maps to command control [7], and a few sample applications in this line include navigation in a 3-D world [8] and TV control [9]. While many approaches use individual gestures for specific tasks, it is uncommon to use them simultaneously.

In this paper, we propose a new algorithm for eye gaze estimation from a distance using a 3-D eyeball/iris disk model, and we also present a new hand pointing estimation approach in 3-D space. We use the combination of four different gesture-based inputs (eye gaze, head pose/position, hand pointing direction, and mouth opening/closing) to develop a multi-gesture interaction system. We demonstrate the feasibility and utility of our system through two application case studies. One application is the 3-D Orb File Navigator, and the other is a geographic information visualization program (informally known as the “Midgard viewer”). The overall system has the potential to be extended into many different HCI applications in the educational, entertainment, business, and military sectors.

We would also argue that our system allows gesture input on a more detailed level than the current wave of commercial gesture control systems, such as the Microsoft Xbox Kinect [10], is able to provide. Specifically, we track eye gaze and mouth movement, and our hand tracking system incorporates a more precise model of the hand. Moreover, we use only regular, visible-spectrum cameras. In contrast, systems like the Kinect are robust and reliable but at present only provide very coarse gesture control (e.g., whole-body movement); they also require special hardware (i.e., depth cameras). It is our belief that a combination of coarse- and fine-grained control will be desirable in the next generation of gesture input devices.

This paper is organized as follows: in Section II, we will review the related work, and then we will provide an overview
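The hand pointing component recovers a 3-D pointing direction from two camera views. While the paper fits an Active Appearance Model to locate hand feature points in each image, the two-view geometry underneath can be sketched as standard linear (DLT) triangulation. The fragment below is illustrative only, assuming the two cameras have been stereo-calibrated to 3x4 projection matrices P1 and P2; the point names (fingertip, base) are hypothetical, not the authors' landmark set.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from two views.
    P1, P2 : 3x4 camera projection matrices from stereo calibration
    x1, x2 : (u, v) pixel coordinates of the same point in each view
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)       # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]               # dehomogenize to a 3-D point

# A pointing ray follows from two triangulated hand points, e.g., the
# fingertip and a point at the finger's base:
#   tip  = triangulate(P1, P2, tip_in_cam1, tip_in_cam2)
#   base = triangulate(P1, P2, base_in_cam1, base_in_cam2)
#   direction = (tip - base) / np.linalg.norm(tip - base)
```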