Facial Performance Sensing Head-Mounted Display

Hao Li † * Laura Trutoiu ‡ * Kyle Olszewski † * Lingyu Wei † * Tristan Trutna ‡ Pei-Lun Hsieh † Aaron Nicholls ‡ Chongyang Ma †

† University of Southern California ‡ Oculus & Facebook

[Figure 1 panels: HMD (CAD model); interior (CAD model); online operation, with callouts for the RGB-D camera, strain sensors, and foam liner; facial performance capture.]

Figure 1: To enable immersive face-to-face communication in virtual worlds, the facial expressions of a user have to be captured while wearing a virtual reality head-mounted display. Because the face is largely occluded by typical wearable displays, we have designed an HMD that combines ultra-thin strain sensors with a head-mounted RGB-D camera for real-time facial performance capture and animation.

Abstract

There are currently no solutions for enabling direct face-to-face interaction between virtual reality (VR) users wearing head-mounted displays (HMDs). The main challenge is that the headset obstructs a significant portion of a user's face, preventing effective facial capture with traditional techniques. To advance virtual reality as a next-generation communication platform, we develop a novel HMD that enables 3D facial performance-driven animation in real time. Our wearable system uses ultra-thin flexible electronic materials, mounted on the foam liner of the headset, to measure surface strain signals corresponding to upper face expressions. These strain signals are combined with input from a head-mounted RGB-D camera to enhance tracking in the mouth region and to account for inaccurate HMD placement. To map the input signals to a 3D face model, we perform a single-instance offline training session for each person. For reusable and accurate online operation, we propose a short calibration step that readjusts the Gaussian mixture distribution of the mapping before each use. The resulting animations are visually on par with cutting-edge depth-sensor-driven facial performance capture systems and are hence suitable for social interactions in virtual worlds.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Virtual reality;

Keywords: real-time facial performance capture, virtual reality, depth camera, strain gauge, head-mounted display, wearable sensors

* Authors on the first row have contributed equally.

1 Introduction

Recent progress toward mass-market head-mounted displays (HMDs) by Oculus [Oculus VR 2014] and others has led to a revival of virtual reality (VR). VR is drawing wide interest from consumers for gaming and online virtual world applications. With the help of existing motion capture and hand tracking technologies, users can navigate and perform actions in fully immersive virtual environments. However, users lack a technological solution for face-to-face communication that conveys compelling facial expressions and emotions in virtual environments. Because a user's face is significantly occluded by the HMD, established methods for facial performance tracking, such as optical sensing technologies, fail to capture nearly the entire upper face.

To address this need, we develop a prototype HMD around an existing device. We augment the system with eight ultra-thin strain gauges (flexible metal foil sensors) placed on the foam liner for surface strain measurements, and with an RGB-D camera mounted on the HMD cover to capture the geometry of the visible face region.
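As a rough illustration of the pipeline summarized above, the sketch below maps concatenated strain-gauge readings and mouth-region depth features to animation weights of a 3D face model. The class name, feature dimensions, the ridge regression, and the offset-only per-session recalibration are all illustrative assumptions introduced here for clarity; the actual system learns its own mapping and readjusts a Gaussian mixture distribution during calibration, which is not reproduced in this sketch.

```python
import numpy as np

class SensorToBlendshapeMapper:
    """Illustrative stand-in for the sensor-to-face mapping described above (not the paper's model)."""

    def __init__(self, n_strain=8, ridge=1e-3):
        self.ridge = ridge
        self.W = None                      # regression weights, fit during offline training
        self.b = np.zeros(n_strain)        # per-session strain offset (recalibration)
        self.reference_neutral = None      # neutral-face strain recorded at training time

    def train(self, strain, depth, weights):
        # Offline training (cover detached, face fully visible to a reference capture
        # system): fit a ridge regression from concatenated sensor features to
        # per-frame animation weights.
        X = np.hstack([strain, depth])                   # (n_frames, n_strain + n_depth)
        X = np.hstack([X, np.ones((X.shape[0], 1))])     # append a bias column
        G = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W = np.linalg.solve(G, X.T @ weights)       # (n_features + 1, n_weights)
        self.reference_neutral = strain.mean(axis=0)

    def calibrate(self, neutral_strain):
        # Short per-wearing calibration: align today's neutral-face strain readings
        # with those recorded during training (offset only, for brevity).
        self.b = self.reference_neutral - neutral_strain.mean(axis=0)

    def predict(self, strain_frame, depth_frame):
        # Online operation: recalibrate the strain signals, then map the combined
        # feature vector to animation weights clamped to [0, 1].
        s = strain_frame + self.b
        x = np.concatenate([s, depth_frame, [1.0]])
        return np.clip(x @ self.W, 0.0, 1.0)
```

A typical use under these assumptions would be: train once per person with the cover removed, call calibrate on a few neutral-face frames at the start of each session, and then call predict per frame during online operation.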
Aside from a slight increase in weight, our design integrates the sensors unobtrusively and constrains the wearer no more than a standard virtual reality HMD.

Complex anatomical characteristics, such as individual variations in facial tissue and muscle articulation, make it challenging for our low-dimensional surface measurements to generalize across subjects. To map the input signals to a tracked 3D model in real time, we first train a regression model while the strain gauges are recording, with the cover detached from the HMD to maximize the visibility of the face. This procedure is performed only once per individual; subsequent uses do not require unmounting the cover. Because of slight misplacements of the headset, as well as the additional weight of the cover and the RGB-D camera, the sensitivity and the measured surface locations can differ greatly between the training session and online operation (when the display is attached). For subsequent wearings by the same person, we propose a short calibration step that readjusts the Gaussian mixture distributions of the mapping [Gales 1998].

Like many real-time facial animation systems, our method uses linear blendshape models to produce output animations based on FACS expressions [Ekman and Friesen 1978]. The semantics of each blendshape mesh can be conveniently used for facial performance