Data-driven Motion Estimation with Low-Cost Sensors

Liguang Xie 1, Mithilesh Kumar 1, Yong Cao 1, Denis Gracanin 1, Francis Quek 1
1 Computer Science Department, Virginia Polytechnic Institute and State University, United States
yongcao@vt.edu

Keywords: Performance animation, accelerometer, motion synthesis, data-driven, local modeling

Abstract

Motion capture can produce high quality data for motion generation. However, professional motion capture is expensive and imposes restrictions on the capturing environment. We propose a motion estimation framework that utilizes a small set of low-cost 3D acceleration sensors. We use a data-driven approach to synthesize realistic human motion comparable in quality to the motion captured by professional motion capture systems. We employ eight 3D accelerometers, housed in four Nintendo Wii™ controllers attached to a performer's body, to capture motion data. The collected data is used to synthesize high quality motion from a statistical model learned from a high quality motion capture database. The proposed system is inexpensive and easy to set up.

1 Introduction

Compared to manual animation authoring solutions, automatic motion generation techniques can produce a large amount of high quality motion clips with little effort. Motion capture is one of these techniques and has been widely adopted by the animation industry. However, motion capture systems are costly and require a long setup time, which places significant constraints on their use. Recently, the availability of low-cost motion sensors, such as 3D accelerometers and gyroscopes, promises to put ubiquitous motion generation systems within the reach of the animation community. In this paper, we propose a low-cost motion estimation and synthesis framework. A prototype system is built on a small number of Nintendo Wii™ controllers that are easy to attach to the human body.
Using Wii controllers as input devices, we are able to generate high quality motion data with our motion estimation framework. The system we developed is easy to set up and imposes little or no restriction on the data acquisition environment. We aim to make our motion synthesis framework as convenient as video capture systems and applicable to a wide range of applications, from video game interfaces and animated chat-rooms to interactive character control in virtual environments (VEs).

We estimate full body motion in two phases. During the first, data collection phase, we collect motion data from a "professional" performer using a commercially available professional motion capture system. At the same time, we also capture 3D acceleration data from eight sensors (four Wii controllers) attached to the performer's body. This one-to-one mapped, time-synchronized data is used to create a large, high quality motion capture database. In the second phase, we capture motion from a "regular user" using only the attached accelerometer sensors. We then estimate the corresponding motion using a local linear model created from the motion capture database. The proposed local linear model can estimate high quality motion from low-dimensional, noisy accelerometer data. Our local modeling approach also enables us to scale the database to incorporate large amounts of motion without performance degradation.

We evaluate our system by comparing the synthesized results with ground truth that is simultaneously captured by an optical motion capture system. The evaluation shows that our system can accurately estimate full body motion using a small number of low-cost acceleration sensors.

The remainder of the paper is organized as follows. Section 2 provides the background and describes the related work in this area. Section 3 explains the system architecture, while Sections 4 and 5 provide a detailed description of our approach.
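The exact formulation of the local linear model is given later in the paper; as an illustrative sketch only (not the authors' implementation), the second-phase estimation step can be prototyped as a nearest-neighbor lookup in accelerometer space followed by a locally fitted least-squares map to pose space. The arrays `accel_db` and `pose_db` below are hypothetical stand-ins for the time-synchronized database built in the first phase:

```python
import numpy as np

def estimate_pose(query, accel_db, pose_db, k=20):
    """Sketch of local linear estimation: predict a full-body pose
    from a low-dimensional accelerometer reading.

    query    : (d_a,)   accelerometer frame from the low-cost sensors
    accel_db : (N, d_a) accelerometer frames stored in the database
    pose_db  : (N, d_p) corresponding motion-captured poses
    """
    # 1. Find the k database frames closest to the query in
    #    accelerometer space (the "local" neighborhood).
    dists = np.linalg.norm(accel_db - query, axis=1)
    idx = np.argsort(dists)[:k]

    # 2. Fit a linear map (with bias term) from accelerometer space
    #    to pose space using only those k neighbors.
    A = np.hstack([accel_db[idx], np.ones((k, 1))])
    W, *_ = np.linalg.lstsq(A, pose_db[idx], rcond=None)

    # 3. Apply the local map to the query to synthesize the pose.
    return np.append(query, 1.0) @ W
```

In the actual system the input would be the 24 acceleration channels from the eight 3D sensors and the output a full-body pose vector; building a new model per query frame is what lets the database grow without retraining a global model.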
Section 6 shows the results and demonstrates the accuracy of our approach. Section 7 summarizes the paper and discusses the limitations. Section 8 discusses future work to address the current limitations.

2 Background

There is a variety of user interfaces for controlling the motion of characters in 3D VEs, especially for gaming. The input devices include mice, keyboards, joysticks and other devices such as vision-based tracking systems (e.g., Sony EyeToy) and inertial/acceleration sensors (e.g., Wii™ controller). Such user interfaces provide immediate and direct control signals with a limited number of degrees of freedom. It is therefore difficult to provide performance-driven control for complex human motions.

Badler et al. [1] proposed a system that reconstructs full-body motion using four magnetic sensors and a real-time inverse-kinematics algorithm to control a standing character in a VE. The system introduced a data-driven approach to address the kinematic redundancy problem. Another system, developed by Yin and Pai [9], synthesizes full-body motion within one second by using a foot pressure sensor. However, it can only generate a small range of behaviors and cannot produce motion for complex upper body movements. Chai et al. [4] implemented a vision-based system that requires only two inexpensive video cameras. Using only six markers attached to a body, the system can synthesize