A Portable Immersive Surgery Training System Using RGB-D Sensors

Xinqing GUO a, Luis D. LOPEZ a, Zhan YU a, Karl V. STEINER a, Kenneth E. BARNER a, Thomas L. BAUER b and Jingyi YU a
a University of Delaware
b Christiana Care Health Services, Newark, Delaware

Abstract. Surgical training plays an important role in helping residents develop critical skills. Providing effective surgical training, however, remains a challenging task. Existing videotaped training instructions can only show imagery from a fixed viewpoint, which lacks both depth perception and interactivity. We present a new portable immersive surgical training system that is capable of acquiring and displaying high-fidelity 3D reconstructions of actual surgical procedures. Our solution utilizes a set of Microsoft Kinect sensors to simultaneously recover the participants, the surgical environment, and the surgical scene itself. We then develop a space-time navigator to allow trainees to witness and explore a prior procedure as if they were there. Preliminary feedback from residents shows that our system is much more effective than conventional videotaped systems.

Keywords. RGB-D Sensor, Microsoft Kinect, Immersive Surgery Training, 3D Reconstruction, Stereoscopic Display

Introduction

In the U.S., surgeons require longer education and training than other specialists: only after four years of medical school and a minimum of five years of extensive training do they qualify. The satisfaction of surgical residents with their training program largely determines its outcome. The task of providing effective surgical training and re-training, however, is inherently challenging: the number of high-quality educators is limited, and both instructors and trainees are over-constrained by time. The problem is worsening as new surgical procedures become increasingly complex and often require new devices and protocols.
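The starting point of the Kinect-based acquisition described in the abstract is back-projecting each depth pixel through the camera intrinsics into a 3D point. Below is a minimal sketch of this step; the intrinsic values are typical published figures for the Kinect v1 depth camera, not the calibration used in this work, and the flat-wall depth map is a synthetic stand-in for a real capture.

```python
import numpy as np

def depth_to_point_cloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth map (meters) into camera-space 3D points.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Returns an (h, w, 3) array of XYZ coordinates.
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Hypothetical example: a 640x480 depth frame of a flat wall 2 m away,
# with approximate Kinect v1 depth intrinsics (assumed, not calibrated).
depth = np.full((480, 640), 2.0)
cloud = depth_to_point_cloud(depth, fx=585.0, fy=585.0, cx=319.5, cy=239.5)
```

A full system would merge such per-sensor clouds from several calibrated Kinects into one scene, which is where the multi-sensor registration and reconstruction described later comes in.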
In traditional surgical training, videotaped instruction has long served as a workhorse for teaching surgical procedures. It is, however, only marginally effective: videotapes provide 2D imagery that lacks depth perception, and the trainee cannot freely change viewpoints because the inputs are captured from a fixed location. To address these issues, the pioneering work on 3D telepresence [1,2,3,4] aims to emulate remote medical procedures. At its core are the acquisition, reconstruction, and display of the complete 3D geometry of room-sized surgical environments. Most existing approaches [5,2,6,7,8], e.g., from Fuchs's group at UNC, Bajcsy's group at Penn, Kanade's group at CMU, and Gross's group at ETH, have pioneered the use of a "sea of cameras" around a room. Their sem-