Automatic Calibration of Commercial Optical See-Through Head-Mounted Displays for Medical Applications

Xue Hu*, Mechatronics in Medicine Laboratory, Imperial College London. Fabrizio Cutolo, Department of Information Engineering, University of Pisa. Fabio Tatti, Mechatronics in Medicine Laboratory, Imperial College London. Ferdinando Rodriguez y Baena§, Mechatronics in Medicine Laboratory, Imperial College London.

ABSTRACT

The simplified, manual calibration of commercial Optical See-Through Head-Mounted Displays (OST-HMDs) is neither accurate nor convenient for medical applications. An interaction-free calibration method that can be easily implemented in commercial headsets is therefore desirable. State-of-the-art automatic calibrations simplify the eye-screen system as a pinhole camera and require tedious offline calibration. Furthermore, they have never been tested on unmodified commercial products. We present a gaze-based automatic calibration method that can be easily implemented in commercial headsets without knowledge of the hardware details. The location of the virtual target is revised in world coordinates according to the eye gaze tracked in real time. The algorithm has been tested with the Microsoft HoloLens. Quantitative and qualitative user studies show that the automatically calibrated display is statistically comparable with the manually calibrated display under both monocular and binocular rendering modes. Since it is cumbersome to ask users to perform a manual calibration every time the HMD is re-positioned, our method provides a comparably accurate but more convenient and practical solution to HMD calibration.

Index Terms: Human-centered computing—Human computer interaction (HCI)—Interaction paradigms—Mixed/augmented reality; Computing methodologies—Computer graphics—Graphics systems and interfaces—Mixed/augmented reality

1 INTRODUCTION

Augmented reality (AR) is quickly becoming a powerful tool for Image-Guided Surgery (IGS).
Within this context, a virtual model created from medical scans (e.g., Computed Tomography or Magnetic Resonance Imaging) is superimposed on the surgical site. Surgeons can thus see the patient-specific anatomical model and follow a preoperative plan with better accuracy, reduced invasiveness, and a simultaneous view of the real scene. Optical See-Through head-mounted displays (OST-HMDs) are preferred for IGS as they offer better immersion, safety and egocentrism. For reliable AR assistance, display calibration, which aligns virtual content with the perceived reality, is of the utmost importance [3].

Advances in optics design and computational power have brought many affordable and highly compact commercial headsets to the market. Their display calibrations mainly rely on manual virtual-to-real alignment by users, which is either tedious or inaccurate. Furthermore, calibrations are often simplified to improve usability, resulting in suboptimal spatial alignment. While this is tolerable for gaming or other non-specialised experiences, calibration must be improved for surgical applications in terms of both accuracy and convenience. Automatic calibration is therefore desired in practice. However, state-of-the-art automatic calibration algorithms often require tedious offline calibration and low-level rendering control, making them hard to implement in commercial products [1].

In this paper, we propose an automatic calibration method that can be easily implemented in most commercial OST-HMDs. The modification does not require access to the hardware details of the HMD, and can be done using a universally supported game engine, Unity 3D.

* e-mail: xue.hu17@imperial.ac.uk
e-mail: fabrizio.cutolo@endocas.unipi.it
e-mail: f.tatti@imperial.ac.uk
§ e-mail: f.rodriguez@imperial.ac.uk
Also, to the best of our knowledge, this is the first study that demonstrates automatic calibration without the pinhole camera assumption, taking us one step closer to effective automatic calibration of OST-HMDs.

2 METHOD

As shown in Fig. 1, if a virtual object is placed at the exact tracked location of a real object t, the rendered pixel c will not align with t in the user's eye, because of the parallax between the nodal point e and the rendering camera o. Instead of modifying the display within the screen space (i.e., controlling pixel locations in 2D, which may require overriding intrinsic projection matrices), we move the rendered object's location to t′, so that the rendered 2D pixel on the virtual display, c′, lies on the tracked user's gaze et. The modified location can be calculated as

t′ = (oc′ / |oc′|) · |to| + o.

3 IMPLEMENTATION

The Microsoft HoloLens (1st generation) was used for method implementation and performance testing. The embedded calibration app requires users to manually align a finger with six markers displayed on each screen. These alignments are collected to calculate the user's interpupillary distance (IPD), which is later utilised for personal parallax correction. Two 640 × 480 resolution, 120 fps Pupil Labs eye cameras were rigidly mounted on the HoloLens to track eye location. Unity 3D, a cross-platform game development engine, was used to simplify the AR development.

Fig. 1 shows the overall system configuration. A virtual world coordinate system W is initialised and locked in the physical environment throughout the application's lifetime. A printed ArUco marker serves as the object of interest t. The environment is videoed by the HoloLens front-facing camera H. The virtual display is simplified as a 3D plane fixed at a distance d relative to the HMD camera. An error in depth estimation Δd will cause a misalignment of l / (d/Δd + 1).
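As an illustration, the gaze-based correction can be sketched in a few lines of Python. This is our own reconstruction, not the authors' implementation: it assumes all points are expressed in the rendering camera's frame (with o at the origin) and that the virtual display plane is perpendicular to the camera's viewing axis (+z) at distance d.

```python
import numpy as np

def corrected_location(e, t, d, o=np.zeros(3)):
    """Move the virtual object from t to t' so its rendered pixel
    lies on the user's gaze line e -> t.

    e : eye nodal point (from the eye tracker), 3D vector
    t : tracked location of the real object, 3D vector
    d : distance of the display plane (z = d) from the camera
    o : rendering camera origin (frame origin by default)
    """
    # 1. Intersect the gaze ray e -> t with the display plane z = d
    #    to find the pixel c' where the object should appear.
    gaze = t - e
    s = (d - e[2]) / gaze[2]      # ray parameter at the plane
    c_prime = e + s * gaze
    # 2. Place the object along the ray o -> c' at its original
    #    distance |ot| from the camera:  t' = oc'/|oc'| * |ot| + o
    oc = c_prime - o
    return oc / np.linalg.norm(oc) * np.linalg.norm(t - o) + o
```

Note that when the eye and the rendering camera coincide (e = o), the correction reduces to the identity, i.e., t′ = t, as expected.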
As d ≫ Δd and the eye-camera parallax l is usually less than 20 mm in practice, a 10% error in depth estimation leads to a hardly noticeable display offset of 1.8 mm (i.e., around 2 pixels for the HoloLens). There-
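A quick numerical check of this bound, using the figures quoted in the text (l = 20 mm, 10% depth error; d = 1 m is our assumed display distance, not a value from the paper):

```python
def misalignment_mm(l_mm, d, delta_d):
    """Display offset caused by a depth estimation error delta_d,
    per the bound above: offset = l / (d/delta_d + 1).
    l_mm is the eye-camera parallax in mm; d and delta_d share units."""
    return l_mm / (d / delta_d + 1.0)

# 20 mm parallax, 10% depth error -> 20 / (10 + 1) ≈ 1.8 mm offset
offset = misalignment_mm(20.0, 1.0, 0.1)
```

Note that the bound depends only on the ratio d/Δd, so the same 1.8 mm figure holds for any display distance with a 10% relative depth error.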