Automatic Calibration of Commercial Optical See-Through Head-Mounted Displays for Medical Applications

Xue Hu*, Mechatronics in Medicine Laboratory, Imperial College London. Fabrizio Cutolo, Department of Information Engineering, University of Pisa. Fabio Tatti, Mechatronics in Medicine Laboratory, Imperial College London. Ferdinando Rodriguez y Baena§, Mechatronics in Medicine Laboratory, Imperial College London.

ABSTRACT

The simplified, manual calibration of commercial Optical See-Through Head-Mounted Displays (OST-HMDs) is neither accurate nor convenient for medical applications. An interaction-free calibration method that can be easily implemented in commercial headsets is therefore desirable. State-of-the-art automatic calibrations simplify the eye-screen system as a pinhole camera and require tedious offline calibration. Furthermore, they have never been tested on unmodified commercial products. We present a gaze-based automatic calibration method that can be easily implemented in commercial headsets without knowledge of the hardware details. The location of the virtual target is revised in world coordinates according to the eye gaze tracked in real time. The algorithm has been tested with the Microsoft HoloLens. Quantitative and qualitative user studies show that the automatically calibrated display is statistically comparable with the manually calibrated display under both monocular and binocular rendering modes. Since it is cumbersome to ask users to perform a manual calibration every time the HMD is re-positioned, our method provides a comparably accurate but more convenient and practical solution to HMD calibration.

Index Terms: Human-centered computing—Human computer interaction (HCI)—Interaction paradigms—Mixed/augmented reality; Computing methodologies—Computer graphics—Graphics systems and interfaces—Mixed/augmented reality

1 INTRODUCTION

Augmented reality (AR) is quickly becoming a powerful tool for Image-Guided Surgery (IGS).
Within this context, a virtual model created from medical scans (e.g., Computed Tomography or Magnetic Resonance Imaging) is superimposed on the surgical site. Surgeons can thus see the patient-specific anatomical model and follow a preoperative plan with better accuracy, reduced invasiveness, and a simultaneous view of the real scene. Optical See-Through head-mounted displays (OST-HMDs) are preferred for IGS as they offer better immersion, safety and egocentrism. For reliable AR assistance, display calibration, which aligns virtual content with the perceived reality, is of the utmost importance [3].

Advances in optics design and computational power have brought many affordable and highly compact commercial headsets to the market. Their display calibrations mainly rely on manual virtual-to-real alignment by users, which is either tedious or inaccurate. Furthermore, calibrations are often simplified to improve usability, resulting in suboptimal spatial alignment. While this is tolerable for gaming or other non-specialised experiences, calibration must be improved for surgical applications in terms of both accuracy and convenience. Automatic calibration is therefore desired in practice. However, state-of-the-art automatic calibration algorithms often require tedious offline calibration and low-level rendering control, making them hard to implement in commercial products [1].

In this paper, we propose an automatic calibration method that can be easily implemented in most commercial OST-HMDs. The modification does not require access to the hardware details of the HMD, and can be done using a universally supported game engine, Unity 3D.

* e-mail: xue.hu17@imperial.ac.uk
e-mail: fabrizio.cutolo@endocas.unipi.it
e-mail: f.tatti@imperial.ac.uk
§ e-mail: f.rodriguez@imperial.ac.uk
Also, to the best of our knowledge, this is the first study that demonstrates automatic calibration without the pinhole camera assumption, taking us one step closer to effective automatic calibration of OST-HMDs.

2 METHOD

As shown in Fig. 1, if a virtual object is placed at the exact tracked location of a real object t, the rendered pixel c will not align with t in the user's eye, because of the parallax between the nodal point e and the rendering camera o. Instead of modifying the display within the screen space (i.e., controlling pixel locations in 2D, which may require overriding intrinsic projection matrices), we move the rendered object's location to t′, so that the rendered 2D pixel on the virtual display, c′, lies on the tracked user's gaze et. The modified location can be calculated as

t′ = (oc′ / |oc′|) · |to| + o.

3 IMPLEMENTATION

The Microsoft HoloLens (1st generation) was used for method implementation and performance testing. The embedded calibration app requires users to manually align a finger with six markers displayed on each screen. These alignments are collected to calculate the user's interpupillary distance (IPD), which is later utilised for personal parallax correction. Two 640 × 480 resolution, 120 fps Pupil Labs eye cameras were rigidly mounted on the HoloLens to track eye location. Unity 3D, a cross-platform game development engine, was used to simplify the AR development.

Fig. 1 shows the overall system configuration. A virtual world coordinate system W is initialised and locked in the physical environment throughout the application's lifetime. A printed ArUco marker serves as the object of interest t. The environment is videoed by the HoloLens front-facing camera H. The virtual display is simplified as a 3D plane fixed at a distance d relative to the HMD camera. An error in depth estimation Δd will cause a misalignment of l / (d/Δd + 1).
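As an illustration, the gaze-based correction can be sketched in a few lines of Python. This is our own reconstruction, not the authors' implementation: it assumes all points are expressed in the rendering camera's frame (with o at the origin) and that the virtual display plane is perpendicular to the camera's viewing axis (+z) at distance d.

```python
import numpy as np

def corrected_location(e, t, d, o=np.zeros(3)):
    """Move the virtual object from t to t' so its rendered pixel
    lies on the user's gaze line e -> t.

    e : eye nodal point (from the eye tracker), 3D vector
    t : tracked location of the real object, 3D vector
    d : distance of the display plane (z = d) from the camera
    o : rendering camera origin (frame origin by default)
    """
    # 1. Intersect the gaze ray e -> t with the display plane z = d
    #    to find the pixel c' where the object should appear.
    gaze = t - e
    s = (d - e[2]) / gaze[2]      # ray parameter at the plane
    c_prime = e + s * gaze
    # 2. Place the object along the ray o -> c' at its original
    #    distance |ot| from the camera:  t' = oc'/|oc'| * |ot| + o
    oc = c_prime - o
    return oc / np.linalg.norm(oc) * np.linalg.norm(t - o) + o
```

Note that when the eye and the rendering camera coincide (e = o), the correction reduces to the identity, i.e., t′ = t, as expected.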
As d ≫ Δd and the eye-camera parallax l is usually less than 20 mm in practice, a 10% error in depth estimation leads to a hardly noticeable display offset of 1.8 mm (i.e., around 2 pixels for the HoloLens). There-
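A quick numerical check of this bound, using the figures quoted in the text (l = 20 mm, 10% depth error; d = 1 m is our assumed display distance, not a value from the paper):

```python
def misalignment_mm(l_mm, d, delta_d):
    """Display offset caused by a depth estimation error delta_d,
    per the bound above: offset = l / (d/delta_d + 1).
    l_mm is the eye-camera parallax in mm; d and delta_d share units."""
    return l_mm / (d / delta_d + 1.0)

# 20 mm parallax, 10% depth error -> 20 / (10 + 1) ≈ 1.8 mm offset
offset = misalignment_mm(20.0, 1.0, 0.1)
```

Note that the bound depends only on the ratio d/Δd, so the same 1.8 mm figure holds for any display distance with a 10% relative depth error.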