Rotation-constrained optical see-through headset calibration with bare-hand alignment

Xue Hu*
Mechatronics in Medicine Lab, Imperial College London

Ferdinando Rodriguez y Baena
Mechatronics in Medicine Lab, Imperial College London

Fabrizio Cutolo
Department of Information Engineering, University of Pisa

*e-mail: xue.hu17@imperial.ac.uk
e-mail: f.rodriguez@imperial.ac.uk
e-mail: fabrizio.cutolo@endocas.unipi.it

ABSTRACT

The inaccessibility of user-perceived reality remains an open issue in pursuing the accurate calibration of optical see-through (OST) head-mounted displays (HMDs). Manual user alignment is usually required to collect a set of virtual-to-real correspondences, so that a default or an offline display calibration can be updated to account for the user's eye position(s). Current alignment-based calibration procedures usually require point-wise alignments between rendered image point(s) and associated physical landmark(s) of a target calibration tool. As each alignment can only provide one or a few correspondences, repeated alignments are required to ensure calibration quality.

This work presents an accurate and tool-less online OST calibration method to update an offline-calibrated eye-display model. The user's bare hand is markerlessly tracked by a commercial RGBD camera anchored to the OST headset to generate a user-specific cursor for correspondence collection. The required alignment is object-wise and can provide thousands of unordered corresponding points in tracked space. The collected correspondences are registered by a proposed rotation-constrained iterative closest point (rcICP) method to optimise the viewpoint-related calibration parameters. We implemented the method for the Microsoft HoloLens 1. The resilience of the proposed procedure to noisy data was evaluated through simulated tests and real experiments performed with an eye-replacement camera. According to the simulation tests, the rcICP registration is robust against possible user-induced rotational misalignment. With a single alignment, our method achieves 8.81 arcmin (1.37 mm) positional error and 1.76° rotational error in camera-based tests at arm-reach distance, and 10.79 arcmin (7.71 pixels) reprojection error in user tests.

Index Terms: H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Ergonomics, Evaluation/methodology, Screen design.

1 INTRODUCTION

In a visual augmented reality (AR) experience, defining the appropriate spatial location of the computer-generated 3D content with respect to the real scene under observation is the principal factor that provides the user with a sense of perceptual congruity (i.e., locational realism) [11]. Optical see-through (OST) head-mounted displays (HMDs) are at the leading edge of AR research. In OST devices, the computer-generated virtual image is projected onto a semi-transparent optical combiner (OC) placed in front of the user's eyes, so that the user's pupil can intercept both the light rays coming from the physical environment and those emitted from the microdisplay [8, 31]. Collimation optics are placed between the microdisplay and the OC so that the virtual 2D image is focused on one or more virtual plane(s) at a comfortable viewing distance [14]. The almost unaltered direct view of the real world allows for a safe and immersive AR experience [30].
However, the inaccessibility of user-perceived retinal images makes OST display calibration particularly challenging [10]. The complexity and unreliability of the calibration procedure required to ensure accurate virtual-to-real alignment is the major obstacle to the widespread adoption of OST HMDs across medical and industrial settings.

OST calibration aims to estimate the rendering camera's projection parameters that ensure an appropriate alignment between the real target scene perceived in the user's line of sight and its virtual homologue rendered on the HMD virtual screen [11]. The eye-display system is usually modelled as an off-axis pinhole camera, the image plane of which corresponds to the see-through virtual screen and the projection centre of which corresponds to the nodal point of the user's eye [17]. The model contains both hardware-related and human perspective-related contributions. The human perspective can be directly measured by automatic eye tracking [17, 29] or indirectly estimated from manual user alignment [26, 36]. Of the two options, alignment-based methods are more viable across commercial HMDs because they require little or no dedicated hardware to track the user's eye(s) [25]. Moreover, unlike the automatic methods that track the eyeball centre rather than the actual optical eye centre [11, 29], alignment-based methods can yield authentic viewpoint-related parameters and are thus more accurate when the eye rotates to focus at different distances [24].

In alignment-based calibration procedures, users need to visually align on-screen virtual points with real-world targets by observing the world through the OC. The set of associated 2D-3D point correspondences is collected to optimise the unknown parameters required for the display update [1, 13]. Nevertheless, alignment-based calibration can be highly time-consuming (multiple alignments are required to yield accurate results [2]), tedious (calibration should be repeated any time the HMD moves on the user's head), and sensitive to the quality of the alignments performed by the user. Following the most popular Single-Point Active Alignment Method (SPAAM) [35], many alignment-based OST calibrations have been developed [3, 21, 32]. However, most of these rely on sparse point-wise correspondence collection, specially made calibration tools, or at least multiple repeated alignments.

In this work, we present an accurate and tool-less online OST calibration method developed upon a homography-corrected off-axis eye-display model [6, 15] to account for the viewpoint-related contribution. A commercial RGBD camera anchored to the headset is exploited to markerlessly track the user's bare hand in real time. The user's hand is first sampled at an initial position to generate a user-specific contour cursor at a peripersonal location. The cursor is then displayed by the HMD, and the user aligns his/her hand with it. The two dense point clouds, sampled by the RGBD camera at the cursor-generation moment and at the alignment moment, are registered by the proposed rotation-constrained iterative closest point (rcICP) method to optimise the unknown parameters required for the OST display update. The proposed calibration procedure
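For reference, the off-axis pinhole eye-display model mentioned above can be sketched as a standard perspective projection in which the principal point is generally off-centre because the eye does not sit on the optical axis of the virtual screen. The notation below (scale factor lambda, eye intrinsics K_E, world-to-eye pose [R_WE | t_WE]) is illustrative and is not taken from this paper.

```latex
% Illustrative off-axis pinhole projection of a world point onto the virtual screen.
% (c_u, c_v) is generally off-centre; all symbols here are assumptions for
% illustration, not the paper's own notation.
\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = \underbrace{\begin{pmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{pmatrix}}_{K_E}
    \begin{pmatrix} R_{WE} & \mathbf{t}_{WE} \end{pmatrix}
    \begin{pmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{pmatrix}
```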
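To make the registration step described above concrete, the following Python/NumPy sketch shows one possible rotation-constrained ICP loop. It assumes the rotation constraint is realised by clamping the per-iteration rotation angle of the closed-form (Kabsch) estimate so that the alignment is driven mainly by translation; the function name rc_icp, the parameter max_rot_deg, and this clamping strategy are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation


def rc_icp(source, target, max_iter=50, max_rot_deg=2.0, tol=1e-6):
    """Register source (N x 3) onto target (M x 3) with a bounded per-step rotation."""
    src = source.copy()
    tree = cKDTree(target)                      # nearest-neighbour search structure
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf

    for _ in range(max_iter):
        # 1. Closest-point correspondences between the two unordered clouds.
        dist, idx = tree.query(src)
        tgt = target[idx]

        # 2. Closed-form rigid estimate (Kabsch/SVD) for this iteration.
        src_c, tgt_c = src.mean(axis=0), tgt.mean(axis=0)
        H = (src - src_c).T @ (tgt - tgt_c)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T

        # 3. Rotation constraint (assumed): clamp the rotation angle of this update.
        rotvec = Rotation.from_matrix(R).as_rotvec()
        angle = np.linalg.norm(rotvec)
        max_rad = np.deg2rad(max_rot_deg)
        if angle > max_rad:
            R = Rotation.from_rotvec(rotvec * (max_rad / angle)).as_matrix()
        t = tgt_c - R @ src_c

        # 4. Apply the constrained update and accumulate the overall transform.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t

        # 5. Stop when the mean correspondence distance no longer improves.
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err

    return R_total, t_total
```

Bounding the incremental rotation in this way would keep user-induced rotational misalignment of the hand from leaking into the recovered transform, which is in the spirit of the robustness to rotational misalignment reported in the abstract.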