A Methodology for Quantifying Medium- and Far-Field Depth Perception in Optical, See-Through Augmented Reality

J. Edward Swan II 2*, Mark A. Livingston 1†, Harvey S. Smallman 3, Joseph L. Gabbard 4, Dennis Brown 1, Yohan Baillot 1, Simon J. Julier 1, Greg S. Schmidt 1, Catherine Zanbaka 5, Deborah Hix 4, Lawrence Rosenblum 1

1 Naval Research Laboratory, 2 Mississippi State University, 3 Pacific Science & Engineering Group, 4 Virginia Tech, 5 University of North Carolina at Charlotte

* swan@acm.org
† mark.livingston@nrl.navy.mil

Technical Report #MSU-050531, Department of Computer Science and Engineering, Mississippi State University, 2005.

Abstract

A fundamental problem in optical, see-through augmented reality (AR) is characterizing how it affects human depth perception. This problem is important because AR system developers need both to place graphics in arbitrary spatial relationships with real-world objects, and to know that users will perceive them in those relationships. However, achieving this is difficult, because the graphics are physically drawn directly in front of the eyes. Furthermore, AR makes possible enhanced perceptual techniques that have no real-world equivalent, such as x-ray vision, where AR users perceive that graphics are located behind opaque surfaces. Also, to date AR depth perception research has examined near-field distances, yet many compelling AR applications operate at longer distances, and human depth perception itself operates differently at medium-field and far-field distances. This paper describes the first medium- and far-field AR depth perception experiment that provides metric results. We describe a task and experimental design that measures AR depth perception in the presence of strong linear perspective depth cues, and that matches results found in the general depth perception literature. Our experiment quantifies how depth estimation error grows with increasing distance across a range of medium- to far-field distances, and we also find evidence for a switch in bias from underestimating to overestimating depth at ~19.4 meters. Our experiment also examined the x-ray vision condition, and found initial evidence of how depth estimation error grows for occluded versus non-occluded graphics.

Keywords: Augmented Reality Depth Perception, Optical See-Through Augmented Reality

CR Categories: H.5 [Information Interfaces and Presentation]: H.5.1: Multimedia Information Systems — Artificial, Augmented, and Virtual Realities; H.5.2: User Interfaces — Ergonomics, Evaluation / Methodology, Screen Design

1 Introduction

Optical, see-through augmented reality (AR) is the variant of AR where graphics are superimposed on a user's view of the real world with optical, as opposed to video, combiners. Because optical, see-through AR (simply referred to as "AR" for the rest of this paper) provides direct, heads-up access to information that is correlated with a user's view of the real world, it has the potential to revolutionize the way many tasks are performed. In addition, AR makes possible enhanced perceptual techniques that have no real-world equivalent. One such technique is x-ray vision, where AR users perceive objects that are located behind opaque surfaces.

The AR community is applying AR technology to a number of unique and useful applications [Azuma et al. 2001]. The application that motivated the work described here is mobile, outdoor AR for situational awareness in urban settings [Livingston et al. 2002]. This is a very difficult application domain for AR; the biggest challenges are outdoor tracking and registration, outdoor display hardware, and developing appropriate AR display and interaction techniques.

In this paper we focus on AR display techniques, in particular how to correctly display and accurately convey depth. This is a hard problem for several reasons.
Unlike virtual reality, in AR users see the real world, and therefore graphics need to appear to be at the same depth as co-located real-world objects, even though the graphics are physically drawn directly in front of the eyes. Yet current AR displays are compromised in their ability to display depth (for example, they often dictate a fixed focal depth), and it is not yet known whether this is simply due to engineering limitations, or whether the limits are more fundamental. Furthermore, there is no real-world equivalent to x-ray vision, and how the human visual system processes x-ray visual information is not yet understood, much less the depth accuracy limitations for applications such as the ones mentioned above.

Human depth perception delivers a vivid three-dimensional perceptual world from flat, two-dimensional, ambiguous retinal images of the scene. Current thinking on how the human visual system achieves this performance emphasizes the use of multiple depth cues, available in the scene, that resolve and disambiguate depth relationships into reliable, stable percepts. Cue theory describes how and in which circumstances multiple depth cues interact and combine [Landy et al. 1995]. Generally, ten depth cues are recognized [Howard and Rogers 2002]: (1) binocular disparity, (2) binocular convergence, (3) accommodative focus, (4) atmospheric haze, (5) motion parallax, (6) linear perspective and foreshortening, (7) occlusion, (8) height in the visual field, (9) shading, and (10) texture gradient. Real-world scenes combine some or all of these cues, with the structure of the scene determining the salience of each cue. Although depth cue interaction models exist, they were largely developed to account for how stable percepts could arise from a variety of cues with differing salience. The central challenge in understanding human depth perception in AR is how stable percepts can arise from inconsistent, sparse, or conflicting depth cues, which arise either from imperfect AR displays or from novel AR perceptual situations such as x-ray vision. Therefore, AR depth perception will likely inform both AR technology and depth cue interaction models.
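To make the weighted-combination idea behind cue theory concrete, the following minimal Python sketch fuses per-cue depth estimates by normalized reliability, in the spirit of (but not reproducing) the cue combination model of Landy et al. [1995]; the cue names, depth estimates, and reliability weights are hypothetical illustrations, not data from our experiment:

    # Minimal sketch of reliability-weighted depth cue combination.
    # All numbers below are hypothetical illustrations.

    def combine_depth_cues(cues):
        """Fuse per-cue depth estimates (meters) into a single percept.

        `cues` maps a cue name to (depth_estimate_m, reliability);
        reliability is often modeled as inverse variance. Weights are
        normalized so they sum to 1.
        """
        total_reliability = sum(r for _, r in cues.values())
        return sum(d * (r / total_reliability) for d, r in cues.values())

    if __name__ == "__main__":
        # Hypothetical medium-field scene: binocular disparity is weak
        # at this range, so linear perspective and height in the visual
        # field carry most of the weight.
        cues = {
            "binocular_disparity":    (18.0, 0.2),
            "linear_perspective":     (20.5, 1.0),
            "height_in_visual_field": (21.0, 0.8),
        }
        print(f"combined depth estimate: {combine_depth_cues(cues):.1f} m")

Under such a scheme, an AR display that degrades or removes a cue effectively lowers that cue's reliability weight, shifting the combined percept toward the remaining cues; this is one way to frame both imperfect AR displays and the x-ray vision condition, where the occlusion cue conflicts with the intended depth of the graphics.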