Mosaic of Near Ground UAV Videos Under Parallax Effects

Mohamed A. Helala, Luis A. Zarrabeitia, Faisal Z. Qureshi
Faculty of Science, University of Ontario Institute of Technology, Oshawa, ON, Canada
{Mohamed.Helala, Luis.Zarrabeitia, Faisal.Qureshi}@uoit.ca

Abstract—This paper explores the possibility of using Google Earth as a software laboratory for studying wide-area scene analysis using near-ground aerial imagery. To this end we present a new image mosaicing algorithm capable of generating large mosaics from imagery captured by a near-ground aerial vehicle. Our algorithm eschews camera calibration and can handle strong parallax effects visible in the captured imagery. The imagery is generated by simulating an aerial vehicle flying over New York City within the Google Earth environment. We also evaluate the proposed approach on a real dataset captured by a physical aerial vehicle, demonstrating that the algorithm, which was initially developed using synthetic imagery, does indeed work on real data.

I. INTRODUCTION

The ability to curate and analyze imagery at the global scale has been pivotal in designing visually rich Geographic Information Systems (GISs), such as Google Earth, that contain 3D models of major metropolitan areas of the world. These models are painstakingly constructed using, among other sources of information, imaging data captured by aerial vehicles. Visually rich GISs are also a valuable source of data for studying large-scale image analysis systems. With this in mind, this paper uses synthetic imagery generated with Google Earth to develop a new method for mosaic construction from near-ground aerial imagery. The paper also demonstrates the proposed method on a real dataset captured by a physical UAV flying over a city.

Unmanned Aerial Vehicles (UAVs) have gained a lot of attention lately.
Their ability to survey and observe large areas, including regions that are not easily traversable, makes them ideally suited for applications such as search and rescue, forest fire monitoring, forest biomass estimation, agricultural information gathering, and land-use change monitoring [1], [2]. Images collected by UAVs need to be registered with each other to get a coherent picture of the area under observation. This is akin to generating an image mosaic from a set of images taken from different viewpoints. For UAVs flying at low altitude over uneven terrain that exhibits strong variation in depth, say a city block, parallax effects must be accounted for when constructing a mosaic from the captured images.

Several techniques have been proposed for constructing image mosaics [3], [4], [5]. Typically these assume negligible parallax, a highly constrained camera motion model, or both. Recently, [5] and [6] proposed an image mosaicing technique that compensates for weak parallax by assuming a dominant translational camera motion model. These techniques can only handle translational camera motion and small variations in camera-scene distances. Neither assumption holds in our setting. The work presented in this paper relaxes these assumptions and is able to handle arbitrary camera motions and strong parallax effects.

The proposed technique begins by estimating camera motion and calibration parameters from a sequence of images. Next, this information is used to project all the images into a common viewpoint, accounting for the depth and occlusion properties of every pixel in every captured image. Specifically, we employ a plane sweep algorithm to project different images at different depths. An energy minimization step then processes the images projected at the different depth levels and constructs the final mosaic by assigning the best depth value to each pixel given the selected viewpoint.
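To make the plane sweep idea concrete, the following is a minimal sketch and not the formulation developed in this paper. It assumes pinhole cameras with known intrinsics, a known relative pose of the form X_src = R X_ref + t, and sweep planes n·X = d expressed in the reference camera frame. The function names `plane_homography` and `winner_take_all`, and all numeric values, are hypothetical and chosen only for illustration. The per-pixel argmin is a deliberately simplified stand-in for the energy minimization step, which would normally also enforce smoothness between neighbouring pixels.

```python
import numpy as np


def plane_homography(K_ref, K_src, R, t, n, d):
    """Homography induced by the plane n.X = d (reference-camera frame).

    Maps homogeneous reference pixels to source pixels, assuming the
    source camera pose satisfies X_src = R @ X_ref + t.
    """
    n = np.asarray(n, dtype=float).reshape(1, 3)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    return K_src @ (R + (t @ n) / d) @ np.linalg.inv(K_ref)


def winner_take_all(cost_volume, depths):
    """Pick, per pixel, the swept depth with the lowest matching cost.

    cost_volume has shape (num_depths, height, width). This ignores
    smoothness between pixels, unlike a full energy minimization.
    """
    best = np.argmin(cost_volume, axis=0)
    return np.asarray(depths, dtype=float)[best]


# Toy setup (all values are illustrative assumptions).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(5.0)  # small rotation about the y axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.0, 0.1])
n = np.array([0.0, 0.0, 1.0])  # fronto-parallel sweep planes z = d
depths = [10.0, 20.0, 40.0]

# One homography per swept depth; each would be used to warp a source
# image into the reference view before computing a matching cost.
homographies = [plane_homography(K, K, R, t, n, d) for d in depths]
```

In a full pipeline each homography would warp a source frame into the reference view, a photo-consistency cost would fill one slice of the cost volume per depth, and the depth selection would then be applied per pixel.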
The work presented here makes the following contributions: 1) it develops a different formulation of the plane sweep algorithm that can handle arbitrary camera motion models, in contrast to [5], [6], which assume a translational motion model; 2) unlike [5], [6], the proposed method can construct mosaics from low-framerate video sequences, as long as there is some overlap between frames; and 3) it proposes using the Google Earth GIS as a software laboratory for studying aerial image analysis algorithms.

The rest of the paper is organized as follows. Section II discusses related work. Section III describes our methodology. Results are provided in Section IV, and Section V concludes the paper.

II. RELATED WORK

Steedly et al. present a taxonomy of video mosaics and their applications [7]. They divide video mosaics into four classes: 1) static mosaics, 2) dynamic mosaics, 3) temporal mosaic pyramids, and 4) multi-resolution mosaics. The work presented in this paper is closest to the first category (static mosaics). A static mosaic divides a video sequence into a set of shots, each consisting of overlapping images. A mosaic is created for each shot, and these mosaics are then aligned to a reference shot. Static video mosaic construction typically consists of three steps: frame registration, frame integration, and illumination and parallax compensation. For a good introduction to image registration algorithms, we refer the reader to [8], [9].

Several techniques have been proposed for constructing static video mosaics. [3] divides video frames into key frames