Monocular-Based 3-D Seafloor Reconstruction and Ortho-Mosaicing by Piecewise Planar Representation

Tudor Nicosevici*, Shahriar Negahdaripour**, Rafael Garcia*

* Computer Vision and Robotics Group, University of Girona, Girona, Spain. Email: {tudor,rafa}@eia.udg.es
** Electrical and Computer Engineering Department, University of Miami, Miami, FL, USA. Email: shahriar@miami.edu

Abstract: Photo-mosaicing techniques have become popular for seafloor mapping in various marine science applications. However, the common methods cannot accurately map regions with high relief and topographical variations. Ortho-mosaicing borrowed from photogrammetry is an alternative technique that enables taking into account the 3-D shape of the terrain. A serious bottleneck is the volume of elevation information that needs to be estimated from the video data, fused, and processed for the generation of a composite ortho-photo that covers a relatively large seafloor area. We present a framework that combines the advantages of dense depth-map and 3-D feature estimation techniques based on visual motion cues. The main goal is to identify and reconstruct certain key terrain feature points that adequately represent the surface with minimal complexity in the form of piecewise planar patches. The proposed implementation utilizes local depth maps for feature selection, while tracking over several views enables 3-D reconstruction by bundle adjustment. Experimental results with synthetic and real data validate the effectiveness of the proposed approach.

I. INTRODUCTION

Visual surveys have become an important component of seafloor mapping for scientific studies; e.g., [1], [2]. Developments in HDTV and very-high resolution digital systems have enabled the imaging of benthic habitats with unprecedented details, thus offering tremendous potential for exploration and new discoveries in various domains of marine sciences, including biology, geology and archeology.
Coupled with recent advances in automatic and autonomous navigation, submersible imaging platforms provide mapping capabilities far surpassing those achieved from traditional scientific diver-based surveys. At the same time, these go hand in hand with tremendous processing requirements and the need for technical/algorithmic developments to generate large-area composite maps that match the resolution of individual frames (or exceed it by the employment of super-resolution techniques [3]).

Mapping in the underwater environment is inherently a complex problem. Light attenuation and backscattering drastically limit the range and coverage area of optical sensors; at best no more than a few meters in each dimension. For this reason alone, considerable effort must be devoted merely to aligning partially overlapping frames seamlessly, in order to provide the larger coverage that would otherwise be available in a single frame were visibility not limited. Furthermore, unstructured clutter in most benthic environments demands more complex algorithms to process the image data. For example, underwater mosaicing systems have been developed based on the traditional photogrammetry mapping techniques applied to satellite and aerial imagery, assuming the planarity of the mapped scene; e.g., [4]. This enables the registration of image frames using simple transformations with only a small number of parameters, known as planar homographies; e.g., [5]. Unfortunately, most regions and/or objects of interest for scientific studies are hardly planar: hydrothermal vents, coral reefs, and shipwrecks, to name a few. This is even more true in close-range imaging aimed at recording very fine-scale details of the target. In such cases, the parallax effects induce image deformations that strongly violate the planar homography model.
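The limitation of the planar model can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes an idealized pinhole camera with identity intrinsics, a pure translation `t` between two views, and a reference plane `n . X = d` expressed in the first camera frame, for which the plane-induced homography is `H = I + t n^T / d`. All function names here are hypothetical and chosen for illustration only.

```python
def project(point):
    """Pinhole projection to normalized image coordinates (identity intrinsics assumed)."""
    x, y, z = point
    return (x / z, y / z)


def plane_homography(t, n, d):
    """Homography induced by the plane n . X = d under a pure translation t.

    Assumes no rotation between the views, so H = I + t n^T / d.
    """
    return [[(1.0 if r == c else 0.0) + t[r] * n[c] / d for c in range(3)]
            for r in range(3)]


def apply_homography(H, pixel):
    """Map a 2-D point through H using homogeneous coordinates."""
    u, v = pixel
    w = [H[r][0] * u + H[r][1] * v + H[r][2] for r in range(3)]
    return (w[0] / w[2], w[1] / w[2])


def transfer_error(point, t, H):
    """Largest coordinate difference between the homography prediction
    and the true projection of the 3-D point in the second view."""
    moved = tuple(p + ti for p, ti in zip(point, t))
    predicted = apply_homography(H, project(point))
    actual = project(moved)
    return max(abs(a - b) for a, b in zip(predicted, actual))


t = (0.2, 0.0, 0.0)   # sideways camera translation (illustrative value)
n = (0.0, 0.0, 1.0)   # fronto-parallel reference plane
d = 2.0               # plane depth in the first camera frame
H = plane_homography(t, n, d)

on_plane = (0.5, -0.3, 2.0)    # lies on the reference plane
off_plane = (0.5, -0.3, 1.5)   # raised 0.5 units above the plane

print(transfer_error(on_plane, t, H))
print(transfer_error(off_plane, t, H))
```

Under these assumptions, the point on the reference plane is transferred essentially exactly, while the raised point shows a residual equal to `t_x * (1/Z - 1/d)` in normalized coordinates. That residual is the parallax that a single homography cannot absorb, and it grows as the relief becomes large relative to the imaging distance.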
However, there is sufficient information within overlapping regions to estimate the 3-D relief of the mapped area based on multiple-view geometrical constraints [6]. This can then be used to generate a so-called ortho-rectified mosaic [7], [8], [9].

In recent years, several works have explored the application of stereo imaging for underwater 3-D terrain reconstruction [10], [11]. This involves the use of two (or more) cameras in order to obtain local 3-D maps from disparity cues. The incremental local maps, generated as the stereo system moves, may be merged into a global 3-D reconstruction of the surveyed area [12].

Another approach is the application of structure/depth from motion (SFM/DFM) methods based on monocular images [6], [13]. SFM involves the extraction and tracking of a sparse set of features in a sequence, and the estimation of their 3-D positions using multiple views. This can be achieved rather robustly based on a bundle adjustment technique [14]. In theory, a 3-D dense map may then be generated by surface interpolation [15]. However, the 3-D dense reconstruction accuracy is highly dependent on the