Exactly Sparse Delayed-State Filters

Ryan M. Eustice* and Hanumant Singh*
MIT/WHOI Joint Program in Applied Ocean Science and Engineering
Department of Applied Ocean Physics and Engineering
Woods Hole Oceanographic Institution, Woods Hole, MA, USA
{ryan,hanu}@whoi.edu

John J. Leonard
Department of Ocean Engineering
Massachusetts Institute of Technology, Cambridge, MA, USA
jleonard@mit.edu

Abstract— This paper presents the novel insight that the SLAM information matrix is exactly sparse in a delayed-state framework. Such a framework is used in view-based representations of the environment which rely upon scan-matching raw sensor data. Scan-matching raw data results in virtual observations of robot motion with respect to a place it has previously been. The exact sparseness of the delayed-state information matrix stands in contrast to other recent feature-based SLAM information algorithms, such as Sparse Extended Information Filters or Thin Junction Tree Filters, which must make approximations in order to force the feature-based SLAM information matrix to be sparse. The benefit of the exactly sparse delayed-state framework is that it allows one to take advantage of the information-space parameterization without making any approximations, and it can therefore produce results equivalent to the "full-covariance" solution.

Index Terms— Delayed states, EIF, SLAM.

I. INTRODUCTION

Good navigation is often a prerequisite for many of the tasks assigned to mobile robots. For example, much of the deep-water science over the past two decades has relied upon Remotely Operated Vehicles (ROVs) to be the hands and eyes of humans. Scientists who use these vehicles often want to be able to co-locate the data they collect spatially and temporally, such as in studies of biodiversity [1], coral reef health [2], and archeology [3].
Since GPS signals do not penetrate the ocean surface, engineers mainly rely upon acoustic-beacon networks to obtain bounded-error, triangulated vehicle positions for large-area navigation [4]. The disadvantage of this method is that it requires the deployment, calibration, and recovery of the transponder net. This is often an acceptable trade-off for long-term deployments, but it is frequently the bane of short-term surveys. In recent years, underwater vehicles have seen advances in their dead-reckoning abilities. The advent of sensors such as the acoustic Doppler velocity log [5] and the north-seeking ring laser gyro [6] has improved the ability of near-seafloor vehicles to navigate, with reported error bounds of less than 1% of distance traveled [7]. However, in the absence of an external reference, error continues to grow unbounded as a monotonic function of time.

Eustice, Pizarro, and Singh [8] presented a simultaneous localization and mapping (SLAM) technique for near-seafloor navigation called Visually Augmented Navigation (VAN). This technique incorporates pairwise camera constraints from low-overlap imagery to constrain the vehicle position estimate and "reset" the accumulated navigation drift error. In this framework, the camera provides measurements of the 6-DOF relative coordinate transformation between poses, modulo scale. The method recursively incorporates these relative-pose constraints by estimating the global poses that are consistent with the camera measurements and the navigation prior. These global poses correspond to samples of the robot's trajectory taken at the times of image acquisition. Therefore, unlike the typical feature-based SLAM estimation problem, which tracks the current robot pose and an associated landmark map, the VAN state vector consists entirely of delayed vehicle states corresponding to the vehicle poses at the times the images were captured.
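To illustrate the delayed-state idea, the following sketch (not the VAN implementation; it simplifies the 6-DOF, scale-ambiguous camera constraint to full 3-DOF planar poses) shows a state vector made entirely of historical poses and the tail-to-tail operation that predicts the relative pose a scan-match between two views would observe:

```python
import math

def ominus(xi, xj):
    """Tail-to-tail composition: the pose of xj expressed in the
    frame of xi. A pose is a tuple (x, y, theta)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = math.cos(xi[2]), math.sin(xi[2])
    return (c * dx + s * dy,
            -s * dx + c * dy,
            xj[2] - xi[2])

# Delayed-state vector: one pose retained per image acquisition time.
trajectory = [(0.0, 0.0, 0.0),
              (1.0, 0.0, math.pi / 2),
              (1.0, 1.0, math.pi)]

# Registering images 0 and 2 yields a virtual observation of this
# relative pose, which constrains the two delayed states jointly.
z_pred = ominus(trajectory[0], trajectory[2])
```

Because each such virtual observation involves only the two poses being registered, the measurement couples exactly two delayed states, which is the structural property the paper exploits.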
This delayed-state approach corresponds to a view-based representation of the environment, which can be traced back to a batch scan-matching method by Lu and Milios [9] using laser data, and to hybrid batch/recursive formulations by Fleischer [10] and McLauchlan [11] using camera images.

The VAN technique proposed the use of an Extended Kalman Filter (EKF) as the fusion framework for merging the navigation and camera sensor measurements. This is a well-known approach whose application to SLAM was developed by Smith, Self, and Cheeseman [12] and by Moutarlier and Chatila [13]. The EKF maintains the joint correlations over all elements in the state vector and can therefore "optimally" update the estimates of all elements involved in key events like loop closure. However, maintaining these joint correlations also represents a major computational burden, since each measurement update requires computation quadratic in the size of the state vector. This limits the online use of an EKF to relatively small maps; for the VAN approach, this translates into an upper bound of approximately one hundred 6-element poses.

The quadratic computational complexity associated with the EKF has been a long-recognized issue in the SLAM community. One popular method for reducing the computational burden is to decouple the estimation problem into a series of manageable submaps [14]–[16]. In this approach each map is limited to a fixed number of elements, so the computational burden of updating each map has an upper bound. However, the trade-off these techniques make is a reduced rate of convergence, since
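The quadratic cost can be seen directly in the standard EKF measurement update: the covariance correction rewrites every entry of the N x N covariance matrix, no matter how small the measurement. The sketch below is a generic textbook update (not the VAN code), shown on a hypothetical two-element state:

```python
import numpy as np

def ekf_update(mu, Sigma, z, h, H, R):
    """Standard EKF measurement update. The correction
    Sigma - K H Sigma touches all N^2 covariance entries,
    hence O(N^2) work per update regardless of measurement size."""
    S = H @ Sigma @ H.T + R              # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)   # Kalman gain
    mu_new = mu + K @ (z - h(mu))
    Sigma_new = Sigma - K @ H @ Sigma
    return mu_new, Sigma_new

# Toy example: 2-element state, direct observation of the first element.
mu, Sigma = np.zeros(2), np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
mu, Sigma = ekf_update(mu, Sigma, np.array([1.0]), lambda m: H @ m, H, R)
```

In a delayed-state filter N grows with every retained pose, so this per-update cost is what caps the online EKF at roughly one hundred 6-element poses.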