ORBBuf: A Robust Buffering Method for Collaborative Visual SLAM

Yu-Ping Wang 1, Zi-Xin Zou 1, Cong Wang 1, Yue-Jiang Dong 1, Lei Qiao 2, Dinesh Manocha 3

Abstract— Collaborative simultaneous localization and mapping (SLAM) approaches provide a solution for autonomous robots based on embedded devices. However, visual SLAM systems rely on correlations between visual frames, so the loss of visual frames from an unreliable wireless network can easily degrade the results of collaborative visual SLAM systems. In our experiments, a loss of less than 1 second of data can lead to the failure of visual SLAM algorithms. We present a novel buffering method, ORBBuf, to reduce the impact of data loss on collaborative visual SLAM systems. We model buffering as an optimization problem and solve it with an efficient greedy-like algorithm: our buffering method drops the frame whose removal causes the least loss in the quality of the SLAM results. We implement our ORBBuf method on ROS, a widely used middleware framework. Through an extensive evaluation on real-world scenarios and tens of gigabytes of datasets, we demonstrate that our ORBBuf method can be applied to different algorithms, different sensor data (both monocular and stereo images), different scenes (both indoor and outdoor), and different network environments (both WiFi and 4G networks). Experimental results show that network interruptions indeed affect the SLAM results, and that our ORBBuf method can reduce the RMSE by up to a factor of 50.

I. INTRODUCTION

Visual simultaneous localization and mapping (SLAM) is an important research topic in robotics [1], computer vision [2], and multimedia [3]. In the last decade, demand for multiple robots working together has grown across different scenarios, including architecture modeling and landscape exploration; therefore, collaborative visual SLAM has become an emerging research topic [4].
On the other hand, the task of self-localization and mapping is computationally expensive, especially for embedded devices with both power and memory restrictions. Collaborative visual SLAM systems that outsource this task to a powerful central processing node are a feasible solution in this scenario [5].

In collaborative visual SLAM systems, robots transmit the collected visual data to other robots and/or a high-performance server. This requires high network bandwidth and network reliability [6]. There has been work on reducing the bandwidth requirements based on video compression techniques [7], [8] or compact 3D representations [9], [10], though few efforts focus on network reliability. In this paper, we address the problem of network reliability, which can have a considerable impact on the accuracy of collaborative visual SLAM systems; our approach is orthogonal to the methods that reduce the bandwidth requirements. Network connections, especially wireless ones (e.g., over WiFi or 4G), are not always reliable.

1 Yu-Ping Wang, Zi-Xin Zou, Cong Wang, and Yue-Jiang Dong are with the Department of Computer Science and Technology, Tsinghua University, Beijing, China.
2 Lei Qiao is with the Beijing Institute of Control Engineering, Beijing, China.
3 Dinesh Manocha is with the Department of Computer Science, University of Maryland, MD 20742, USA.
Yu-Ping Wang is the corresponding author, e-mail: wyp@tsinghua.edu.cn.

Fig. 1. The result of a real-world SLAM experiment run with a TurtleBot3 and a server. (a) A photo of our laboratory. (b) The server received visual data from the robot with ROS and ran a SLAM algorithm but failed due to WiFi unreliability (the door on the left is completely missing). (c) It also failed when employing the random buffering method. (d) By employing our ORBBuf method, the SLAM algorithm successfully estimated the correct trajectory (the red curve) and built a sparse 3D map (the white points).
A detailed measurement study [11] has shown that low throughput or even network interruptions lasting dozens of seconds can occur due to tunnels, large buildings, or poor coverage in general. With the advent of 5G, the problems of network bandwidth and latency will be relieved, but the unreliability due to poor coverage will remain [12].

To motivate our work, we built a collaborative visual SLAM system based on a TurtleBot3 and ROS [13]. The robot moved around our laboratory as shown in Figure 1(a). We fixed a camera on top of the robot, and the captured images were transmitted to a server via a public WiFi router. We found that the SLAM algorithm [14] on the server failed at a certain position. At that position, the network connection was highly unreliable (it may have been affected by the surrounding metal tables). To tolerate such network unreliability, a common solution is buffering, which puts new frames into a buffer until they can be transmitted. When the network is unreliable, the buffer becomes full, and the buffering method must decide which frame(s) to discard. We tried two commonly used buffering methods, but the SLAM algorithm failed in both cases (shown in Figure 1(b) and (c)).

Main Results: In this paper, we present a novel robust

arXiv:2010.14861v1 [cs.RO] 28 Oct 2020
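The buffering scheme described above, where a full buffer greedily discards the frame whose removal hurts the SLAM result least, can be sketched as a bounded buffer with a pluggable drop policy. This is only an illustrative sketch: the `GreedyDropBuffer` class and the toy similarity function are our own placeholders, not the paper's implementation, and the real cost of dropping a frame would be estimated from visual-feature similarity between frames rather than from the scalar proxy used here.

```python
class GreedyDropBuffer:
    """Bounded frame buffer for lossy links (illustrative sketch only).

    When full, it greedily drops the interior frame whose removal least
    reduces the similarity between that frame's two neighbors, keeping
    the endpoints to preserve continuity with already-sent and future
    frames. `similarity(a, b)` is a placeholder; higher means more similar.
    """

    def __init__(self, capacity, similarity):
        self.capacity = capacity
        self.sim = similarity
        self.frames = []

    def push(self, frame):
        if len(self.frames) >= self.capacity:
            self._drop_least_costly()
        self.frames.append(frame)

    def _drop_least_costly(self):
        # Cost of dropping frame i: the dissimilarity introduced between
        # its neighbors i-1 and i+1 once frame i is gone.
        best_i, best_cost = None, None
        for i in range(1, len(self.frames) - 1):
            cost = 1.0 - self.sim(self.frames[i - 1], self.frames[i + 1])
            if best_cost is None or cost < best_cost:
                best_i, best_cost = i, cost
        if best_i is None:  # buffer too small for an interior drop
            best_i = 0      # fall back to dropping the oldest frame
        del self.frames[best_i]

    def pop(self):
        """Return the next frame to transmit, or None if empty."""
        return self.frames.pop(0) if self.frames else None


if __name__ == "__main__":
    # Toy frames are integers; nearby integers count as "similar" frames.
    buf = GreedyDropBuffer(capacity=4,
                           similarity=lambda a, b: 1.0 / (1.0 + abs(a - b)))
    for f in range(6):
        buf.push(f)
    print(buf.frames)  # never exceeds the capacity of 4
```

A random or drop-oldest policy (the two baselines that failed in Figure 1) would replace only `_drop_least_costly`; the rest of the buffer is unchanged, which is why the drop policy is the natural point of comparison.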