Fast Synthetic LiDAR Rendering via Spherical UV Unwrapping of Equirectangular Z-Buffer Images

Mohammed Hossny a,*, Khaled Saleh b, Mohammed Attia c, Ahmed Abobakr a, Julie Iskander a

a Deakin University, Melbourne, Australia
b University of Technology Sydney, Australia
c Medical Research Institute, Alexandria University, Egypt

Abstract

LiDAR data is becoming increasingly essential with the rise of autonomous vehicles. Its ability to provide a 360° horizontal field-of-view point cloud equips self-driving vehicles with enhanced situational awareness. While synthetic LiDAR data generation pipelines provide a good solution to advance machine learning research on LiDAR, they suffer from a major shortcoming: rendering time. Physically accurate LiDAR simulators (e.g. BlenSor) are computationally expensive, with an average rendering time of 14-60 seconds per frame for urban scenes. This is often compensated for by using 3D models with simplified polygon topology (low-poly assets), as is the case in CARLA (Dosovitskiy et al., 2017). However, this comes at the price of coarse-grained, unrealistic LiDAR point clouds. In this paper, we present a novel method to simulate LiDAR point clouds with a faster rendering time of 1 second per frame. The proposed method relies on spherical UV unwrapping of equirectangular Z-buffer images. We chose BlenSor (Gschwandtner et al., 2011) as the baseline against which to compare the point clouds generated by the proposed method. The reported error for complex urban landscapes is 4.28 cm for a scanning range of 2–120 metres with Velodyne HDL-64E2 parameters. The proposed method achieved a total time of 3.2 ± 0.31 seconds per frame, compared to 16.2 ± 1.82 seconds for the BlenSor baseline.

Keywords: LiDAR, Point Cloud, BlenSor, Synthetic

Figure 1: A synthetic LiDAR sample of an urban city setup. The proposed method rendered the scene in 4 seconds on Blender 2.79. BlenSor rendering of the same scene took 58 seconds. The average RMSE between the two point clouds is 8.5 cm.

* Corresponding author
** © 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

1. Introduction

2D perception tasks in self-driving cars, such as object detection and semantic segmentation, have improved considerably over the past few years. One of the main reasons for this improvement is the recent advancements in deep neural network architectures and models, as well as the availability of large amounts of annotated data for such tasks (Cordts et al., 2015; Geiger et al., 2012; Ros et al., 2016). On the other hand, 3D perception tasks are still lagging behind, mainly because of the scarcity of annotated datasets.

Recently, a promising approach for tackling the data scarcity problem of 3D perception tasks was proposed (Dosovitskiy et al., 2017; Gschwandtner et al., 2011; Shah et al., 2017): the use of photo-realistic simulators and simulated 3D range sensors, such as LiDARs, to generate virtually unlimited amounts of labelled 3D point cloud data (Fang et al., 2020; Gaidon et al., 2016; Griffiths and Boehm, 2019).

However, the available simulators still suffer from a number of challenges. For example, in the CARLA simulator (Dosovitskiy et al., 2017), simplified meshes and/or low-poly assets of urban traffic objects are used in order to minimise the rendering time of the scene's point clouds.
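To make the idea named in the abstract concrete, unwrapping an equirectangular depth (Z-buffer) render into a spherical point cloud can be sketched in a few lines. The snippet below is an illustrative sketch only, not the authors' implementation: the interpretation of each Z-buffer pixel as a radial range in metres, the NumPy-based routine, and the HDL-64E-like field-of-view defaults are all assumptions.

```python
import numpy as np

def equirect_depth_to_points(depth, v_fov=(-24.8, 2.0), h_fov=(-180.0, 180.0),
                             r_min=2.0, r_max=120.0):
    """Unwrap an equirectangular depth (Z-buffer) image into a 3D point cloud.

    Assumptions (not from the paper): `depth` holds the radial distance along
    each pixel's ray in metres, rows span the vertical FOV and columns span the
    horizontal FOV linearly, and the defaults roughly match an HDL-64E scanner.
    """
    h, w = depth.shape
    # Per-pixel elevation (rows, top row = max elevation) and azimuth (columns).
    elev = np.deg2rad(np.linspace(v_fov[1], v_fov[0], h))[:, None]
    azim = np.deg2rad(np.linspace(h_fov[0], h_fov[1], w, endpoint=False))[None, :]

    # Spherical-to-Cartesian conversion for every pixel.
    x = depth * np.cos(elev) * np.cos(azim)
    y = depth * np.cos(elev) * np.sin(azim)
    z = depth * np.sin(elev)
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Keep only returns inside the scanner's usable range.
    r = depth.reshape(-1)
    return points[(r >= r_min) & (r <= r_max)]

# Example: a 64-beam x 2048-column panorama filled with a constant 10 m range.
cloud = equirect_depth_to_points(np.full((64, 2048), 10.0))
print(cloud.shape)  # (131072, 3)
```

In this sketch, each pixel row maps to a fixed elevation angle and each column to an azimuth, so a full 360° scan is recovered from a single panoramic depth render.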
Thus, the resulting point clouds are missing