Towards Depth Perception from Noisy Camera-based Sensors for Autonomous Driving: Survey

Mena NAGIUB¹, Thorsten BEUTH¹
¹Valeo Schalter und Sensoren GmbH, Bietigheim-Bissingen, Germany
{mena.nagiub, thorsten.beuth}@valeo.com

Keywords: Monocular depth prediction, dense depth completion, noise, camera-based sensors, autonomous driving, LIDAR, sparse, TOF, infrared, safety, ambiguity, uncertainty, SOTIF.

Abstract: Autonomous driving systems use depth sensors to create 3D point clouds of the scene, which serve as a building block for other driving algorithms. Uncertainty and noise in depth sensors' measurements prevent them from delivering reliable data, compromising overall system safety. Depth completion and prediction methods are used to complete the depth information and remove inaccuracies. Accuracy is a cornerstone of automotive safety. In this paper, we study the different depth completion and prediction methods and provide an overview of their accuracy and suitable use cases. The study is limited to low-speed driving scenarios based on standard cameras and time-of-flight cameras.

1 INTRODUCTION

In autonomous driving systems, the vehicle is a robot that reads several sensors to navigate its dynamic environment. Therefore, the first step in autonomous driving is environment perception, of which building depth maps is an essential part (Dijk and Croon, 2019), (Liu et al., 2017).

This study focuses on advances made by deep learning in depth prediction and completion methods, and on how these methods handle the noise of different sensors. The aim is to define guidelines for designing a depth sensor, or a fusion of sensors, that considers the safety of the intended functions.

1.1 Depth Perception Sensors

Depth perception is usually done using different vision sensors, including cameras, LIght Detection And Ranging (LIDAR), RAdio Detection And Ranging (RADAR), and ultrasonic sensors. This study will focus on cameras and LIDARs (Liu et al., 2017).

Different sensors suffer from different sources of noise and inaccuracy. Cameras suffer from noise problems when used for depth prediction (Saxena et al., 2007), (Ladicky et al., 2014), (Hoiem et al., 2007), (Ku et al., 2018), (Dijk and Croon, 2019), (Sjafrie, 2019), (Watson et al., 2021), (Bartoccioni et al., 2021): distant objects are represented by fewer pixels, and higher color variance leads to depth prediction errors. In addition, cameras are:

• Prone to calibration and alignment errors.
• Sensitive to sharp edges, causing blurriness.
• Sensitive to ambient light, especially sunlight.
• Sensitive to rough weather, like snow and rain.
• Sensitive to colors, textures, and shades.

Note that the depth estimation task may halt entirely in case of camera failure, since no redundancy is available. Also, the lack of a metric distance reference to the real world causes scale ambiguity.

Monocular depth estimation methods using structure from motion (optical flow) (Trucco and Verri, 1998) suffer from additional problems: the absence of relative motion between consecutive frames degrades depth estimation accuracy, even to the point of failing to calculate depth entirely. Furthermore, optical-flow-based monocular depth estimation assumes that the camera is not static and that no objects move at a speed similar to the camera's (i.e., relative zero speed) (Watson et al., 2021).
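To make the dependence on relative motion concrete, the following minimal sketch (not from any of the surveyed methods; the function name, the pure-lateral-translation pinhole model, and all parameter values are illustrative assumptions) recovers depth from horizontal optical flow as Z = f · Tx / |u|. It shows both failure modes discussed above: where the flow is near zero (no relative motion, or an object moving at the camera's speed) the equation degenerates and no depth can be computed, and if the translation Tx is unknown, depth is recoverable only up to scale, which is exactly the scale ambiguity.

import numpy as np

def depth_from_lateral_flow(flow_u, f_px, t_x, min_flow=1e-3):
    """Toy depth-from-motion under a pinhole model: Z = f * T_x / |u|
    for a purely lateral camera translation and a static scene.

    flow_u   : per-pixel horizontal optical flow (pixels)
    f_px     : focal length (pixels)
    t_x      : known camera translation between frames (meters)
    min_flow : flow magnitude below which depth is treated as invalid
    """
    flow_mag = np.abs(flow_u)
    depth = np.full_like(flow_u, np.nan, dtype=np.float64)
    valid = flow_mag > min_flow  # near-zero flow carries no depth signal
    depth[valid] = f_px * t_x / flow_mag[valid]
    return depth  # NaN marks pixels where the method degenerates

# A point moving with the camera produces ~zero flow, so its depth is
# undefined (NaN) -- the relative-zero-speed failure case noted above.
flow = np.array([5.0, 0.0002, -2.5])  # horizontal flow in pixels
print(depth_from_lateral_flow(flow, f_px=700.0, t_x=0.1))

With f = 700 px and Tx = 0.1 m, the first and last pixels yield 14 m and 28 m respectively, while the near-zero flow yields NaN; halving the assumed Tx halves every depth, illustrating why an unknown translation leaves the scale undetermined.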
On the other hand, LIDARs are considered the most accurate sensors for creating 3D maps (Chen et al., 2018) because they are active sensors that rely on their own light source. Hence, ambient light and material colors and textures have little impact on the depth calculations. However, LIDARs suffer from