D-3DLD: DEPTH-AWARE VOXEL SPACE MAPPING FOR MONOCULAR 3D LANE DETECTION WITH UNCERTAINTY

Nayeon Kim, Moonsub Byeon, Daehyun Ji, Dokwan Oh
Samsung Advanced Institute of Technology

ABSTRACT

The estimation of 3D lanes from monocular RGB images is a fundamentally ill-posed problem. Previous studies have assumed that all lanes lie on a flat ground plane. However, we argue that algorithms based on this assumption have difficulty detecting the diverse lanes found in actual driving environments. In contrast to previous approaches, we expand rich contextual features from the image domain into 3D space by utilizing depth-aware voxel mapping, and we determine 3D lanes from the voxelized features. We design a new lane representation combined with uncertainties and predict the confidence intervals of 3D lane points using a Laplace loss. Experimental results show that the proposed method achieves state-of-the-art detection accuracy on three challenging datasets, including two real-world datasets, and significantly outperforms existing methods with a reasonable computation load.

Index Terms— 3D lane detection, Monocular camera, Voxel space mapping, Uncertainty

1. INTRODUCTION

In the field of autonomous driving, 3D perception technologies are required to identify the locations of surrounding objects and to model the curvature of the road surface. Among these, 3D lane detection is an essential perception component for control, localization, and planning. The main goal of this task is to find the exact 3D location of each lane from an image. Unlike other perception tasks, lane detection has mainly been studied in the 2D image plane [1,2,3,4,5,6,7,8,9,10]. Additional post-processing based on geometric constraints (e.g., parallel lanes on a flat road) is required to expand a predicted 2D lane into 3D space. Prior works therefore have limitations when lane detection is applied directly to various actual road scenarios (e.g., exit ramps).
3D lane detection has been investigated to address this problem [11, 12, 13, 14, 15]. Most 3D lane detection methods [11, 12, 13] assume a single flat ground plane. They cannot detect lanes on road planes that violate this strong assumption, such as distant roads with slopes. Gen-LaneNet [12] projects lane features and the ground truth onto a fixed flat plane. 3D-LaneNet [11, 13] estimates the height and pitch of the road, but it is similar to Gen-LaneNet [12] in that it projects lane features onto an estimated single plane. Recently, Persformer [14] has adopted a transformer architecture for 3D lane detection, but it inherits a heavy computation burden. To address these issues, we propose a simple yet robust architectural design for monocular 3D lane detection. We utilize depth-aware voxel space mapping through an attentive architecture that implicitly learns the road depth distribution without supervision. In addition, we generate two real-world datasets for 3D lane detection; starting from existing public datasets [16, 17], only post-processing is carried out to adapt them for 3D lane detection. Finally, we introduce a method for representing the interpretable confidence interval of uncertainty-aware 3D lane detection by leveraging a loss function based on the Laplace distribution [18].

Fig. 1: Comparison of our method with the state-of-the-art [12]: (a) Cutting-Edge Method, (b) D-3DLD (Our Method). Existing methods fail to detect lanes with varying slopes (e.g., distant lanes) simultaneously, or, even when the lanes are detected, show significant errors in road height (z-offset) prediction. Blue lines denote ground truth, red lines prediction results, and cyan lines false negatives.

The contributions of this work are summarized in three points: 1) We present a novel network architecture that consists of a lane feature encoder, lane depth network, voxel space mapping, and 3D lane regression.
We learn the depth distribution in an end-to-end manner from labeled 3D lane data. 2) We introduce two real-world datasets based on existing public datasets and compare our method with state-of-the-art methods. On the two real-world datasets, the proposed method outperforms the existing methods in terms of F-score by as much as 21.2% and 16.7%, respectively. 3) We combine the uncertainty of predicted 3D lane points with a 3D lane regression network. This improves the AP by 1.4% and enhances the interpretability of network predictions.

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 978-1-7281-6327-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICASSP49357.2023.10096483
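The depth-aware lifting of image features into a voxelized 3D space can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, tensor shapes, and the use of a softmax over discrete depth bins (with the distribution learned implicitly, without depth supervision) are our assumptions for illustration.

```python
import numpy as np

def lift_to_voxels(feat, depth_logits):
    """Lift image features (C, H, W) into a depth-aware frustum
    (D, C, H, W) by weighting each pixel's feature with a softmax
    distribution over D discrete depth bins (D, H, W).

    The depth logits would come from a lane depth network trained
    end-to-end; no explicit depth ground truth is needed."""
    # numerically stable softmax over the depth-bin axis
    e = np.exp(depth_logits - depth_logits.max(axis=0, keepdims=True))
    depth_prob = e / e.sum(axis=0, keepdims=True)          # (D, H, W)
    # outer product over the depth axis: (1,C,H,W) * (D,1,H,W)
    return feat[None, :, :, :] * depth_prob[:, None, :, :]  # (D, C, H, W)
```

Because the depth weights sum to one at each pixel, collapsing the frustum back over the depth axis recovers the original feature map; the lifted volume can then be resampled into a regular voxel grid for 3D lane regression.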
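The uncertainty-aware regression via a Laplace loss can likewise be sketched. Assuming the network predicts a location mu and a log-scale log_b per 3D lane point (a common parameterization; the exact heads used in the paper are not specified in this excerpt), minimizing the Laplace negative log-likelihood fits both, and the closed-form Laplace CDF yields the interpretable confidence interval:

```python
import numpy as np

def laplace_nll(y, mu, log_b):
    """Negative log-likelihood of Laplace(mu, b), b = exp(log_b):
    L = |y - mu| / b + log(2b). Predicting log_b keeps b > 0."""
    b = np.exp(log_b)
    return np.abs(y - mu) / b + np.log(2.0 * b)

def confidence_interval(mu, log_b, p=0.95):
    """Symmetric p-confidence interval of Laplace(mu, b):
    P(|y - mu| <= t) = 1 - exp(-t / b)  =>  t = -b * ln(1 - p)."""
    b = np.exp(log_b)
    t = -b * np.log(1.0 - p)
    return mu - t, mu + t
```

For example, a predicted lane-point height with log_b = 0 (b = 1 m) gives a 95% interval of roughly mu ± 3.0 m, so a smaller predicted scale directly reads as a tighter, more trustworthy localization.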