HD maps are fundamental components of autonomous driving systems, providing centimeter-level details of traffic rules, vectorized topology, and navigation information. These maps enable the ego-vehicle to accurately localize itself on the road and anticipate upcoming features. HD map constructors treat this task as predicting a collection of vectorized static map elements in bird’s eye view (BEV), e.g., pedestrian crossings, lane dividers, and road boundaries.
In real-world driving scenarios, adverse conditions, such as bad weather, motion distortions, and sensor malfunctions (frame loss, sensor crashes, incomplete echoes, etc.), are unavoidable. It remains unclear how existing HD map construction methods perform under such challenging yet safety-critical conditions, highlighting the need for a thorough out-of-domain robustness evaluation.
To address this gap, we introduce MapBench, the first comprehensive benchmark for evaluating the reliability of HD map construction methods against natural corruptions that occur in real-world environments. We thoroughly assess model robustness under corruptions by investigating three popular configurations: camera-only, LiDAR-only, and camera-LiDAR fusion models. Our evaluation encompasses 8 types of camera corruptions, 8 types of LiDAR corruptions, and 13 types of camera-LiDAR corruption combinations, as depicted in Fig. 1. We define three severity levels for each corruption type and devise appropriate metrics for quantitative robustness comparisons.
Figure 1. Definitions of the Camera and LiDAR sensor corruptions in MapBench
In this work, we investigate three popular configurations, i.e., Camera-only, LiDAR-only, and Camera-LiDAR fusion-based HD map construction tasks, and study their robustness to various sensor corruptions. As illustrated in Fig. 1, the camera/LiDAR corruptions are grouped into exterior environments, interior sensors, and sensor failure types, covering the majority of real-world cases.
Following the protocol established in [1], we consider three corruption severity levels, i.e., Easy, Moderate, and Hard, for each type of corruption. Additionally, for multi-sensor corruptions, we use camera/LiDAR sensor failure types to perturb camera and LiDAR inputs separately or concurrently. MapBench is constructed by corrupting the val set of nuScenes, which we chose because it is used in almost all recent HD map construction works.
Camera Sensor Corruptions. To probe camera-only model robustness, we employ 8 real-world camera sensor corruptions from [2], spanning three perspectives: exterior environments, interior sensors, and sensor failures. Specifically, the exterior environments include various lighting and weather conditions such as Bright, Low-Light, Fog, and Snow. The camera inputs might also be corrupted by interior sensor factors, such as Motion Blur and Color Quantization. Lastly, we consider sensor failure cases where cameras crash or certain frames are dropped due to physical problems, leading to Camera Crash and Frame Lost, respectively.
LiDAR Sensor Corruptions. To explore LiDAR-only model robustness, we adopt 8 LiDAR sensor corruptions from [3], covering scenarios with a high likelihood of occurring in real-world deployments. These corruptions likewise span exterior, interior, and sensor failure cases. The exterior environments encompass Fog, Wet Ground, and Snow, which cause back-scattering, attenuation, and reflection of the LiDAR pulses. Besides, the LiDAR inputs might be corrupted by bumpy surfaces, dust, or insects, which often lead to disturbances and cause Motion Blur and Beam Missing. Lastly, we consider internal LiDAR sensor failures, such as Crosstalk, Incomplete Echo, and Cross-Sensor scenarios.
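As a hedged illustration of how such corruptions can be simulated, the sketch below mimics a Beam Missing-style corruption by discarding whole beams from a point cloud. It assumes a per-point ring (beam) index, as provided by nuScenes; the drop ratio is a hypothetical severity knob, not the exact setting used in [3].

```python
import numpy as np

def drop_beams(points: np.ndarray, ring: np.ndarray,
               drop_ratio: float = 0.5, rng=None) -> np.ndarray:
    """Simulate a Beam Missing-style corruption by removing whole beams.

    points: (N, C) LiDAR point array; ring: (N,) per-point beam index.
    drop_ratio: fraction of beams to discard (hypothetical severity knob).
    """
    rng = rng or np.random.default_rng()
    beams = np.unique(ring)
    dropped = rng.choice(beams, size=int(len(beams) * drop_ratio), replace=False)
    return points[~np.isin(ring, dropped)]

# Example: a 32-beam scan reduced to roughly half of its beams.
pts = np.random.rand(100_000, 4).astype(np.float32)
ring = np.random.randint(0, 32, size=100_000)
corrupted = drop_beams(pts, ring, drop_ratio=0.5)
```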
Multi-Sensor Corruptions. To explore the robustness of camera-LiDAR fusion models, we design 13 types of camera-LiDAR corruption combinations that perturb camera and LiDAR inputs separately or concurrently, using the aforementioned sensor failure types. These multi-sensor corruptions are grouped into camera-only corruptions, LiDAR-only corruptions, and their combinations, covering the majority of real-world scenarios. Specifically, we design 3 camera-only corruptions by pairing the “clean” LiDAR point data with three camera failure cases: Unavailable Camera (all pixel values are set to zero for all RGB images), Camera Crash, and Frame Lost. Moreover, we design 4 LiDAR-only corruptions by pairing the “clean” camera data with corrupted LiDAR data: complete LiDAR failure (since no model can work when all points are absent, we approximate this scenario by retaining only a single point as input), Incomplete Echo, Crosstalk, and Cross-Sensor. Note that this single-point approximation of complete LiDAR failure stays close to the real-world situation. Lastly, we design 6 camera-LiDAR corruption combinations that perturb both sensor inputs concurrently, using the previously mentioned camera/LiDAR sensor failure types; a sketch of these failure modes is given below.
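To make the failure modes concrete, the sketch below implements the behaviors exactly as described above: Unavailable Camera zeroes all images, complete LiDAR failure retains only a single point, and Camera Crash blacks out a subset of the surround-view cameras (following [2]). The random subset-selection policy and the number of crashed cameras are illustrative assumptions, not the benchmark's exact implementation.

```python
import numpy as np

def unavailable_camera(images: np.ndarray) -> np.ndarray:
    """Unavailable Camera: all pixel values are set to zero for all RGB images."""
    return np.zeros_like(images)

def camera_crash(images: np.ndarray, n_crash: int = 2, rng=None) -> np.ndarray:
    """Camera Crash: black out a random subset of the surround-view cameras.
    (n_crash is an assumed severity knob.)"""
    rng = rng or np.random.default_rng()
    out = images.copy()
    crashed = rng.choice(images.shape[0], size=n_crash, replace=False)
    out[crashed] = 0.0
    return out

def complete_lidar_failure(points: np.ndarray) -> np.ndarray:
    """Complete LiDAR failure: retain only a single point, since no model can
    run on an entirely empty point cloud."""
    return points[:1]

# Example with 6 surround-view cameras (nuScenes layout) and a 4-channel cloud.
imgs = np.random.rand(6, 3, 900, 1600).astype(np.float32)
pts = np.random.rand(50_000, 4).astype(np.float32)
imgs_crashed = camera_crash(imgs)         # two cameras blacked out
pts_failed = complete_lidar_failure(pts)  # single remaining point
```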
We define two robustness evaluation metrics based on mAP (mean Average Precision), a commonly used accuracy indicator for vectorized HD map construction.
Corruption Error (CE). We define CE as the primary metric for comparing models’ robustness. It measures the relative robustness of a candidate model with respect to a baseline model.
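Concretely, following the formulation popularized by [1] and adopted in [3] (the paper's exact notation may differ slightly), the CE for a corruption type $c$ aggregates the errors over the three severity levels and normalizes them by a baseline model:

$$\mathrm{CE}_c = \frac{\sum_{l=1}^{3}\left(1 - \mathrm{mAP}_{c,l}\right)}{\sum_{l=1}^{3}\left(1 - \mathrm{mAP}_{c,l}^{\mathrm{base}}\right)}, \qquad \mathrm{mCE} = \frac{1}{N}\sum_{c=1}^{N}\mathrm{CE}_c,$$

where $\mathrm{mAP}_{c,l}$ is the accuracy of the evaluated model under corruption $c$ at severity level $l$, and $N$ is the number of corruption types; lower is better.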
Resilience Rate (RR). We define RR as a complementary indicator measuring how much accuracy a model retains when evaluated on the corruption sets.
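Analogously, RR compares the accuracy retained across severity levels against the accuracy on the “clean” set:

$$\mathrm{RR}_c = \frac{\sum_{l=1}^{3}\mathrm{mAP}_{c,l}}{3 \times \mathrm{mAP}_{\mathrm{clean}}}, \qquad \mathrm{mRR} = \frac{1}{N}\sum_{c=1}^{N}\mathrm{RR}_c,$$

with higher values indicating better resilience. Below is a minimal sketch of computing mCE and mRR from per-corruption, per-severity mAP scores; the nested-dict data layout is a hypothetical choice for illustration.

```python
from statistics import mean

# mAP per corruption type and severity level, e.g.
# {"fog": {"easy": 0.52, "moderate": 0.44, "hard": 0.31}, ...}
Results = dict[str, dict[str, float]]

def mce(model: Results, baseline: Results) -> float:
    """Mean Corruption Error: per-corruption error normalized by a baseline
    model, averaged over all corruption types (lower is better)."""
    ces = [
        sum(1.0 - m for m in levels.values())
        / sum(1.0 - b for b in baseline[corruption].values())
        for corruption, levels in model.items()
    ]
    return mean(ces)

def mrr(model: Results, clean_map: float) -> float:
    """Mean Resilience Rate: fraction of 'clean' accuracy retained on the
    corruption sets, averaged over all corruption types (higher is better)."""
    return mean(mean(levels.values()) / clean_map for levels in model.values())
```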
Our MapBench encompasses a total of 31 HD map constructors and their variants, spanning HDMapNet [4], VectorMapNet [5], PivotNet [6], BeMapNet [7], MapTR [8], MapTRv2 [9], StreamMapNet [10], and HIMap [11].
Note that the LiDAR-only models here take temporally aggregated LiDAR points as input, hence their mAP scores on “clean” data are much higher than those in other tables or figures, where single-scan LiDAR points are used for a fair comparison with the corrupted data.
Table 1. Benchmarking HD map constructors
We show the camera sensor corruption robustness of 8 camera-only HD map models in Fig. 2 (a)-(b). Our findings indicate that existing HD map models exhibit varying degrees of performance degradation under corruption scenarios. Overall, corruption robustness is highly correlated with the original accuracy on the “clean” data: models with higher accuracy (e.g., StreamMapNet [10], HIMap [11]) also achieve better corruption robustness. We further show accuracy comparisons of camera-only methods under different corruption severity levels in Fig. 3.
Figure 2. The correlations of accuracy (mAP) and robustness (mCE / mRR) for the Camera (a) and (b) and LiDAR (c) and (d) models. The size of the circle represents the number of model parameters
Figure 3. The mAP metrics of state-of-the-art HD map constructors under each of the three severity levels (Easy, Moderate, and Hard) in different Camera and LiDAR sensor corruption scenarios
We report the LiDAR sensor corruption robustness of 4 LiDAR-only HD map constructors in Fig. 2 (c)-(d) and Fig. 3. Similar to our observations for camera-only models, LiDAR-only models with higher accuracy on the “clean” set generally achieve better corruption robustness.
To systematically evaluate the reliability of camera-LiDAR fusion-based methods, we apply the 13 types of multi-sensor corruptions described above, perturbing camera and LiDAR inputs separately or concurrently. The results are presented in Fig. 4. Our findings indicate that camera-LiDAR fusion models exhibit varying degrees of performance degradation across the different corruption combinations.
Figure 4. The results of Camera-LiDAR fusion methods [8, 11] under multi-sensor corruptions
In this work, we conducted the first study benchmarking and analyzing the reliability of HD map construction methods under sensor corruptions that occur in real-world driving environments. Our results reveal key factors that correlate closely with out-of-domain robustness, highlighting crucial aspects of retaining satisfactory accuracy. We hope our comprehensive benchmarks, in-depth analyses, and insightful findings can help the community better understand the robustness of HD map construction and offer useful insights for designing more reliable HD map constructors in future studies.
https://arxiv.org/abs/2406.12214
[1] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019.
[2] Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Robobev: Towards robust bird’s eye view perception under corruptions. arXiv preprint arXiv:2304.06719, 2023.
[3] Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Robo3d: Towards robust and reliable 3d perception against corruptions. In IEEE/CVF International Conference on Computer Vision, pages 19994–20006, 2023.
[4] Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online hd map construction and evaluation framework. In IEEE International Conference on Robotics and Automation, pages 4628–4634, 2022.
[5] Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, and Hang Zhao. Vectormapnet: End-to-end vectorized hd map learning. In International Conference on Machine Learning, pages 22352–22369. PMLR, 2023.
[6] Wenjie Ding, Limeng Qiao, Xi Qiu, and Chi Zhang. Pivotnet: Vectorized pivot learning for end-to-end hd map construction. In IEEE/CVF International Conference on Computer Vision, pages 3672–3682, 2023.
[7] Limeng Qiao, Wenjie Ding, Xi Qiu, and Chi Zhang. End-to-end vectorized hd-map construction with piecewise bezier curve. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13218–13228, 2023.
[8] Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, and Chang Huang. Maptr: Structured modeling and learning for online vectorized hd map construction. In International Conference on Learning Representations, 2023.
[9] Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Maptrv2: An end-to-end framework for online vectorized HD map construction. arXiv preprint arXiv:2308.05736, 2023.
[10] Tianyuan Yuan, Yicheng Liu, Yue Wang, Yilun Wang, and Hang Zhao. Streammapnet: Streaming mapping network for vectorized online hd map construction. In IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7356–7365, 2024.
[11] Yi Zhou, Hui Zhang, Jiaqian Yu, Yifan Yang, Sangil Jung, Seung-In Park, and ByungIn Yoo. Himap: Hybrid representation learning for end-to-end vectorized hd map construction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.