What if end-to-end driving has no data? ActiveAD: planning-oriented active learning for end-to-end autonomous driving

End-to-end differentiable learning for autonomous driving (AD) has recently become a prominent paradigm. A major bottleneck is its huge demand for high-quality labeled data, such as 3D bounding boxes and semantic segmentation, which are notoriously expensive to annotate manually. The difficulty is compounded by the fact that in-sample behavior in AD often follows a long-tailed distribution: most of the collected data may be trivial (e.g., driving forward on a straight road), with only a few situations being safety-critical. In this paper, we explore a practically important but underexplored issue, namely how to achieve sample and label efficiency in end-to-end AD.

Specifically, the paper designs a planning-oriented active learning method that progressively annotates part of the collected raw data according to the proposed diversity and usefulness criteria for planning routes. Empirically, the planning-oriented approach outperforms general active learning methods by a large margin. Notably, our method achieves performance comparable to state-of-the-art end-to-end AD methods using only 30% of the nuScenes data. We hope our work will inspire future research from a data-centric perspective, in addition to methodological efforts.

Paper link: https://arxiv.org/pdf/2403.02877.pdf

Main contributions of the paper:

The first in-depth study of the data problem in E2E-AD. It also provides a simple yet effective solution for identifying and annotating the data that is valuable for planning within a limited budget.
Based on the planning-oriented philosophy of the end-to-end approach, new task-specific diversity and uncertainty measures are designed for planning routes.
Extensive experiments and ablation studies demonstrate the effectiveness of the method. ActiveAD outperforms general active learning counterparts by a large margin and achieves performance comparable to SOTA methods trained with full labels, using only 30% of the nuScenes data.

Method introduction

This section describes ActiveAD in detail within the end-to-end AD framework. Its diversity and uncertainty metrics are designed around the data characteristics of AD.

1) Initial sample selection for labels

For active learning in computer vision, initial sample selection is usually based only on the raw images, without additional information or learned features, which has led to the common practice of random initialization. In AD, additional prior information is available. Specifically, when data is collected from sensors, conventional information such as the speed and trajectory of the ego vehicle can be recorded at the same time. In addition, weather and lighting conditions are usually continuous and easy to annotate at the clip level. This information makes an informed choice of the initial set possible. Therefore, we designed an ego-diversity measure for initial selection.

Ego Diversity: Consists of three parts: 1) weather and lighting, 2) driving commands, 3) average speed. First, the descriptions in nuScenes are used to divide the complete dataset into four mutually exclusive subsets: Day Sunny (DS), Day Rainy (DR), Night Sunny (NS) and Night Rainy (NR). Second, each subset is divided into four categories based on the number of left, right and straight driving commands in a complete clip: left turn (L), right turn (R), overtaking (O) and going straight (S). The paper defines a threshold τc: if the numbers of left and right commands in a clip are both greater than or equal to τc, the clip is regarded as containing overtaking behavior. If only the number of left commands is greater than τc, the clip is a left turn. If only the number of right commands is greater than τc, the clip is a right turn. All other cases are treated as going straight. Third, the average speed of each scene is computed and the scenes are sorted in ascending order within the relevant subset.
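The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clip fields (`weather`, `n_left`, `n_right`, `avg_speed`) and the function names are hypothetical, and the threshold comparison uses "greater than or equal to" for every case, since the text is not fully consistent on that point.

```python
from collections import defaultdict

def command_category(n_left, n_right, tau_c):
    """Classify a clip as L, R, O or S from its left/right command counts.

    Assumption: every comparison against tau_c is ">=".
    """
    left = n_left >= tau_c
    right = n_right >= tau_c
    if left and right:
        return "O"  # both directions commanded often enough: overtaking
    if left:
        return "L"
    if right:
        return "R"
    return "S"      # everything else counts as going straight

def build_ego_diversity_tree(clips, tau_c):
    """Group clips by (weather/lighting, command category).

    Each leaf is sorted by average speed in ascending order.
    The clip dictionary keys are hypothetical field names.
    """
    tree = defaultdict(list)
    for clip in clips:
        category = command_category(clip["n_left"], clip["n_right"], tau_c)
        tree[(clip["weather"], category)].append(clip)
    for leaf in tree.values():
        leaf.sort(key=lambda c: c["avg_speed"])
    return dict(tree)
```

How clips are then drawn from each leaf to fill the initial budget is not spelled out in the text, so the sketch stops at building the sorted leaves.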

Figure 2 illustrates the initial selection process, which is based on a multi-way tree.

2) Criterion design for incremental selection

This section describes how to incrementally annotate new clips based on a model trained on the clips annotated so far. The intermediate model is used to run inference on the unlabeled clips, and subsequent selections are based on its outputs. A planning-oriented perspective is adopted, and three criteria for subsequent data selection are introduced: displacement error, soft collision and agent uncertainty.

Criterion 1: Displacement error (DE). LDE is defined as the distance between the model's predicted planning route τ and the human trajectory τ* recorded in the dataset.

[Equation image not preserved: definition of LDE]

where T denotes the frames in the scene. Since the displacement error is itself a performance metric (no annotation required), it naturally becomes the first and most critical criterion in active selection.
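As a rough illustration of this criterion, the sketch below computes a displacement error between a predicted route and the recorded human trajectory. It assumes that both are given as equal-length lists of (x, y) waypoints and that the per-frame L2 distances are averaged over the frames; the averaging and the function name are assumptions, not the paper's exact definition.

```python
import math

def displacement_error(pred_route, human_route):
    """Mean L2 distance between matching waypoints of two routes.

    Both routes are lists of (x, y) tuples with one waypoint per frame.
    Averaging over frames is an assumption made for this sketch.
    """
    if not pred_route or len(pred_route) != len(human_route):
        raise ValueError("routes must be non-empty and of equal length")
    total = sum(math.dist(p, h) for p, h in zip(pred_route, human_route))
    return total / len(pred_route)
```

Because the human trajectory is recorded during data collection, this score can be computed for unlabeled clips without any manual annotation.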

Criterion 2: Soft collision (SC). LSC is defined in terms of the distance between the predicted ego-vehicle trajectory and the predicted agent trajectories. Low-confidence agent predictions are filtered out by a threshold ε. In each scene, the shortest distance is taken as the measure of the danger coefficient, and the term is kept positively correlated with the nearest distance:

[Equation image not preserved: definition of LSC]

Soft collision is used as a criterion for two reasons. On the one hand, unlike the displacement error, computing the collision ratio depends on 3D box annotations of the objects, which are not available for unlabeled data; the criterion therefore has to be computable from the model's inference results alone. On the other hand, consider a hard collision criterion that assigns 1 if the predicted ego-vehicle trajectory collides with the predicted trajectory of any other agent and 0 otherwise. This would yield too few samples labeled 1, since the collision rate of state-of-the-art AD models is usually small (less than 1%). The closest distance to other objects is therefore used in place of the collision rate metric: the risk is considered much higher when the distance to other vehicles or pedestrians is too small. In short, soft collision is an effective measure of collision likelihood and provides dense supervision.
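The core quantity behind soft collision, the closest approach between the predicted ego trajectory and the confident agent predictions, can be sketched as follows. The data layout (agents as dictionaries with `confidence` and `traj` keys) and the frame-by-frame pairing of waypoints are assumptions of this sketch; how the distance is finally mapped to the LSC term is not reproduced here.

```python
import math

def closest_agent_distance(ego_traj, agents, eps):
    """Shortest distance between the ego trajectory and any confident agent.

    ego_traj: list of (x, y) waypoints, one per future frame.
    agents:   list of dicts with hypothetical keys 'confidence' (float)
              and 'traj' (list of (x, y), aligned with ego_traj).
    eps:      confidence threshold; predictions below it are ignored.
    Returns math.inf when no agent passes the confidence filter.
    """
    best = math.inf
    for agent in agents:
        if agent["confidence"] < eps:
            continue  # filter out low-confidence agent predictions
        for ego_pt, agent_pt in zip(ego_traj, agent["traj"]):
            best = min(best, math.dist(ego_pt, agent_pt))
    return best
```

Unlike a binary collision flag, this value varies continuously from scene to scene, which is what makes it usable for ranking unlabeled clips.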

Criterion 3: Agent uncertainty (AU). Predictions of the future trajectories of surrounding agents are inherently uncertain, so motion prediction modules typically generate multiple modalities with corresponding confidence scores. The goal is to select data in which nearby agents have high uncertainty. Specifically, distant agents are filtered out by a distance threshold δ, and the weighted entropy of the predicted probabilities of the multiple modalities is computed for the remaining agents. Let Nm be the number of modalities and Pi(a) the confidence score of agent a in modality i, where i∈{1,…,Nm}. The agent uncertainty can then be defined as:

[Equation image not preserved: definition of LAU]
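A minimal sketch of an entropy-based agent uncertainty score is given below. It makes several assumptions that are not confirmed by the text: the confidence scores of each agent are renormalized to sum to 1, plain Shannon entropy is used without any extra weighting, and the per-agent entropies of nearby agents are summed. The dictionary keys are hypothetical.

```python
import math

def mode_entropy(scores):
    """Shannon entropy (in nats) of mode confidence scores.

    Scores are renormalized to sum to 1 first (an assumption).
    """
    total = sum(scores)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for s in scores:
        p = s / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy

def agent_uncertainty(agents, delta):
    """Sum of mode entropies over agents within distance delta of the ego.

    agents: list of dicts with hypothetical keys 'distance' (to the ego
            vehicle) and 'mode_scores' (one confidence per modality).
    """
    nearby = [a for a in agents if a["distance"] <= delta]
    return sum(mode_entropy(a["mode_scores"]) for a in nearby)
```

An agent whose modes are equally likely contributes the most, while an agent with one dominant mode contributes almost nothing.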

Overall loss:

[Equation image not preserved: overall loss combining LDE, LSC and LAU with the hyperparameters α and β]
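One way the three criteria might be combined is sketched below. The weighted-sum form (displacement error plus α times soft collision plus β times agent uncertainty) is inferred from the two hyperparameters mentioned in the article and is an assumption, as is the use of min-max scaling to bring each criterion into [0, 1].

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def overall_scores(de, sc, au, alpha=1.0, beta=1.0):
    """Combine per-clip criteria into one score per clip.

    de, sc, au: equal-length lists with one value per unlabeled clip.
    Assumed form: DE + alpha * SC + beta * AU, each min-max normalized.
    """
    de_n, sc_n, au_n = min_max(de), min_max(sc), min_max(au)
    return [d + alpha * s + beta * u for d, s, u in zip(de_n, sc_n, au_n)]
```

Normalizing before summing keeps a criterion with a large numeric range, such as a distance in meters, from dominating the ranking.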

3) Overall active learning paradigm

Algorithm 1 presents the entire workflow of the method. The inputs are an available budget B, an initial selection size n0, the number of samples actively selected at each step ni, and a total of M selection stages. Selection is first initialized using either random sampling or the ego-diversity method described above. The currently annotated data is then used to train the network. With the trained network, predictions are made on the unlabeled clips and the overall loss is computed. Finally, the samples are sorted by overall loss and the top ni are selected for annotation in the current iteration. This process repeats until the iteration count reaches the upper limit M and the number of selected samples reaches the budget B.
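The workflow can be summarized by the following Python sketch. It is not the paper's Algorithm 1: `train_fn`, `score_fn` and `init_fn` are hypothetical callables supplied by the caller, clips are assumed to be hashable identifiers, and the loop stops as soon as either the stage limit or the budget is reached.

```python
import random

def active_learning_loop(pool, n_init, n_step, n_stages, budget,
                         train_fn, score_fn, init_fn=None, seed=0):
    """Select clips to annotate over several active-learning stages.

    pool:     list of hashable clip identifiers.
    train_fn: callable(labeled) -> model (hypothetical).
    score_fn: callable(model, unlabeled) -> list of overall losses,
              one per unlabeled clip (hypothetical).
    init_fn:  optional callable(pool, n_init) -> initial clips, e.g. an
              ego-diversity selector; random sampling is used if None.
    """
    rng = random.Random(seed)
    if init_fn is None:
        labeled = rng.sample(pool, n_init)
    else:
        labeled = list(init_fn(pool, n_init))
    chosen = set(labeled)
    unlabeled = [c for c in pool if c not in chosen]

    for _ in range(n_stages):
        if len(labeled) >= budget or not unlabeled:
            break
        model = train_fn(labeled)            # train on current annotations
        scores = score_fn(model, unlabeled)  # overall loss per unlabeled clip
        k = min(n_step, budget - len(labeled))
        ranked = sorted(zip(scores, unlabeled),
                        key=lambda pair: pair[0], reverse=True)
        picked = [clip for _, clip in ranked[:k]]
        labeled.extend(picked)               # these clips get annotated next
        picked_set = set(picked)
        unlabeled = [c for c in unlabeled if c not in picked_set]
    return labeled
```

Passing an ego-diversity selector as `init_fn` and a combined-criteria scorer as `score_fn` would reproduce the overall structure described above, with the model retrained at every stage.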

Experimental results

Experiments were conducted on the widely used nuScenes dataset. All experiments are implemented using PyTorch and run on RTX 3090 and A100 GPUs.

Table 1: Planning performance. ActiveAD outperforms general active learning baselines in all annotation budget settings. Furthermore, ActiveAD with 30% of the data achieves slightly better planning performance than training on the entire dataset. VAD with * indicates updated results that are better than those reported in the original work. UniAD with † indicates results updated using VAD's metrics.

Table 2: Ablation study of the design choices. “RA” and “ED” denote initial set selection based on random sampling and ego diversity, respectively. “DE”, “SC” and “AU” denote displacement error, soft collision and agent uncertainty, respectively. All combinations with "ED" are initialized with the same 10% of the data. LDE, LSC and LAU are each normalized to [0, 1], and the hyperparameters α and β are set to 1.

Figure 3: Visualization of selected scenes. Front-camera images of the scenes selected by the displacement error (col 1), soft collision (col 2), agent uncertainty (col 3) and mixed (col 4) criteria, using a model trained on 10% of the data. Mixed denotes the final selection strategy, ActiveAD, which takes the first three criteria into account.

Table 4: Performance in various scenarios. Average L2 (m) / average collision rate (%) of the active models using 30% of the data under various weather/lighting and driving-command conditions; smaller is better.

Figure 4: Similarity between multiple criteria. It shows the newly sampled scenes at 10% (left) and 20% (right) selected by four criteria: Displacement Error (DE), Soft Collision (SC), Agent Uncertainty (AU) and Mixed (MX).

Conclusions of this work

To address the high annotation cost and long-tail problems of end-to-end autonomous driving data, we are the first to develop a tailored active learning solution, ActiveAD. ActiveAD introduces new task-specific diversity and uncertainty measures based on a planning-oriented philosophy. Extensive experiments demonstrate the effectiveness of the method: using only 30% of the data, it significantly outperforms previous general active learning methods and achieves performance comparable to state-of-the-art models. This represents a meaningful exploration of end-to-end autonomous driving from a data-centric perspective, and we hope that our work will inspire future research and discovery.
