This article introduces a method for accurately generating multi-view street-view images from a BEV sketch layout.
In autonomous driving, image synthesis is widely used to improve the performance of downstream perception tasks.
In computer vision, synthesizing images to improve the performance of perception models is a long-standing research problem. In vision-centric autonomous driving systems that use multi-view cameras, the problem becomes more prominent because some long-tail scenes can never be collected.
As shown in Figure 1(a), existing generation methods feed a semantic-segmentation-style BEV structure into the generation network and output plausible multi-view images. When evaluated solely on scene-level metrics, these methods appear capable of synthesizing photorealistic street-view images. On closer inspection, however, we found that they fail to produce accurate object-level detail. The figure shows a common mistake of state-of-the-art generation algorithms: the generated vehicle is oriented in the opposite direction to the target 3D bounding box. Furthermore, editing semantic-segmentation-style BEV structures is difficult and labor-intensive.
We therefore propose a two-stage method called BEVControl that provides finer-grained control over background and foreground geometry, as shown in Figure 1(b). BEVControl accepts sketch-style BEV structures as input, which allows quick and easy editing. In addition, BEVControl decomposes visual consistency into two sub-goals: geometric consistency between the street views and the bird's-eye view, handled by the Controller, and appearance consistency across street views, handled by the Coordinator.
##Paper link:
//m.sbmmt.com/link/1531beb762df4029513ebf9295e0d34f
BEVControl is a generative network with a UNet structure, composed of a series of modules. Each module contains two elements: a Controller and a Coordinator.
Camera projection from the BEV sketch to camera conditions. The input is a BEV sketch; the output is the multi-view foreground and background conditions.
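The article does not spell out the projection itself, so the following is only a minimal sketch of how BEV sketch points could be turned into per-camera conditions. It assumes a standard pinhole model, an ego frame with x forward, y left, z up, and a camera frame with x right, y down, z forward. The `Camera` class, the `sketch_to_conditions` function, and all numeric values are hypothetical names and parameters chosen for illustration, not taken from BEVControl.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


class Camera:
    """Hypothetical pinhole camera (an assumption, not the paper's code).

    R (3x3) and t (3,) map ego coordinates to camera coordinates:
    p_cam = R @ p_ego + t.
    """

    def __init__(self, R: Sequence[Sequence[float]], t: Sequence[float],
                 fx: float, fy: float, cx: float, cy: float,
                 width: int, height: int) -> None:
        self.R, self.t = R, t
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.width, self.height = width, height

    def project(self, p_ego: Point3) -> Optional[Pixel]:
        """Return the pixel for p_ego, or None if it is not visible."""
        xc, yc, zc = (
            sum(self.R[i][j] * p_ego[j] for j in range(3)) + self.t[i]
            for i in range(3)
        )
        if zc <= 1e-6:  # point is behind the camera
            return None
        u = self.fx * xc / zc + self.cx
        v = self.fy * yc / zc + self.cy
        if 0 <= u < self.width and 0 <= v < self.height:
            return (u, v)
        return None  # point falls outside the image


def sketch_to_conditions(
    foreground: List[Point3],
    background: List[Point3],
    cameras: Dict[str, Camera],
) -> Dict[str, Dict[str, List[Pixel]]]:
    """Project foreground and background sketch points into every camera."""
    conditions: Dict[str, Dict[str, List[Pixel]]] = {}
    for name, cam in cameras.items():
        conditions[name] = {
            "foreground": [px for p in foreground
                           if (px := cam.project(p)) is not None],
            "background": [px for p in background
                           if (px := cam.project(p)) is not None],
        }
    return conditions


# Two illustrative cameras mounted 1.5 m above the ego origin.
FRONT = Camera(R=[[0, -1, 0], [0, 0, -1], [1, 0, 0]], t=[0, 1.5, 0],
               fx=1000, fy=1000, cx=800, cy=450, width=1600, height=900)
BACK = Camera(R=[[0, 1, 0], [0, 0, -1], [-1, 0, 0]], t=[0, 1.5, 0],
              fx=1000, fy=1000, cx=800, cy=450, width=1600, height=900)

conditions = sketch_to_conditions(
    foreground=[(10.0, 2.0, 0.0)],                       # e.g. an object anchor
    background=[(10.0, 0.0, 0.0), (-10.0, 0.0, 0.0)],    # e.g. road-line points
    cameras={"front": FRONT, "back": BACK},
)
```

Each sketch point lands only in the views that can see it, which is why one BEV sketch yields a separate foreground and background condition for every camera. A real pipeline would likely project full 3D box corners and dense road polylines rather than single points.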
Controller: Takes in the foreground and background information of the camera-view sketch through self-attention and outputs street-view features that are geometrically consistent with the BEV sketch.
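To make the self-attention idea concrete, here is a toy sketch under strong assumptions: street-view tokens are concatenated with foreground and background condition tokens, single-head scaled dot-product self-attention runs over the joint sequence, and only the street-view positions are kept. The function names are hypothetical, and identity query/key/value projections stand in for the learned weights a real Controller would have. The article does not describe the actual layer design, so this should not be read as the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): Q, K and V are the tokens themselves,
    i.e. identity projections instead of learned weight matrices.
    """
    dim = len(tokens[0])
    outputs: List[Vector] = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * tok[j] for w, tok in zip(weights, tokens))
            for j in range(dim)
        ])
    return outputs


def controller_step(view_tokens: List[Vector],
                    fg_tokens: List[Vector],
                    bg_tokens: List[Vector]) -> List[Vector]:
    """Let street-view tokens attend to foreground/background conditions."""
    joint = view_tokens + fg_tokens + bg_tokens
    attended = self_attention(joint)
    # Keep only the positions that correspond to street-view features.
    return attended[:len(view_tokens)]


# Toy example: two view tokens, one foreground and one background token.
view = [[1.0, 0.0], [0.0, 1.0]]
fg = [[1.0, 1.0]]
bg = [[0.5, 0.5]]
updated_view = controller_step(view, fg, bg)
```

The output has one updated vector per street-view token, each now a mixture that includes the condition tokens. Slicing off the condition positions keeps the feature map shape unchanged, so such a step could be dropped into a UNet block without altering the layers around it.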
Quantitative results