Cross-view Semantic Segmentation for Sensing Surroundings

Bowen Pan, Jiankai Sun, Alex Andonian, Bolei Zhou

Sensing surroundings is ubiquitous and effortless for humans: it takes a single glance to extract the spatial configuration of objects as well as the free space from an observation. To equip machine perception with such a surrounding-sensing capability, we introduce a novel framework for cross-view semantic segmentation. In this framework, we propose the View Parsing Network (VPN) to parse first-view observations into a top-down-view semantic map that indicates the spatial location of all objects at the pixel level. The view transformer module in VPN is designed to aggregate the surrounding information collected from first-view observations across multiple angles and modalities. To mitigate the lack of real-world annotations, we train VPN in a simulation environment and use an off-the-shelf domain adaptation technique to transfer it to real-world data. We evaluate VPN on both synthetic and real-world data. The experimental results show that our model effectively makes use of information from different views and modalities, and thus accurately predicts top-down-view semantic masks of both visible and barely visible objects in synthetic and real-world environments.
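
As a rough illustration of the pipeline described above, the sketch below shows how per-view features might be encoded, mapped into the top-down frame by a view transformer module, fused across viewing angles, and decoded into a semantic map. This is a minimal PyTorch sketch assuming 128×128 RGB inputs and a simple MLP-based spatial mapping; the class names, layer sizes, and fusion by summation are illustrative assumptions, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn


class ViewTransformer(nn.Module):
    """Illustrative view transformer: maps a flattened first-view feature
    grid to a top-down-view feature grid with a small MLP (assumption)."""

    def __init__(self, in_hw=(32, 32), out_hw=(32, 32)):
        super().__init__()
        self.out_hw = out_hw
        n_in = in_hw[0] * in_hw[1]
        n_out = out_hw[0] * out_hw[1]
        self.mlp = nn.Sequential(
            nn.Linear(n_in, n_out),
            nn.ReLU(inplace=True),
            nn.Linear(n_out, n_out),
        )

    def forward(self, x):                  # x: (B, C, H, W) first-view features
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)         # flatten the spatial grid per channel
        top = self.mlp(flat)               # learn the first-view -> top-down mapping
        return top.view(b, c, *self.out_hw)


class ViewParsingNetwork(nn.Module):
    """Encode each first-view observation, transform it to the top-down frame,
    fuse the views by summation, and decode a per-pixel semantic map."""

    def __init__(self, n_views=4, n_classes=13, channels=128):
        super().__init__()
        self.encoder = nn.Sequential(      # stand-in per-view CNN encoder
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.transformers = nn.ModuleList(
            [ViewTransformer() for _ in range(n_views)]
        )
        self.decoder = nn.Conv2d(channels, n_classes, 1)  # per-pixel class logits

    def forward(self, views):              # views: list of (B, 3, 128, 128) RGB tensors
        feats = [t(self.encoder(v)) for t, v in zip(self.transformers, views)]
        fused = torch.stack(feats, dim=0).sum(dim=0)      # aggregate all angles
        return self.decoder(fused)         # (B, n_classes, 32, 32) top-down logits
```

For example, `ViewParsingNetwork()([torch.randn(1, 3, 128, 128) for _ in range(4)])` returns a (1, 13, 32, 32) top-down logit map; a full model would add a stronger encoder and an upsampling decoder.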
