In the image, a man with a black jacket is standing in front of a white poster. The poster features a map with a green outline of an urban area, accompanied by textual information. The man appears to be engaged in a discussion or presentation, as he is pointing to the map while a young boy stands nearby, listening attentively. The setting suggests that the event is taking place outdoors, possibly at a public venue or a festival. The map on the poster likely represents a city or a region, and the man might be explaining the significance of the urban area or its features to the boy. Overall, the image conveys a sense of learning and connection between the man and the young boy as they discuss the white poster with the map.
Text transcribed from the image:
[Logos: Xiamen University (Universitas Amoiensis), Intel]
Highlight
Bochun Yang, Zijun Li, Wen Li, Zhipeng Cai²†, Chenglu Wen¹, Yu Zang¹, Matthias Müller², Cheng Wang¹†
LISA: LiDAR Localization with Semantic Awareness
¹Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; ²Intel Labs
Motivations
More effective utilization of scene information: Scene Coordinate Regression (SCR) successfully utilizes geometric encoding to effectively extract scene geometric information, but it treats all points of the input equally. This is non-ideal for the task of localization: objects that are dynamic or repetitive intuitively should be less important than salient and static objects.
Efficiency: Inference time is a crucial metric in localization. We should ensure that introducing additional semantic information does not affect the real-time ability of localization.
Preliminary analysis
Filter                 Mean Error (m/°)
all (no filter)        1.79m, 1.41°
no plant               1.20m, 1.97°
no building            1.39m, 2.26°
no sidewalk            1.77m, 1.45°
no road                2.03m, 3.42°
no transportation      2.07m, 3.42°

Filter                 Mean Error (m/°)
all (no filter)        1.79m, 1.41°
plant only             59.08m, 13.25°
building only          1.63m, 1.91°
sidewalk only          2.47m, 5.92°
road only              1.71m, 2.59°
transportation only    20.10m, 21.95°
Filtering out objects from different classes can significantly reduce or increase the position error.
However, due to noise in the predicted labels and the hard threshold, naïve filtering does not fully utilize the semantic information.
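The class-based filtering compared in the preliminary analysis can be sketched as a hard mask over per-point semantic labels. This is a minimal illustration, not the authors' code: the function name `filter_points`, its parameters, and the toy data are hypothetical, and the labels are assumed to come from a pre-trained segmentation model.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float, float]


def filter_points(
    points: Sequence[Point],
    labels: Sequence[str],
    drop: Optional[Set[str]] = None,
    keep_only: Optional[Set[str]] = None,
) -> List[Point]:
    """Hard-threshold filtering of a point cloud by predicted class.

    drop:      remove points whose label is in this set ("no plant").
    keep_only: keep only points whose label is in this set ("plant only").
    With neither argument, all points are returned ("all (no filter)").
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if drop is not None and keep_only is not None:
        raise ValueError("use either drop or keep_only, not both")
    if keep_only is not None:
        return [p for p, lab in zip(points, labels) if lab in keep_only]
    if drop is not None:
        return [p for p, lab in zip(points, labels) if lab not in drop]
    return list(points)


# Toy scan with hypothetical labels.
scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
pred = ["building", "plant", "road", "plant"]

no_plant = filter_points(scan, pred, drop={"plant"})
plant_only = filter_points(scan, pred, keep_only={"plant"})
print(len(no_plant), len(plant_only))
```

Because the mask is binary, a single wrong label removes or keeps a point outright, which is the hard-threshold weakness that the poster attributes to naïve filtering.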
Visualization results
Methods
[Figure: method overview. Legend: Frozen; Retained for inference; Discarded after training. Labels: Scene Coordinate Regression, Regressor, LiDAR coordinate, Semantic Segmentation, feature extractor, World coordinate, Knowledge Distillation. Truncated headings from the adjacent panel: "Experiment resu", "Results on the O".]
Methods
➤ Scene Coordinate Regression: SGLoc
➤ Semantic Segmentation: SphereFormer, SPVNAS,
➤ Knowledge Distillation: DiffKD
Acquisition of semantic features and labels
➤ We use a semantic segmentation model pre-trained on nuScenes or SemanticKITTI and transfer it to Oxford and NCLT.
[Figure panels: SPVNAS, SphereFormer]
Denoise student semantic features
➤ We train a diffusion model with the teacher feature F_tea by gradually adding noise to F_tea and letting the diffusion model learn to predict the noise.
➤ Due to the small size of the student network, its output features naturally contain more noise. Before distilling the knowledge, we denoise them.
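The diffusion training step described above (add noise to the teacher feature, then learn to predict that noise) can be sketched as follows. This is a minimal sketch under assumptions, not the poster's implementation: it uses the standard DDPM forward process F_t = sqrt(a) * F_tea + sqrt(1 - a) * eps with a cumulative noise level `alpha_bar_t`, and `oracle_noise` is a hypothetical stand-in for the learned network, used only to show that a perfect prediction gives zero loss.

```python
import math
import random
from typing import List, Tuple


def add_noise(feature: List[float], alpha_bar_t: float,
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Standard DDPM forward step (assumed form): returns (F_t, eps)."""
    if not 0.0 < alpha_bar_t < 1.0:
        raise ValueError("alpha_bar_t must lie strictly between 0 and 1")
    eps = [rng.gauss(0.0, 1.0) for _ in feature]
    scale_f = math.sqrt(alpha_bar_t)
    scale_e = math.sqrt(1.0 - alpha_bar_t)
    noisy = [scale_f * f + scale_e * e for f, e in zip(feature, eps)]
    return noisy, eps


def noise_prediction_loss(predicted: List[float], eps: List[float]) -> float:
    """Mean squared error between the predicted and the true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted, eps)) / len(eps)


def oracle_noise(noisy: List[float], feature: List[float],
                 alpha_bar_t: float) -> List[float]:
    """Hypothetical perfect predictor: inverts the forward step exactly.

    A real model sees only F_t (and t) and must learn this mapping.
    """
    scale_f = math.sqrt(alpha_bar_t)
    scale_e = math.sqrt(1.0 - alpha_bar_t)
    return [(n - scale_f * f) / scale_e for n, f in zip(noisy, feature)]


rng = random.Random(0)
f_tea = [0.5, -1.0, 2.0]          # toy teacher feature
f_t, eps = add_noise(f_tea, 0.5, rng)
loss_perfect = noise_prediction_loss(oracle_noise(f_t, f_tea, 0.5), eps)
loss_zero_guess = noise_prediction_loss([0.0] * len(eps), eps)
print(loss_perfect, loss_zero_guess)
```

Once a model of this kind is trained on teacher features, the same noise predictor can be applied to the noisier student features to denoise them before distillation, as the second bullet describes.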
Loss function
➤ Localization loss ($L_{loc}$): [equation not legible in the source]
➤ DDPM loss ($L_{ddpm}$): $L_{ddpm} = \| \phi(F_t) - \epsilon \|_2$, where $F_t$ is the noised teacher feature, $\phi(\cdot)$ the diffusion model's noise prediction, and $\epsilon$ the added noise (the symbols $\phi$ and $\epsilon$ are illegible in the source and assumed here)
➤ KD loss ($L_{kd}$): $L_{kd} = \lambda_1 L_{ddpm} + \lambda_2 \| F_{stu} - F_{tea} \|$
➤ Final loss ($L$): $L = L_{loc} + L_{kd}$
[Experiment tables: only partly legible in the source; method and column names are not recoverable. Legible fragments:]
➤ City scenes, 10 km, 4 training trajectories
QEOxford dataset
Date          Mean error (m, °)
2012-02-12    7.75m, ...
2012-02-19    7.47m, 5.49
2012-03-31    6.98m, 5.67     8.94m, ...
2012-05-26    14.34m, 7.93    15.62m, 7.99
Average       9.14m, 6.40     10.67m, 6.49
The way to use semantic [table not legible]
SCR with and witho
[Figure: visualization results. Panel labels: GT, SGLoc, LISA]
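The way the loss terms in the Loss function panel combine can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the feature-matching term is taken as a plain L2 norm (the exact norm is not legible on the poster), the weights `lambda1` and `lambda2` are hypothetical values, and the localization and DDPM losses are passed in as already-computed numbers because their exact forms are not recoverable from the source.

```python
import math
from typing import Sequence


def feature_distance(f_stu: Sequence[float], f_tea: Sequence[float]) -> float:
    """L2 distance between student and teacher features (assumed norm)."""
    if len(f_stu) != len(f_tea):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(f_stu, f_tea)))


def kd_loss(l_ddpm: float, f_stu: Sequence[float], f_tea: Sequence[float],
            lambda1: float, lambda2: float) -> float:
    """L_kd = lambda1 * L_ddpm + lambda2 * ||F_stu - F_tea||."""
    return lambda1 * l_ddpm + lambda2 * feature_distance(f_stu, f_tea)


def total_loss(l_loc: float, l_kd: float) -> float:
    """Final loss: L = L_loc + L_kd."""
    return l_loc + l_kd


# Toy numbers only; none of these values come from the poster.
l_kd = kd_loss(l_ddpm=0.5, f_stu=[1.0, 2.0], f_tea=[1.0, 0.0],
               lambda1=1.0, lambda2=0.5)
print(total_loss(l_loc=0.25, l_kd=l_kd))
```

Per the figure legend, some components are discarded after training, so the distillation term is a training-time cost; this is consistent with the efficiency goal of not slowing down inference.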