In the image, a man with a black jacket is standing in front of a white poster. The poster features a map with a green outline of an urban area, accompanied by textual information. The man appears to be engaged in a discussion or presentation, as he is pointing to the map while a young boy stands nearby, listening attentively. The setting suggests that the event is taking place outdoors, possibly at a public venue or a festival. The map on the poster likely represents a city or a region, and the man might be explaining the significance of the urban area or its features to the boy. Overall, the image conveys a sense of learning and connection between the man and the young boy as they discuss the white poster with the map.

Text transcribed from the image:

[Logos: Xiamen University (Universitas Amoiensis), Intel] Highlight

LISA: LiDAR Localization with Semantic Awareness

Bochun Yang, Zijun Li, Wen Li, Zhipeng Cai², Chenglu Wen¹, Yu Zang, Matthias Muller², Cheng Wang
¹Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China
²Intel Labs

Motivations

More effective utilization of scene geometric information: Scene Coordinate Regression (SCR) successfully utilizes coordinate encoding to effectively extract scene geometric information, but it treats all points of the input equally. This is non-ideal for the task of localization: objects that are dynamic or repetitive should intuitively be less important than salient and static objects.

Efficiency: Inference time is a crucial metric in localization. We should ensure that introducing additional semantic information does not affect the real-time ability of localization.

Preliminary analysis

Filter               Mean Error (m/°)    Filter                 Mean Error (m/°)
all (no filter)      1.79m, 1.41°        all (no filter)        1.79m, 1.41°
no plant             1.20m, 1.97°        plant only             59.08m, 13.25°
no building          1.39m, 2.26°        building only          1.63m, 1.91°
no sidewalk          1.77m, 1.45°        sidewalk only          2.47m, 5.92°
no road              2.03m, 3.42°        road only              1.71m, 2.59°
no transportation    2.07m, 3.42°        transportation only    20.10m, 21.95°

Filtering out objects from different classes can significantly reduce or increase the position error. Due to noise in the predicted labels and the hard threshold, naïve filtering does not fully utilize the semantic information.

Methods

[Architecture figure. Legend: Frozen; Retained for inference; Discarded after training. Labels: LiDAR coordinate, World coordinate, feature extractor, Regressor, Scene Coordinate Regression, Semantic Segmentation, Knowledge Distillation.]

➤ Scene Coordinate Regression: SGLoc
➤ Semantic Segmentation: SphereFormer, SPVNAS
➤ Knowledge Distillation: DiffKD

Acquisition of semantic features and labels (SPVNAS, SphereFormer)
➤ We use a semantic segmentation model pre-trained on NuScenes or SemanticKITTI and transfer it to Oxford and NCLT.

Denoise student semantic features
➤ We train a diffusion model on the teacher feature $F_{tea}$ by gradually adding noise to $F_{tea}$ and letting the diffusion model learn to predict the noise.
➤ Because the student network is small, its output features naturally contain more noise. Before distilling the knowledge, we denoise them.

Loss function
➤ Localization loss ($L_{loc}$): [formula not legible in the transcription]
➤ DDPM loss ($L_{ddpm}$): $L_{ddpm} = \| [\ldots](F_t) - [\ldots] \|^2$ [the symbols for the noise predictor and the noise are not legible]
➤ KD loss ($L_{kd}$): $L_{kd} = \lambda_1 L_{ddpm} + \lambda_2 \| F_{stu} - F_{tea} \|$ [norm type truncated in the transcription]
➤ Final loss ($L$): $L = L_{loc} + L_{kd}$

Experiment results

QEOxford dataset
➤ City scenes, 10 km, 4 training trajectories

[Results tables: most cells are not legible and the method names are not recoverable. Legible entries from one column:]

Trajectory     Error (first legible column)
2012-02-12     7.75m, 6…
2012-02-19     7.47m, 5.49
2012-03-31     6.98m, 5.67
2012-05-26     14.34m, 7.93
Average        9.14m, 6.40

[Ablation tables, cells not legible. Titles: "The way to use semantic" (first row: "No semantic"); "SCR with and witho…".]

Visualization results

[Figure panels labeled GT, SGLoc, LISA.]
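
The class-filtering experiment in the preliminary analysis (for example "no plant" versus "plant only") can be pictured with a minimal sketch. This is an illustration, not the authors' code: the function name `filter_by_class`, the string labels, and the toy points are all hypothetical, and real point clouds would normally be held in arrays rather than Python lists.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float, float]


def filter_by_class(
    points: List[Point],
    labels: List[str],
    classes: Set[str],
    mode: str = "drop",
) -> List[Point]:
    """Hard-threshold filtering of a labeled point cloud (hypothetical helper).

    mode="drop": remove every point whose label is in `classes` ("no plant").
    mode="keep": keep only points whose label is in `classes` ("plant only").
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if mode not in ("drop", "keep"):
        raise ValueError("mode must be 'drop' or 'keep'")
    keep_if_member = mode == "keep"
    return [p for p, lab in zip(points, labels) if (lab in classes) == keep_if_member]


# Toy data: four points with predicted semantic labels (made up for illustration).
points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (3.0, 1.0, 0.2), (4.0, 4.0, 1.5)]
labels = ["road", "plant", "building", "plant"]

no_plant = filter_by_class(points, labels, {"plant"}, mode="drop")
plant_only = filter_by_class(points, labels, {"plant"}, mode="keep")
```

Because every point is either kept or discarded outright, a single wrong predicted label changes the input to the localizer, which is one way to read the poster's remark that naïve filtering with a hard threshold does not fully use the semantic information.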
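
The loss terms listed under "Loss function" can be sketched as follows. This is a rough illustration under stated assumptions, not the poster's implementation: it assumes the standard DDPM forward process with a cumulative noise-schedule value `alpha_bar`, assumes a squared L2 norm for both the noise-prediction term and the student-teacher feature term (the norms are not legible in the transcription), and treats the localization loss as a given number because its formula is not recoverable. All function names are hypothetical, and a real model would use a tensor library instead of Python lists.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def add_noise(f_tea: Vector, noise: Vector, alpha_bar: float) -> Vector:
    """Assumed DDPM forward step: F_t = sqrt(alpha_bar) * F_tea + sqrt(1 - alpha_bar) * noise."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [s * f + n * e for f, e in zip(f_tea, noise)]


def ddpm_loss(predict_noise: Callable[[Vector], Vector], f_t: Vector, noise: Vector) -> float:
    """Noise-prediction loss: squared L2 distance between predicted and true noise."""
    return squared_l2(predict_noise(f_t), noise)


def kd_loss(l_ddpm: float, f_stu: Vector, f_tea: Vector,
            lambda1: float = 1.0, lambda2: float = 1.0) -> float:
    """L_kd = lambda1 * L_ddpm + lambda2 * ||F_stu - F_tea||^2 (squared norm is an assumption)."""
    return lambda1 * l_ddpm + lambda2 * squared_l2(f_stu, f_tea)


def total_loss(l_loc: float, l_kd: float) -> float:
    """Final loss: L = L_loc + L_kd."""
    return l_loc + l_kd


# Toy example with made-up numbers.
rng = random.Random(0)
f_tea = [1.0, -2.0, 0.5]
noise = [rng.gauss(0.0, 1.0) for _ in f_tea]
alpha_bar = 0.9
f_t = add_noise(f_tea, noise, alpha_bar)


def oracle(x: Vector) -> Vector:
    """Idealized predictor that inverts the forward step using the known teacher feature."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xi - s * fi) / n for xi, fi in zip(x, f_tea)]


perfect = ddpm_loss(oracle, f_t, noise)                    # close to zero
trivial = ddpm_loss(lambda x: [0.0] * len(x), f_t, noise)  # equals ||noise||^2
```

The `oracle` predictor exists only to show that the noise-prediction loss vanishes when the added noise is recovered exactly; a trained diffusion model would approximate this mapping without access to `f_tea`.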