"At the CVPR conference, a research team from Zhejiang University presents their poster titled 'Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching.' The poster, displayed at position 29, delves into the motivations, methods, and results of their study. It addresses challenges in stereo matching networks, proposing a solution to better model the disparity ground-truth, with detailed visual and quantitative results showcasing its effectiveness. To the right of the poster, a man attentively observes the presentation, while the bustling environment signifies a day filled with knowledge-sharing and academic discourse." Text transcribed from the image: CVPR TMD SEATTLE, WA 128 Nbe. Inl. Runtime (ms) 024 124 24.0 007 CITY 浙江大學 HEJIANG ZHEJIANG UNIVERSITY Motivation Stereo matching networks usually over-smoothly estimate object boundaries, causing bleeding artifacts in reconstructed point clouds. 21 Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching Peng Xu Zhiyu Xiang Chengyu Qiao Jingyun Fu Tianyu Pu Our Method Clusteri Laplacian Modeling for each Cluster Adaptive Multi-Modal Laplacian Modelling TALCVPR SEATTLE, WA JUNE 17-21, 2024 Quantitative Results Method 田 20 POSE 230 405 258 208 3.48 2.36 367 ING 42 232 171 234 24t 154 433 234 120 8 Sachet 3.30 1 212 334 153 10 211 242 PSMMO-MOM 3.49 182 230 271 18 430 200 2.08 154 CON 1.56 320 131 130 Actes 270 251 3 129 ➤ Existing methods model the disparity ground-truth as the uni-modal distribution, but fail to suppress the multi-modal outputs at the edge. Meanwhile, the single-modal disparity estimator (SME) suffers from severe misalignment artifacts. Disparity Ground-Trath First, clustering is applied within the local window to separate different modals. Then, each cluster is modeled as a uni-modal distribution using the Laplacian distribution. Last, structural information is used to fuse the generated distributions into a multi-modal distribution. 
Qualitative Results
➤ All three baselines are lifted to a highly competitive level by our method.
➤ GANet with our method achieves new state-of-the-art results on both the KITTI 2015 and KITTI 2012 benchmarks.
[Comparison rows for GANet, ACVNet, and the -Ours variants; the figures are garbled in the transcription.]
➤ Our work aims to explore a better modeling of the stereo ground-truth and improve the robustness of the disparity estimator.

Contributions
➤ An adaptive multi-modal probability modeling for supervising stereo network training, effectively guiding the networks to learn clear distribution patterns.
➤ A dominant-modal disparity estimator (DME) that can obtain accurate results from multi-modal outputs.
➤ State-of-the-art performance on both the KITTI 2015 and KITTI 2012 benchmarks.
➤ Excellent cross-domain generalization performance.

Dominant-Modal Disparity Estimator
[Diagram: the output probability distribution; the selection of SME; the selection of DME]
➤ We propose to select the modal with the maximum cumulative probability as the dominant modal.

Vis. of Output Distributions
[Panels: Left image | PSMNet [1] | PSMNet-[2] | PSMNet-Ours]
➤ Top row: background pixel
➤ Bottom row: foreground pixel
From top to bottom: left images, disparity maps, error maps, and reconstructed point clouds.

Generalization Performance
[Panels: PSMNet [1] | PSMNet-[2] | PSMNet-Ours]
From top to bottom: KITTI 2015, KITTI 2012, Middlebury, and ETH3D.

References
[1] Chang and Chen. Pyramid Stereo Matching Network. CVPR 2018.
[2] Chen, Chen, and Cheng. On the Over-Smoothing Problem of CNN-Based Disparity Estimation. ICCV 2019.
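The dominant-modal selection mentioned on the poster (take the modal with the maximum cumulative probability, then estimate disparity inside it only) could be sketched as follows. This is my own minimal NumPy interpretation, not the authors' implementation; the mode-segmentation threshold `eps` is an assumption.

```python
import numpy as np

def dominant_modal_disparity(probs, disp_range, eps=1e-3):
    """Illustrative dominant-modal disparity estimator (DME).

    probs:      network output distribution over disparity levels (sums to 1)
    disp_range: the disparity value of each probability bin
    """
    # 1) Segment the distribution into modals: contiguous runs of bins
    #    carrying more than `eps` probability mass.
    active = probs > eps
    edges = np.diff(active.astype(int))
    starts = list(np.where(edges == 1)[0] + 1)
    ends = list(np.where(edges == -1)[0] + 1)
    if active[0]:
        starts = [0] + starts
    if active[-1]:
        ends = ends + [len(probs)]

    # 2) Pick the modal with the largest cumulative probability.
    s, e = max(zip(starts, ends), key=lambda se: probs[se[0]:se[1]].sum())

    # 3) Take the probability-weighted mean inside the dominant modal only,
    #    avoiding the cross-modal averaging that blurs object edges.
    p = probs[s:e] / probs[s:e].sum()
    return float((p * disp_range[s:e]).sum())
```

On a two-peaked output (e.g. 60% of the mass near disparity 10, 40% near 40), a full-distribution expectation would land between the peaks, while this estimator returns a value inside the dominant peak.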