A research poster presentation at a conference features a study titled "Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching." Presented by a team from Zhejiang University, the poster stands between two display boards labeled 28 and 29 at a CVPR (Conference on Computer Vision and Pattern Recognition) event. The poster details the motivation, method, contributions, qualitative results, quantitative results, and generalization performance of the proposed stereo matching model. The research aims to improve disparity estimation by employing a multi-modal probability modeling approach to suppress severe misalignment artifacts and ensure robust object-boundary reconstruction in point clouds. Several graphs, tables, and visual examples of output disparity estimations illustrate the findings. QR codes and a reference list are included for further exploration of the study. Attendees can be seen discussing the displayed content and engaging with the presenters in the conference hall. Text transcribed from the image:

CVPR, SEATTLE, WA, JUNE 17-21, 2024
浙江大学 ZHEJIANG UNIVERSITY

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu, Zhiyu Xiang, Chengyu Qiao, Jingyun Fu, Tianyu Pu

Motivation
Stereo matching networks usually over-smoothly estimate object boundaries, causing bleeding artifacts in reconstructed point clouds.

Our Method
Clustering → Laplacian Modeling for each Cluster → Adaptive Multi-Modal Laplacian Modelling

Quantitative Results
[Benchmark table comparing baselines such as PSMNet, GANet, and ACVNet with and without the proposed method, reporting error rates and runtime (ms); the individual figures are not legible in the transcription.]

➤ Existing methods model the disparity ground-truth as a uni-modal distribution, but fail to suppress the multi-modal outputs at the edge.
➤ Meanwhile, the single-modal disparity estimator (SME) suffers from severe misalignment artifacts.
➤ Our work aims to explore a better modeling of the stereo ground-truth and to improve the robustness of the disparity estimator.
[Figure: Disparity Ground-Truth]

Our Method (continued)
First, clustering is applied within the local window to separate the different modals. Then, each cluster is modeled as a uni-modal distribution using the Laplacian distribution. Last, structural information is used to fuse the generated distributions into a multi-modal distribution.
➤ We propose to select the modal with the maximum cumulative probability as the dominant modal.
[Figure: the output probability distribution; the selection of the SME vs. the selection of the DME]

Contributions
➤ An adaptive multi-modal probability modeling for supervising stereo network training, effectively guiding the networks to learn clear distribution patterns.
➤ A dominant-modal disparity estimator (DME) that can obtain accurate results upon multi-modal outputs.
➤ State-of-the-art performance on both the KITTI 2015 and KITTI 2012 benchmarks.
➤ Excellent cross-domain generalization performance.

Qualitative Results
All three baselines are lifted to a highly competitive level by our method. GANet with our method achieves new state-of-the-art results on both the KITTI 2015 and KITTI 2012 benchmarks.
[Figure: left images, disparity maps, error maps, and reconstructed point clouds for PSMNet [1], PSMNet-[2], and PSMNet-Ours]

Vis. of Output Distributions
[Figure: left image and output distributions for PSMNet [1], PSMNet-[2], and PSMNet-Ours; top row: background pixel; bottom row: foreground pixel]

Generalization Performance
[Figure: from top to bottom, KITTI 2015, KITTI 2012, Middlebury, and ETH3D results for PSMNet [1], PSMNet-[2], and PSMNet-Ours]

Links
[QR codes]

References
[1] Chang and Chen. Pyramid stereo matching network. CVPR 2018.
[2] Chen, Chen, and Cheng. On the over-smoothing problem of CNN based disparity estimation. ICCV 2019.
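The pipeline transcribed from the poster (cluster the ground-truth disparities in a local window, fit a Laplacian per cluster, fuse the clusters into a multi-modal target, train with cross-entropy, and at inference regress within the dominant modal) can be sketched as below. This is an illustrative NumPy reconstruction, not the authors' implementation: the clustering rule, the size-based fusion weights, the Laplacian scale `b`, and the modal-splitting threshold are all assumptions.

```python
import numpy as np

def adaptive_multimodal_gt(window_disp, disp_range, b=1.0, cluster_gap=2.0):
    """Build a multi-modal ground-truth distribution for one pixel.

    Sketch only: clusters the ground-truth disparities found in the pixel's
    local window, models each cluster as a uni-modal Laplacian, and fuses
    the Laplacians weighted by cluster size (assumed weighting).
    """
    sorted_d = np.sort(np.asarray(window_disp, dtype=np.float64))
    # Split into clusters wherever consecutive sorted disparities jump by
    # more than `cluster_gap` (assumed clustering rule).
    splits = np.where(np.diff(sorted_d) > cluster_gap)[0] + 1
    clusters = np.split(sorted_d, splits)

    d = np.arange(disp_range, dtype=np.float64)
    dist = np.zeros_like(d)
    for c in clusters:
        lap = np.exp(-np.abs(d - c.mean()) / b)          # uni-modal Laplacian
        dist += (len(c) / len(sorted_d)) * lap / lap.sum()
    return dist / dist.sum()

def dominant_modal_disparity(prob, thresh=1e-3):
    """Dominant-modal estimator (DME) sketch: split the predicted
    distribution into contiguous modals, keep the modal with the largest
    cumulative probability, and regress disparity inside it only."""
    prob = np.asarray(prob, dtype=np.float64)
    d = np.arange(len(prob), dtype=np.float64)
    active = prob > thresh
    # Collect contiguous runs of above-threshold probability (the modals).
    runs, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(active)))
    # Keep the modal with the maximum cumulative probability.
    s, e = max(runs, key=lambda r: prob[r[0]:r[1]].sum())
    return float((d[s:e] * prob[s:e]).sum() / prob[s:e].sum())

def multimodal_ce_loss(pred_prob, gt_prob, eps=1e-8):
    """Cross-entropy between the predicted distribution and the
    multi-modal ground-truth distribution."""
    return float(-(np.asarray(gt_prob) * np.log(np.asarray(pred_prob) + eps)).sum())
```

For example, a window containing background disparities near 10 and foreground disparities near 40 yields a bimodal target; the DME then regresses within the heavier modal instead of averaging across both modals, which is what produces the bleeding artifacts described in the Motivation section.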