The photograph depicts a research poster presentation at an academic conference. The poster, titled "Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching," authored by Peng Liu, Zhiyu Xiang, Chengyu Qiao, Jingyun Fu, and Tianyu Pu from Zhejiang University, is displayed on board number 29. The poster elaborates on advanced techniques in stereo matching, focusing on adaptive multi-modal probability modeling to address object boundary blurring and alignment artifacts in reconstructed point clouds. Key sections of the poster include: 1. **Motivation**: Highlighting the challenges in existing stereo matching networks and the aim of improving stereo ground-truth modeling and robustness. 2. **Contributions**: Detailing the introduction of a novel multi-modal probability model, a dominant-modal disparity estimator (DME), and superior performance on public benchmarks. 3. **Our Method**: Illustrated with diagrams, describing the technical approach of separating the different modals within a local window to achieve precise stereo matching. 4. **Quantitative Results & Generalization Performance**: Tables and charts providing performance metrics and qualitative results, demonstrating the efficacy of the proposed method. 5. **Visualization of Output Distributions**: Comparative imagery showing the improvements in output distributions against baselines. In the background, attendees of the conference are seen engaging with the presenters and discussing the content. This scene captures the academic rigour and collaborative nature of such conferences, emphasizing the dissemination and discussion of cutting-edge research in computer vision and pattern recognition. The poster includes QR codes for accessing additional resources and links, underscoring the integration of digital tools for enhanced information sharing.

Text transcribed from the image:

CVPR, SEATTLE, WA
浙江大学 (ZHEJIANG UNIVERSITY) · CVPR, SEATTLE, WA, JUNE 17-21, 2024

**Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching**
Peng Liu, Zhiyu Xiang, Chengyu Qiao, Jingyun Fu, Tianyu Pu

**Motivation**
Stereo matching networks usually over-smoothly estimate object boundaries, causing bleeding artifacts in reconstructed point clouds.
➤ Existing methods model the disparity ground-truth as a uni-modal distribution but fail to suppress the multi-modal outputs at the edges. Meanwhile, the single-modal disparity estimator (SME) suffers from severe misalignment artifacts.
➤ Our work aims to explore a better modeling of the stereo ground-truth and to improve the robustness of the disparity estimator.
[Figure: disparity ground-truth; panels "the selection of SME" and "the selection of DME".]

**Our Method** (Clustering → Laplacian modeling for each cluster → Adaptive multi-modal Laplacian modeling)
First, clustering is applied within the local window to separate the different modals. Then, each cluster is modeled as a uni-modal distribution using the Laplacian distribution. Last, structural information is used to fuse the generated distributions into a multi-modal distribution.

**Quantitative Results**
[Table comparing baseline networks and their "-Ours" variants by error rate and runtime (ms); individual figures are not legible in the photograph.]

**Qualitative Results**
➤ All three baselines are lifted to a highly competitive level by our method.
➤ GANet with our method achieves new state-of-the-art results on both the KITTI 2015 and KITTI 2012 benchmarks.
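The three-step ground-truth construction transcribed above (cluster the disparities in a local window, model each cluster as a Laplacian, fuse the clusters into one multi-modal distribution) can be sketched in a few lines. The following NumPy toy is only an illustration under assumed details: a greedy 1-D clustering with a hypothetical tolerance `cluster_tol`, a fixed Laplacian scale `b`, and size-based fusion weights in place of the poster's structural information; it is not the authors' implementation.

```python
import numpy as np

def multimodal_laplacian_gt(window_disps, max_disp=64, b=0.8, cluster_tol=1.0):
    """Sketch: multi-modal ground-truth distribution over disparity candidates.

    1. Cluster the disparities in the local window (greedy 1-D grouping;
       `cluster_tol` is an assumed hyperparameter, not from the poster).
    2. Model each cluster as a Laplacian centered on its mean.
    3. Fuse the per-cluster distributions (here weighted by cluster size,
       standing in for the structural information) and normalize.
    """
    disps = np.sort(np.asarray(window_disps, dtype=float))

    # Step 1: greedy clustering along the sorted disparities.
    clusters, current = [], [disps[0]]
    for d in disps[1:]:
        if d - current[-1] <= cluster_tol:
            current.append(d)
        else:
            clusters.append(current)
            current = [d]
    clusters.append(current)

    # Steps 2-3: one Laplacian per cluster, fused with size weights.
    candidates = np.arange(max_disp, dtype=float)
    dist = np.zeros(max_disp)
    for c in clusters:
        center, weight = np.mean(c), len(c) / len(disps)
        dist += weight * np.exp(-np.abs(candidates - center) / b) / (2 * b)
    return dist / dist.sum()
```

Supervising the network with a cross-entropy loss against such a target, instead of a single Laplacian, is what lets it keep sharp, separated modes at object edges rather than smearing probability mass between foreground and background.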
**Contributions**
➤ An adaptive multi-modal probability model for supervising stereo network training, effectively guiding the networks to learn clear distribution patterns.
➤ A dominant-modal disparity estimator (DME) that can obtain accurate results from multi-modal outputs.
➤ State-of-the-art performance on both the KITTI 2015 and KITTI 2012 benchmarks.
➤ Excellent cross-domain generalization performance.

[Figure: the output probability distribution.]
➤ We propose to select the modal with the maximum cumulative probability as the dominant modal.

**Vis. of Output Distributions**
[Figure columns: left image, PSMNet [1], PSMNet-[2], PSMNet-Ours.]
➤ Top row: background pixel.
➤ Bottom row: foreground pixel.

**Generalization Performance**
[Figure columns: PSMNet [1], PSMNet-[2], PSMNet-Ours; from top to bottom: left images, disparity maps, error maps, and reconstructed point clouds.]
[Figure: left images and disparity results; from top to bottom: KITTI 2015, KITTI 2012, Middlebury, and ETH3D.]

**Links**
[QR codes for additional resources.]

**References**
[1] Chang and Chen. Pyramid Stereo Matching Network. CVPR 2018.
[2] Chen, Chen, and Cheng. On the Over-Smoothing Problem of CNN-Based Disparity Estimation. ICCV 2019.
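The dominant-modal selection rule transcribed above (pick the modal with the maximum cumulative probability, then estimate the disparity within it) can also be illustrated concretely. This NumPy sketch splits the output distribution at its local minima, which is one plausible way to delimit modals and is an assumption here, not a detail given on the poster:

```python
import numpy as np

def dominant_modal_estimator(prob):
    """Sketch of a dominant-modal disparity estimator (DME).

    Split the output distribution into contiguous modal segments at its
    local minima, pick the segment with the largest cumulative probability,
    and take the expectation within that segment only.
    """
    prob = np.asarray(prob, dtype=float)

    # Segment boundaries at local minima of the distribution.
    minima = [i for i in range(1, len(prob) - 1)
              if prob[i] <= prob[i - 1] and prob[i] < prob[i + 1]]
    bounds = [0] + minima + [len(prob)]
    segments = list(zip(bounds[:-1], bounds[1:]))

    # Dominant modal: the segment with the maximum cumulative probability.
    s, e = max(segments, key=lambda se: prob[se[0]:se[1]].sum())
    idx = np.arange(s, e)
    return float((idx * prob[s:e]).sum() / prob[s:e].sum())
```

The contrast with the single-modal estimator (SME) is direct: a full-range soft-argmax over a bimodal output lands between the two peaks, producing exactly the misalignment artifacts the Motivation section describes, whereas restricting the expectation to the dominant modal stays on one surface.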