A research poster presentation at a conference features a study titled "Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching." Presented by a team from Zhejiang University, the poster stands between two display boards labeled 28 and 29 at a CVPR (Conference on Computer Vision and Pattern Recognition) event. The poster details the motivation, method, contributions, qualitative results, quantitative results, and generalization performance of the proposed stereo matching model. The research aims to improve disparity estimation by employing a multi-modal probability modeling approach to suppress severe misalignment artifacts and ensure robust object-boundary reconstruction in point clouds. Several graphs, tables, and visual examples of output disparity estimations illustrate the findings. QR codes and a reference list are included for further exploration of the study. Attendees can be seen discussing the displayed content and engaging with the presenters in the conference hall. Text transcribed from the image:

CVPR, SEATTLE, WA, JUNE 17-21, 2024
浙江大学 ZHEJIANG UNIVERSITY

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu, Zhiyu Xiang, Chengyu Qiao, Jingyun Fu, Tianyu Pu

Motivation
Stereo matching networks usually over-smoothly estimate object boundaries, causing bleeding artifacts in reconstructed point clouds.

Our Method
Clustering → Laplacian Modeling for each Cluster → Adaptive Multi-Modal Laplacian Modelling

Quantitative Results
[Benchmark table comparing baselines such as PSMNet, GANet, and ACVNet with and without the proposed method, reporting error rates and runtime (ms); the individual figures are not legible in the transcription.]

➤ Existing methods model the disparity ground-truth as a uni-modal distribution, but fail to suppress the multi-modal outputs at the edge.
➤ Meanwhile, the single-modal disparity estimator (SME) suffers from severe misalignment artifacts.
➤ Our work aims to explore a better modeling of the stereo ground-truth and to improve the robustness of the disparity estimator.
[Figure: Disparity Ground-Truth]

Our Method (continued)
First, clustering is applied within the local window to separate the different modals. Then, each cluster is modeled as a uni-modal distribution using the Laplacian distribution. Last, structural information is used to fuse the generated distributions into a multi-modal distribution.
➤ We propose to select the modal with the maximum cumulative probability as the dominant modal.
[Figure: the output probability distribution; the selection of the SME vs. the selection of the DME]

Contributions
➤ An adaptive multi-modal probability modeling for supervising stereo network training, effectively guiding the networks to learn clear distribution patterns.
➤ A dominant-modal disparity estimator (DME) that can obtain accurate results upon multi-modal outputs.
➤ State-of-the-art performance on both the KITTI 2015 and KITTI 2012 benchmarks.
➤ Excellent cross-domain generalization performance.

Qualitative Results
All three baselines are lifted to a highly competitive level by our method. GANet with our method achieves new state-of-the-art results on both the KITTI 2015 and KITTI 2012 benchmarks.
[Figure: left images, disparity maps, error maps, and reconstructed point clouds for PSMNet [1], PSMNet-[2], and PSMNet-Ours]

Vis. of Output Distributions
[Figure: left image and output distributions for PSMNet [1], PSMNet-[2], and PSMNet-Ours; top row: background pixel; bottom row: foreground pixel]

Generalization Performance
[Figure: from top to bottom, KITTI 2015, KITTI 2012, Middlebury, and ETH3D results for PSMNet [1], PSMNet-[2], and PSMNet-Ours]

Links
[QR codes]

References
[1] Chang and Chen. Pyramid stereo matching network. CVPR 2018.
[2] Chen, Chen, and Cheng. On the over-smoothing problem of CNN based disparity estimation. ICCV 2019.
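The pipeline transcribed from the poster (cluster the ground-truth disparities in a local window, fit a Laplacian per cluster, fuse the clusters into a multi-modal target, train with cross-entropy, and at inference regress within the dominant modal) can be sketched as below. This is an illustrative NumPy reconstruction, not the authors' implementation: the clustering rule, the size-based fusion weights, the Laplacian scale `b`, and the modal-splitting threshold are all assumptions.

```python
import numpy as np

def adaptive_multimodal_gt(window_disp, disp_range, b=1.0, cluster_gap=2.0):
    """Build a multi-modal ground-truth distribution for one pixel.

    Sketch only: clusters the ground-truth disparities found in the pixel's
    local window, models each cluster as a uni-modal Laplacian, and fuses
    the Laplacians weighted by cluster size (assumed weighting).
    """
    sorted_d = np.sort(np.asarray(window_disp, dtype=np.float64))
    # Split into clusters wherever consecutive sorted disparities jump by
    # more than `cluster_gap` (assumed clustering rule).
    splits = np.where(np.diff(sorted_d) > cluster_gap)[0] + 1
    clusters = np.split(sorted_d, splits)

    d = np.arange(disp_range, dtype=np.float64)
    dist = np.zeros_like(d)
    for c in clusters:
        lap = np.exp(-np.abs(d - c.mean()) / b)          # uni-modal Laplacian
        dist += (len(c) / len(sorted_d)) * lap / lap.sum()
    return dist / dist.sum()

def dominant_modal_disparity(prob, thresh=1e-3):
    """Dominant-modal estimator (DME) sketch: split the predicted
    distribution into contiguous modals, keep the modal with the largest
    cumulative probability, and regress disparity inside it only."""
    prob = np.asarray(prob, dtype=np.float64)
    d = np.arange(len(prob), dtype=np.float64)
    active = prob > thresh
    # Collect contiguous runs of above-threshold probability (the modals).
    runs, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(active)))
    # Keep the modal with the maximum cumulative probability.
    s, e = max(runs, key=lambda r: prob[r[0]:r[1]].sum())
    return float((d[s:e] * prob[s:e]).sum() / prob[s:e].sum())

def multimodal_ce_loss(pred_prob, gt_prob, eps=1e-8):
    """Cross-entropy between the predicted distribution and the
    multi-modal ground-truth distribution."""
    return float(-(np.asarray(gt_prob) * np.log(np.asarray(pred_prob) + eps)).sum())
```

For example, a window containing background disparities near 10 and foreground disparities near 40 yields a bimodal target; the DME then regresses within the heavier modal instead of averaging across both modals, which is what produces the bleeding artifacts described in the Motivation section.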