In the image, a large research poster is displayed at a convention or exhibition. The poster features an array of charts, diagrams, and tables presenting a computer vision study on unified monocular 3D object detection, covering the motivation, the proposed framework, and experimental results. The setting suggests a conference focused on advancing 3D perception technologies, and attendees can gather around the display to learn about the project and its implications for the field. Text transcribed from the image:

CVPR, June 17-21, 2024, Seattle, WA

UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li¹, Xiaogang Xu²,³, SerNam Lim⁴, Hengshuang Zhao¹
¹HKU  ²CUHK  ³ZJU  ⁴UCF

Motivation

Problem: Although numerous 3D object detectors have been developed, they are mostly designed for a single domain. Unifying indoor and outdoor detection is challenging due to diverse geometry distributions and heterogeneous domain distributions.
[Figure: the geometry distribution gap and the heterogeneous domain distributions, illustrated with samples from ARKitScenes, Hypersim, nuScenes, KITTI, Objectron, and SUN RGB-D; annotation problems include synthesized scenes and objects that are not labeled.]

Unstable Training: Although there are many popular 3D object detectors, we find that they cannot converge smoothly in unified 3D object detection. By contrast, by incorporating our proposed techniques, the developed detector UniMODE achieves stable training.
[Plot: training loss over iterations 0-12,000; PETR exhibits a loss boost and gradient NaN, while UniMODE converges stably.]

Highlight

In this work, we propose the UniMODE detector, which achieves SOTA performance in unified 3D object detection on images from diverse indoor and outdoor domains.
• Two-Stage Detection Architecture: the first-stage network informs the second stage about the rough target distribution, which stabilizes training.
• Uneven BEV Grid: to address the grid-size conflict between indoor and outdoor scenes, we propose the uneven BEV grid. [A grid-spacing sketch follows the transcript.]
• Domain Adaptive Layer Normalization (DALN): an efficient adaptive feature normalization method is developed to bridge the significant feature discrepancy between data domains. [A code sketch follows the transcript.]
• Class Alignment Loss: we devise an effective loss to address the label conflict between different datasets.

Framework

[Diagram. Labeled components include: input images (B×3×H×W) from indoor and outdoor scenes; backbone + neck; feature head, depth head, and domain head; a proposal head (MLP) over the extracted feature producing a proposal map M; M proposal queries plus N random queries (M+N queries in total); sparse BEV feature projection onto an uneven BEV feature grid (ours), in contrast to the even grid of previous methods; a BEV encoder with self-attention; a BEV decoder ×6 with query, cross-attention, FFN, and DALN over sparse tokens; domain confidences (C₁, C₂, C₃) and domain parameters (α₁, β₁), … as input-dependent parameters, applied as layer norm followed by a "mini-adjust" of the feature; a class alignment loss; detection results.]

Visualization

[Qualitative detection results on indoor and outdoor scenes, including (a) ARKitScenes and (b) Hypersim.]

Main Results

Comparison with Existing Detectors
[Table: AP of M3D-RPN, SMOKE, FCOS3D, PGD, GUPNet, ImVoxelNet, BEVFormer, PETR, Cube R-CNN, and UniMODE on indoor and outdoor benchmarks; the individual numeric entries are only partially legible in the image.]
Backbone Comparison
[Table: AP of UniMODE with a DLA34 backbone versus a ConvNeXt backbone; the numeric entries are only partially legible.]

Ablation Study
[Table: ablation of the proposed components (abbreviated PH, UBG, SBFP, and UDA on the poster), reporting AP metrics and an Improvement column; the numeric entries are only partially legible.]
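The uneven BEV grid from the highlights (the UBG component in the ablation) targets the grid-size conflict between small indoor ranges and large outdoor ranges. Below is a minimal sketch of one plausible parameterization in which cell size grows with distance from the camera; the polynomial spacing and the max_range/power values are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def uneven_bev_edges(num_cells: int = 64, max_range: float = 60.0,
                     power: float = 2.0) -> np.ndarray:
    """Cell edges (meters) for one axis of an uneven BEV grid.

    Edges grow polynomially with index, so cells are fine near the camera
    (indoor-scale scenes) and coarse far away (outdoor ranges). The
    polynomial form and parameter values are illustrative only.
    """
    t = np.linspace(0.0, 1.0, num_cells + 1)
    return max_range * t ** power

def to_cell_index(distance: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Map metric distances to uneven-grid cell indices."""
    idx = np.searchsorted(edges, distance, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)

edges = uneven_bev_edges()
# Nearby (indoor-scale) points fall into many fine cells; distant points share coarse cells.
print(to_cell_index(np.array([0.5, 3.0, 30.0, 55.0]), edges))
```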
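The Domain Adaptive Layer Normalization highlighted above is shown on the poster only as a block diagram (layer norm followed by an input-dependent "mini-adjust", driven by domain confidences C₁, C₂, C₃ and domain parameters α, β). The following is a minimal PyTorch sketch of one way such a layer could be wired; the pooled softmax domain head, the linear mini-adjust, and all shapes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainAdaptiveLayerNorm(nn.Module):
    """Minimal sketch of a domain-adaptive layer norm (DALN-style).

    Assumed design: per-domain affine parameters (alpha_i, beta_i) are
    blended by softmax domain confidences predicted from the input, and a
    small linear "mini-adjust" adds an input-dependent correction.
    """

    def __init__(self, dim: int, num_domains: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # One learnable (alpha, beta) pair per data domain.
        self.alpha = nn.Parameter(torch.ones(num_domains, dim))
        self.beta = nn.Parameter(torch.zeros(num_domains, dim))
        # Predicts domain confidences (C_1, ..., C_K) from a pooled feature.
        self.domain_head = nn.Linear(dim, num_domains)
        # Input-dependent correction of (alpha, beta) ("mini-adjust").
        self.adjust = nn.Linear(dim, 2 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) token features.
        h = self.norm(x)
        pooled = x.mean(dim=1)                                  # (B, dim)
        conf = F.softmax(self.domain_head(pooled), dim=-1)      # (B, K)
        alpha = conf @ self.alpha                               # (B, dim)
        beta = conf @ self.beta                                 # (B, dim)
        d_alpha, d_beta = self.adjust(pooled).chunk(2, dim=-1)  # (B, dim) each
        scale = (alpha + d_alpha).unsqueeze(1)                  # (B, 1, dim)
        shift = (beta + d_beta).unsqueeze(1)
        return scale * h + shift

# Example: normalize 100 sparse BEV tokens of width 256 for a batch of 2.
daln = DomainAdaptiveLayerNorm(dim=256, num_domains=3)
out = daln(torch.randn(2, 100, 256))
print(out.shape)  # torch.Size([2, 100, 256])
```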