A research poster titled "RoHM: Robust Human Motion Reconstruction via Diffusion" is displayed at an academic conference. The poster, marked as a Highlight paper, is authored by researchers from ETH Zürich and Meta Reality Labs Research. It presents a diffusion-based approach to reconstructing human motion, aimed at challenges such as occlusions and noisy input motion. The poster includes sections on the problem setup, the method for diffusing global and local motion, and controlling global motion reconstruction. It also reports experimental results across several datasets, showing substantial improvements in reconstruction accuracy and physical plausibility. QR codes and figures provide visual and interactive elements for deeper engagement. The organized, informative layout underscores the poster's relevance to computer vision and machine learning. Text transcribed from the image:

RoHM: Robust Human Motion Reconstruction via Diffusion (Highlight)
Siwei Zhang¹, Bharat Lal Bhatnagar², Yuanlu Xu², Alexander Winkler², Petr Kadlecek², Siyu Tang¹, Federica Bogo²
¹ ETH Zürich  ² Reality Labs Research, Meta
The work was done during an internship at Meta.
[Header logos: ETH Zürich, VLG Computer Vision and Learning Group, Meta; CVPR, Seattle, WA]

Overview
Challenges:
• Noisy 2D keypoint detections
• Body occlusions
Our goals:
• Reconstruct realistic 3D motions in global space from monocular videos
• Robust to noise and occlusions
Our contributions:
• RoHM: a diffusion-based approach for robust motion reconstruction
• TrajControl: to model trajectory-pose correlations
• Various applications: motion reconstruction, denoising, infilling

Problem Setup
• Input: noisy and incomplete motion from an RGB(-D) video
• Output: complete 3D motion in global space
• Motion representation: joint-based + SMPL-X-based; root trajectory r

Diffusing Global and Local Motion
• TrajNet: reconstructs the global root trajectory; denoiser R_0 = D_R(R_t, t, c_R)
• PoseNet: reconstructs the local body pose conditioned on the trajectory; denoiser (R_0, P_0) = D_P((R_0, P_t), t, c_P)
• Inference (iteration = 1): TrajNet first recovers the trajectory R_0, then PoseNet recovers the pose P_0
• Training on: AMASS

Experiments
Test on: AMASS (synthetic noise + occlusions), PROX (RGB-D/RGB), EgoBody (RGB)
Evaluation metrics:
• Accuracy: MPJPE
• Physical plausibility: acceleration + foot skating + foot-floor penetration

Results on AMASS (G-MPJPE ↓; Contact ↑; Skate ↓):
Method        -vis    -occ     all      Contact   Skate
VPoser-t      33.0    242.6    109.2    –         0.219
HuMoR [67]    42.4    167.9    88.0     0.68      0.230
MDM++         36.2    71.9     49.2     0.94      0.102
Ours          21.8    57.4     34.8     0.95      0.078

Results on PROX (foot skating ↓):
Method          RGB-D
LEMO [100]      0.176
HuMoR [67]      0.117
PhaseMP [72]    –
Ours            0.038

Takeaways:
• AMASS: >30% improvement in accuracy
• PROX: >67% (RGB-D) / >17% (RGB) improvement in foot skating

[Figure: iteration-1 pipeline — noisy trajectory R → TrajNet → R_0; noisy pose P → PoseNet → P_0]
No trajectory-pose correlation
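The evaluation metrics on the poster can be sketched in code. Below is a minimal NumPy illustration of MPJPE and a foot-skating ratio, assuming simple array conventions (frames × joints × 3 joint positions, per-frame binary foot-contact labels); the function names and the 7.5 cm displacement threshold are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance between
    predicted and ground-truth joints (arrays of shape frames x joints x 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def foot_skating_ratio(foot_pos, contact, thresh=0.075):
    """Fraction of in-contact frames whose horizontal foot displacement
    between consecutive frames exceeds a threshold (illustrative metric).

    foot_pos: (frames, 3) foot positions; contact: (frames,) labels in {0, 1}.
    """
    # per-frame horizontal (xy) displacement between consecutive frames
    disp = np.linalg.norm(np.diff(foot_pos[:, :2], axis=0), axis=-1)
    in_contact = contact[1:] > 0.5
    if in_contact.sum() == 0:
        return 0.0
    return float((disp[in_contact] > thresh).mean())
```

A stationary foot in contact yields a skating ratio of 0, while a foot that slides every in-contact frame yields 1; lower is better, matching the poster's "foot skating ↓" convention.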
→ foot skating

Controlling Global Motion Reconstruction
• TrajControl: fine-tuning TrajNet with the local body pose as additional conditioning
• Local body pose p = (J, J̇, θ, …, f)
• Iteratively refine local and global motion at inference time (iteration > 1): PoseNet's pose estimate P_0 feeds TrajControl, which conditions TrajNet

Ablation for TrajControl on the PROX dataset (lower is better):
                   RGB-D               RGB
Method             Skating   Accel     Skating   Accel
Ours               0.038     1.8       0.116     2.2
w/o TrajControl    0.056     2.1       0.165     2.7
→ TrajControl improves motion plausibility

[Qualitative comparisons: HuMoR vs. Ours]
30× faster than HuMoR at inference!
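The iterative refinement described above can be sketched as a simple alternating loop. In this toy Python sketch, `traj_net` and `pose_net` stand in for the actual diffusion denoisers, and TrajControl's role is approximated by passing the current pose estimate to the trajectory network from the second iteration on — an assumption for illustration, not the paper's architecture:

```python
import numpy as np

def iterative_refine(traj, pose, traj_net, pose_net, n_iters=2):
    """Alternate global-trajectory and local-pose reconstruction.

    Sketch of RoHM-style inference: iteration 1 runs the trajectory
    network without pose conditioning; later iterations feed the latest
    pose estimate back to it (the correlation TrajControl provides).
    """
    for i in range(n_iters):
        traj = traj_net(traj, pose if i > 0 else None)  # global motion
        pose = pose_net(pose, traj)                     # local motion
    return traj, pose
```

For example, with toy "networks" that pull their input halfway toward zero, two iterations shrink both estimates by a factor of four — each pass refines the previous pass's output.

```python
traj_net = lambda t, p: 0.5 * t
pose_net = lambda p, t: 0.5 * p
t, p = iterative_refine(np.ones(3), np.ones(4), traj_net, pose_net, n_iters=2)
# both t and p now equal 0.25 everywhere
```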