**Title: GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction**

**Institutions:**
- Shanghai Artificial Intelligence Laboratory
- The Chinese University of Hong Kong

**Authors:**
- Xiao Chen
- Quanyi Li
- Tai Wang
- Tianfan Xue
- Jiangmiao Pang

**Overview:**
GenNBV is presented as the first reinforcement learning (RL)-based end-to-end Next-Best-View (NBV) policy for active 3D reconstruction, enabling free-space exploration and cross-dataset generalization. The method is built on an informative, generalizable state embedding and extends the conventional 3 Degrees of Freedom (DoF) action space to a 5 DoF free space. This lets GenNBV adapt to different geometries and capture details hidden by self-occlusion.

**Components:**
1. **Motivation:**
   - Identifies limitations of existing NBV policies: a restricted action space (≤ 3 DoF), no consideration of collisions, per-scene optimized representations, and indirect heuristic criteria.
   - Highlights GenNBV's advantages: a free 5 DoF action space, collision avoidance, and generalizable state embeddings.
2. **Methodology:**
   - Describes the pipeline: extract a state embedding from observations, use a probabilistic occupancy grid to track occupancy and exploration progress, and compute a reward based on coverage ratio (CR).
3. **Visualization Comparison:**
   - Illustrates how GenNBV captures areas that other methods leave unseen and produces higher-quality reconstructions.
4. **Generalizability Evaluation:**
   - Reports coverage ratio in cross-dataset evaluation; in the example shown, GenNBV reaches a CR of 96.8%, compared with 90.6% for Scan-RL and 89.8% for the hemisphere baseline.
5. **Ablation Study:**
   - Analyzes the contribution of individual components, such as the probabilistic 3D grid, to the performance of the proposed method.

**Takeaways:**
1. RL-based policies generalize better than information-gain-based and heuristic methods.
2. Key elements for effective RL-based policies include a free action space and an informative state embedding.
3. 
Generalization capabilities extend across datasets and categories, indicating versatility in both indoor and outdoor environments.

**Contact and Additional Resources:**
- *Project Page:* [gennbv.github.io](http://gennbv.github.io)
- *Contact Email:* cx123@ie.cuhk.edu.hk
- QR codes provided for direct links to the paper and project page.

**Text transcribed from the image:**

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Xiao Chen¹,², Quanyi Li¹, Tai Wang¹, Tianfan Xue²,†, Jiangmiao Pang¹,†
¹OpenRobotLab, Shanghai AI Laboratory; ²MMLab, The Chinese University of Hong Kong

**Motivation:**
Existing NBV policies (≤ 3 DoF):
- No need to consider collision
- Per-scene optimized representations
- Indirect heuristic criteria
- Small objects

GenNBV (Ours):
- Free action space (5 DoF)
- Collision avoidance
- Generalizable state embeddings
- Evaluation metrics as reward
- Large objects and scenes

[Figure: previous NBV policies (object-centric capturing, per-scene training) compared with GenNBV (ours).]

**Overview:**
- GenNBV is the first RL-based end-to-end NBV policy, which allows free-space exploration and cross-dataset generalization.
- GenNBV is guided by an informative and generalizable state embedding.
- GenNBV extends the previously limited 3 DoF action space to a 5 DoF free space, which allows it to adapt to any geometry and capture self-occluded details.
- GenNBV generalizes across datasets and categories without fine-tuning.

Project Page: gennbv.github.io
Contact: cx123@ie.cuhk.edu.hk
[QR codes: Paper, Project Page]

**Methodology:**
[Figure: pipeline diagram. Step 1: historical observations, segmented images, 3D grids in local free space. Step 2: embedding, occupancy, action. Training dataset: object-centric; generalization: object-centric.]

Pipeline: ① extract state embedding from observation; ② NBV policy predicts the next best viewpoint; ③ agent captures novel observations; ④ repeat ① to ③.

The probabilistic occupancy grid indicates the occupancy and the exploration progress.
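The probabilistic occupancy grid mentioned in the pipeline can be illustrated with a small sketch. The poster does not state how the grid is updated, so this example assumes the standard log-odds update from occupancy mapping. The class `ProbabilisticGrid`, the constants `L_HIT` and `L_MISS`, and the classification thresholds are hypothetical names and values chosen for illustration, not taken from GenNBV.

```python
import math

# Hypothetical sensor-model constants (assumptions, not from the poster).
L_HIT = math.log(0.7 / 0.3)   # log-odds added when a surface return lands in a voxel
L_MISS = math.log(0.4 / 0.6)  # log-odds added when a ray passes through a voxel


class ProbabilisticGrid:
    """Sparse voxel grid storing occupancy as log-odds (0.0 means unobserved)."""

    def __init__(self):
        self.log_odds = {}  # (i, j, k) -> accumulated log-odds

    def update(self, voxel, hit):
        """Fuse one observation of `voxel`; hit=True means a surface return."""
        delta = L_HIT if hit else L_MISS
        self.log_odds[voxel] = self.log_odds.get(voxel, 0.0) + delta

    def probability(self, voxel):
        """Occupancy probability; unobserved voxels return exactly 0.5."""
        value = self.log_odds.get(voxel, 0.0)
        return 1.0 / (1.0 + math.exp(-value))

    def state(self, voxel, free_below=0.35, occupied_above=0.65):
        """Classify a voxel as 'occupied', 'free', or 'unknown'."""
        p = self.probability(voxel)
        if p >= occupied_above:
            return "occupied"
        if p <= free_below:
            return "free"
        return "unknown"


grid = ProbabilisticGrid()
grid.update((0, 0, 0), hit=True)   # one surface return
grid.update((1, 0, 0), hit=False)  # a ray passed through twice
grid.update((1, 0, 0), hit=False)
print(grid.state((0, 0, 0)), grid.state((1, 0, 0)), grid.state((2, 0, 0)))
```

Under these assumptions, each voxel carries both occupancy (occupied or free) and exploration status (still unknown), which is the kind of information the poster attributes to the grid.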
- The effectiveness of the probabilistic 3D grid

Reward function: Coverage Ratio (CR)

**Generalizability Evaluation:**
GenNBV is trained on hundreds of houses from the Houses3K training set.

[Figure: cross-dataset next-best-view capturing on Animals. Hemisphere (CR: 89.8%), Scan-RL (CR: 90.6%), GenNBV (CR: 96.8%). Panel labels: Scene-Completeness.]

**Visualization Comparison:**
[Figure: (1) House2: Scan-RL, GenNBV (ours); (2) House3: Scan-RL, Uncertainty-guided.]

**Ablation Study:**

**Takeaways:**
1) Generalizability of NBV policies: RL-based > Info Gain-based > Heuristic
2) Key elements of RL-based policies: free action space and informative scene representation
3) Further generalization: cross-dataset > cross-category > indoor/outdoor scenes
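The coverage ratio (CR) named as the reward function can be sketched as the fraction of ground-truth surface points that have at least one observed point nearby. The poster does not give the exact definition, so the distance threshold, the function names `coverage_ratio` and `step_reward`, and the choice of the per-step CR gain as the reward are all assumptions made for illustration.

```python
import math


def coverage_ratio(gt_points, observed_points, threshold=0.05):
    """Fraction of ground-truth points within `threshold` of any observed point.

    Brute-force O(N*M) search; the threshold value is a hypothetical choice.
    """
    if not gt_points:
        return 0.0
    covered = sum(
        1
        for g in gt_points
        if any(math.dist(g, o) <= threshold for o in observed_points)
    )
    return covered / len(gt_points)


def step_reward(cr_before, cr_after):
    """Assumed reward shape: the gain in coverage ratio from one new view."""
    return cr_after - cr_before


# Four ground-truth surface points along a line.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

# Points reconstructed after the first view cover two of them.
observed = [(0.0, 0.0, 0.01), (1.0, 0.02, 0.0)]
cr_before = coverage_ratio(gt, observed)

# A second view adds a point near the third ground-truth point.
observed.append((2.0, 0.0, 0.0))
cr_after = coverage_ratio(gt, observed)

print(cr_before, cr_after, step_reward(cr_before, cr_after))  # 0.5 0.75 0.25
```

Rewarding the gain rather than the absolute CR means a view that reveals nothing new earns zero, which matches the poster's stated idea of using the evaluation metric itself as the reward.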