**Title: GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction**

**Institutions:**
- Shanghai Artificial Intelligence Laboratory
- The Chinese University of Hong Kong

**Authors:**
- Xiao Chen
- Quanyi Li
- Tai Wang
- Tianfan Xue
- Jiangmiao Pang

**Overview:**
GenNBV is presented as the first reinforcement learning (RL)-based end-to-end Next-Best-View (NBV) policy designed to enhance 3D reconstruction by enabling free-space exploration and cross-dataset generalization. The method is centered on informative and generalizable state embedding and extends beyond the conventional 3 Degrees of Freedom (DoF) action space to a more versatile 5 DoF space. This enhanced capability allows GenNBV to adapt to different geometries and effectively capture details despite occlusions.

**Components:**
1. **Motivation:**
   - Identifies limitations of existing NBV policies like restricted action space, collision insensitivity, and indirect criteria.
   - Highlights GenNBV's advantages, including a broader action space, collision avoidance, and generalizable state embeddings.

2. **Methodology:**
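   - Details a step-by-step pipeline: extract a state embedding from the observations, let the NBV policy predict the next viewpoint, capture a new observation, and repeat. A probabilistic occupancy grid tracks occupancy and exploration progress, and the reward function is based on the coverage ratio (CR).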
   
3. **Visualization Comparison:**
   - Illustrates how GenNBV captures unseen areas and generates high-quality reconstructions compared to other methods.

4. **Generalizability Evaluation:**
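   - Reports coverage ratio (CR) results for a policy trained on the Houses3K training set and evaluated cross-dataset. One example on the poster shows GenNBV at 96.8% CR, compared with 90.6% for Scan-RL and 89.8% for the Hemisphere baseline.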
   
5. **Ablation Study:**
   - Analyses the contribution of individual components within the system, demonstrating the efficacy and robustness of the proposed method.

**Takeaways:**
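1. In terms of generalizability, RL-based policies outperform information-gain-based policies, which in turn outperform heuristic methods.
2. Key elements for effective RL-based policies include a free action space and an informative scene representation (state embedding).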
3. Generalization capabilities extend across datasets and categories, indicating versatility in both indoor and outdoor environments.

**Contact and Additional Resources:**
- *Project Page:* [gennbv.github.io](http://gennbv.github.io)
- *Contact Email:* cx123@ie.cuhk.edu.hk
- QR codes provided for direct links to the paper and project page.

Text transcribed from the image:
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Xiao Chen¹,², Quanyi Li¹, Tai Wang¹, Tianfan Xue²,†, Jiangmiao Pang¹,†
¹OpenRobotLab, Shanghai AI Laboratory; ²MMLab, The Chinese University of Hong Kong

Motivation
Existing NBV policies (≤ 3 DoF)
-No need to consider collision
-Per-scene optimized representations
-Indirect heuristic criteria
-Small objects

Unexploitable
Object-Centric Capturing
Previous NBV Policies
Per-Scene Training

GenNBV (Ours)
-Free action space (5 DoF)
-Collision avoidance
-Generalizable state embeddings
-Evaluation metrics as reward
-Large objects and scenes

Overview
GenNBV is the first RL-based end-to-end NBV policy, which allows free-space exploration and cross-dataset generalization.

GenNBV is guided by informative and generalizable state embedding.

GenNBV extends the previously limited 3 DoF action space to a 5 DoF free space, which allows it to adapt to any geometry and capture self-occluded details.

GenNBV can be generalized cross-dataset and cross-category w/o finetuning.

Project Page: gennbv.github.io
Contact: cx123@ie.cuhk.edu.hk

QR codes: Paper, Project Page

Methodology
[Pipeline figure; labels: Step 1, Historical Observations, Segmented Images, 3D Grids in local free space, Step 2, Training Dataset, Object-centric, Generalization, Embedding, Occupancy, Action]

Pipeline: ① extract state embedding from observation; ② NBV policy predicts next best viewpoint; ③ agent captures novel observations; ④ Repeat ① - ③.
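
The four pipeline steps can be read as a simple loop. The sketch below is only an illustration of that control flow, not the authors' implementation: `Pose5DoF`, `run_nbv_episode`, and the three toy stand-ins are hypothetical names, and the split of the 5 DoF into a 3D position plus pitch and yaw is an assumption that the poster does not spell out.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Pose5DoF:
    """Hypothetical 5-DoF viewpoint: 3D position plus pitch and yaw (assumed split)."""
    x: float
    y: float
    z: float
    pitch: float
    yaw: float


def run_nbv_episode(
    first_pose: Pose5DoF,
    capture: Callable[[Pose5DoF], Any],
    embed: Callable[[List[Tuple[Pose5DoF, Any]]], Any],
    policy: Callable[[Any], Pose5DoF],
    max_views: int,
) -> List[Pose5DoF]:
    """Run the extract / predict / capture loop until max_views observations exist."""
    history = [(first_pose, capture(first_pose))]
    while len(history) < max_views:
        state = embed(history)        # step 1: state embedding from past observations
        next_pose = policy(state)     # step 2: policy predicts the next viewpoint
        history.append((next_pose, capture(next_pose)))  # step 3: capture a new view
    return [pose for pose, _ in history]                 # step 4 is the loop itself


# Toy stand-ins (assumptions) so the loop can be executed end to end.
def toy_capture(pose: Pose5DoF) -> dict:
    return {"yaw": pose.yaw}


def toy_embed(history: list) -> int:
    return len(history)  # the "embedding" is just the number of views so far


def toy_policy(n_views: int) -> Pose5DoF:
    return Pose5DoF(0.0, 0.0, 1.0, -20.0, 30.0 * n_views)


poses = run_nbv_episode(
    Pose5DoF(0.0, 0.0, 1.0, -20.0, 0.0), toy_capture, toy_embed, toy_policy, max_views=4
)
print([p.yaw for p in poses])
```

In a real system the three callables would be replaced by a simulator or camera, a learned encoder, and the trained RL policy.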
Probabilistic occupancy grid indicates the occupancy and exploration process.
-the effectiveness of probabilistic 3D grid - 2D Map
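
As an illustration of how a probabilistic occupancy grid can encode both occupancy and exploration progress, here is a minimal sketch using the standard log-odds update from occupancy mapping. It is not necessarily the update used in GenNBV: the class name, the sensor-model constants, and the labeling thresholds are all assumptions chosen for the example.

```python
import math

# Assumed inverse sensor model: a "hit" suggests p = 0.7, a "miss" suggests p = 0.4.
L_HIT = math.log(0.7 / 0.3)
L_MISS = math.log(0.4 / 0.6)
L_MIN, L_MAX = -4.0, 4.0  # clamp so a voxel can still change its state later


class ProbabilisticOccupancyGrid:
    """Sparse voxel grid storing log-odds; a missing voxel means p = 0.5 (unobserved)."""

    def __init__(self):
        self.log_odds = {}

    def update(self, voxel, hit):
        """Fuse one observation: hit=True if a ray ended in the voxel, False if it passed through."""
        value = self.log_odds.get(voxel, 0.0) + (L_HIT if hit else L_MISS)
        self.log_odds[voxel] = max(L_MIN, min(L_MAX, value))

    def probability(self, voxel):
        """Convert the stored log-odds back to an occupancy probability."""
        return 1.0 / (1.0 + math.exp(-self.log_odds.get(voxel, 0.0)))

    def label(self, voxel, occupied_above=0.65, free_below=0.35):
        """Three-way label; the thresholds are illustrative assumptions."""
        p = self.probability(voxel)
        if p >= occupied_above:
            return "occupied"
        if p <= free_below:
            return "free"
        return "unknown"


grid = ProbabilisticOccupancyGrid()
grid.update((2, 3, 1), hit=True)    # a depth ray terminated in this voxel
grid.update((2, 3, 0), hit=False)   # the same ray passed through this voxel
grid.update((2, 3, 0), hit=False)
print(grid.label((2, 3, 1)), grid.label((2, 3, 0)), grid.label((9, 9, 9)))
```

Voxels that remain "unknown" mark regions the agent has not yet explored, which is what makes such a grid useful as a record of exploration progress.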

Reward function: Coverage Ratio (CR), 
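
The formula itself did not survive transcription, so the following is only a sketch of one common way to compute a coverage ratio: the fraction of ground-truth surface points that have a reconstructed point within a distance threshold. The function names, the threshold value, and the idea of rewarding the per-step gain in CR are assumptions, not details taken from the poster.

```python
import math


def coverage_ratio(gt_points, recon_points, threshold=0.05):
    """Fraction of ground-truth points with a reconstructed point within `threshold`."""
    if not gt_points:
        raise ValueError("ground-truth point set is empty")
    covered = 0
    for g in gt_points:
        # Brute-force nearest-neighbour check; fine for a small illustrative example.
        if any(math.dist(g, r) <= threshold for r in recon_points):
            covered += 1
    return covered / len(gt_points)


def coverage_gain_reward(cr_now, cr_prev):
    """Assumed reward shaping: the increase in coverage ratio after the latest view."""
    return cr_now - cr_prev


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
recon = [(0.0, 0.0, 0.01), (1.0, 0.02, 0.0)]
cr = coverage_ratio(gt, recon)
print(cr, coverage_gain_reward(cr, 0.25))
```

Using the evaluation metric directly as the reward matches the poster's "Evaluation metrics as reward" point; for large point sets a spatial index such as a k-d tree would replace the brute-force search.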

Generalizability Evaluation
GenNBV is trained on hundreds of houses from the Houses3K training set.

Cross-dataset
Animals
Animals

Next-Best-View Capturing
Hemisphere (CR: 89.8%)
Scan-RL (CR: 90.6%)
GenNBV (CR: 96.8%)
Scene-Completeness
Scene-Completeness

Visualization Comparison
(1) House2
Scan-RL
GenNBV(ours)

(2) House3
Scan-RL

Uncertainty-guided

Ablation Study

Takeaways:
1) Generalizability of NBV policies: RL-based > Info Gain-based > Heuristic
2) Key elements of RL-based policies: free action space and informative scene representation
3) Further generalization: cross-dataset > cross-category > indoor/outdoor scenes