The poster, presented by researchers from the Hong Kong University of Science and Technology (HKUST), Tsinghua University, Shanghai AI Laboratory, and Shanghai Qi Zhi Institute, showcases the project "GenN2N: Generative NeRF2NeRF Translation" by Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, and Li Yi. The work proposes a generative NeRF-to-NeRF translation framework that enables universal NeRF editing through a conditional adversarial (VAE-GAN) formulation.

**Contributions:**

- Demonstration of diverse edited 3D NeRF outputs, including transformed facial expressions and object styles, all produced by a single generic framework.
- A contrastive learning scheme that disentangles 3D edits from 2D camera views, improving the efficiency, quality, and diversity of NeRF-to-NeRF translation results.

**Method:**

- A framework combining a 2D image-translation component, latent distillation of edit codes, and a conditional adversarial loss to handle diverse NeRF editing tasks (text-driven editing, colorization, super-resolution, and inpainting).
- A schematic showing how input views are edited in 2D, encoded into latent edit codes, and decoded through a translated NeRF to render the edited 3D scene.

**Loss Functions:**

- Contrastive Loss: pushes apart edit codes of same-view, different-edit images and pulls together edit codes of different-view, same-edit images, keeping the edit style consistent across views.
- Conditional Adversarial Loss: a discriminator distinguishes artifacts (e.g., blur, distortion) in novel-view renderings from the target edited images, pushing the model toward realistic edits across styles.
- Training Loss: combines a KL-divergence term that regularizes the latent edit codes toward a Gaussian with the reconstruction, adversarial, and contrastive terms (a minimal sketch follows this list).
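The poster writes the overall objective as a sum of these terms. The following is a minimal PyTorch-style sketch of how such a combination could be assembled; the tensor names (`render`, `target`, `fake_logits`), the loss weights, the L1 reconstruction, and the non-saturating adversarial form are illustrative assumptions rather than details taken from the poster.

```python
import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ) for diagonal-Gaussian edit codes."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def generator_loss(render, target, fake_logits, mu, logvar, contrastive_term,
                   w_kl=0.01, w_rec=1.0, w_adv=0.1, w_contr=0.1):
    """L = L_KL + L_recon + L_adv + L_contr (generator / NeRF side).

    render, target: rendered vs. edited 2D views; fake_logits: discriminator
    scores on the renderings; mu, logvar: parameters of the latent edit codes.
    The weights are placeholders, not values reported by the authors.
    """
    l_kl = kl_to_standard_normal(mu, logvar)       # regularize latent edit codes
    l_rec = F.l1_loss(render, target)              # reconstruct the edited views
    l_adv = F.softplus(-fake_logits).mean()        # non-saturating adversarial term
    return w_kl * l_kl + w_rec * l_rec + w_adv * l_adv + w_contr * contrastive_term
```

The discriminator would be trained with the complementary objective on real (target) versus rendered pairs, which corresponds to the separate L_adv-G and L_adv-D terms listed in the transcription below.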
**Qualitative and Quantitative Results:**

- NeRF Colorization: comparisons against DDColor+NeRF, Instruct-NeRF2NeRF, and PaletteNeRF, with the proposed method achieving the highest colorfulness (CF) and the lowest FID.
- NeRF Super-resolution: sharper detail in rendered images and better PSNR/SSIM/LPIPS than NeRF-SR and per-image 2D super-resolution baselines.
- NeRF Inpainting: effective completion of removed regions, outperforming LaMa+NeRF, Instruct-NeRF2NeRF, and SPIn-NeRF.
- Text-driven NeRF Editing: scene edits driven by textual instructions, evaluated with CLIP text-image direction similarity, CLIP direction consistency, and FID against InstructPix2Pix+NeRF and Instruct-NeRF2NeRF.

**Ablation Studies:**

- Analysis of the contribution of the adversarial and contrastive losses and of the hyperparameter M, together with a training/inference efficiency comparison against Instruct-NeRF2NeRF (IN2N).

This poster was presented at CVPR 2024 in Seattle (June 17-21), attracting attention for its approach to neural radiance field translation and editing.

Text transcribed from the image:

The Hong Kong University of Science and Technology · Tsinghua University

GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu¹, Han Xue², Kunming Luo¹, Ping Tan¹†, Li Yi²,³,⁴†
¹HKUST, ²Tsinghua University, ³Shanghai AI Laboratory, ⁴Shanghai Qi Zhi Institute
CVPR, Seattle, WA, June 17-21, 2024 · https://xiangyueliu.github.io/GenN2N/

Contributions
- A generative NeRF-to-NeRF translation formulation for the universal NeRF editing task, together with a generic solution.
- A VAE-GAN framework that can learn the distribution of all possible 3D NeRF edits corresponding to a set of input edited 2D images.
- A contrastive learning framework to disentangle the 3D edits and 2D camera views, improving the efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

Method
- Pipeline: image translation → latent distill → translated NeRF, covering text-driven editing, colorization, super-resolution, and inpainting.
- Latent Distill: an encoder extracts edit codes {z} from the translated image set, which serve as input to the translated NeRF; a KL loss between the edit-code distribution and a standard Gaussian constrains the latent space.
- Translated NeRF: conditioned on camera pose and the edit code; a rendering (reconstruction) loss L_recon, an adversarial loss L_adv, and a contrastive loss L_contr optimize the appearance and geometry of the translated NeRF.
- Inference: sample a latent vector from the Gaussian distribution and render a corresponding multi-view-consistent 3D scene with high quality.

Contrastive Loss
- Increase the distance between edit codes of same-view, different-edit-style images.
- Reduce the distance between edit codes of different-view, same-edit-style images.

Conditional Adversarial Loss
- Real/fake pairs of novel-view renderings and target images; the discriminator distinguishes artifacts (e.g., blur, distortion) in novel-view rendered images compared with the target images.

Training Loss
- L = L_KL + L_recon + L_adv-G + L_adv-D + L_contr: the KL loss acts on the Latent Distill; the reconstruction, adversarial, and contrastive losses act on the Translated NeRF.

Qualitative and Quantitative Results

NeRF Colorization (figure: Original NeRF, Instruct-NeRF2NeRF, PaletteNeRF, and three sampled outputs of our method)

| Method | CF ↑ | FID ↓ |
| --- | --- | --- |
| DDColor [14]+NeRF | 40.435 | 148.957 |
| Instruct-NeRF2NeRF | 45.599 | 201.456 |
| PaletteNeRF [16] | 39.654 | |
| Ours w/o L_adv | 35.031 | 137.740 |
| Ours w/o L_contr | 34.829 | 105.750 |
| Ours | 65.099 | 35.041 |

NeRF Super-resolution (figure: input, Original NeRF, Instruct-NeRF2NeRF, NeRF-SR, and our method)

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
| --- | --- | --- | --- |
| ResShift [40]+NeRF | 19.978 | 0.535 | 0.1156 |
| Instruct-NeRF2NeRF | 20.299 | 0.647 | 0.2732 |
| NeRF-SR [4] | 27.957 | 0.897 | 0.0997 |
| Ours w/o L_adv | 12.555 | 0.663 | 0.2001 |
| Ours w/o L_contr | 15.372 | 0.662 | 0.1834 |
| Ours | 28.501 | 0.913 | 0.074 |

NeRF Inpainting (figure: Original NeRF, SPIn-NeRF, and our method)

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
| --- | --- | --- | --- |
| LaMa+NeRF | 18.983 | 0.3706 | 0.1730 |
| Instruct-NeRF2NeRF | 16.734 | 0.3008 | 0.2750 |
| SPIn-NeRF [24] | 24.369 | 0.7217 | 0.1754 |
| Ours | 26.868 | 0.8137 | 0.1284 |

Text-driven NeRF Editing (figure: Original NeRF, two Instruct-NeRF2NeRF results, and three sampled inferences of our method)

| Method | CLIP Text-Image Direction Similarity ↑ | CLIP Direction Consistency ↑ | FID ↓ |
| --- | --- | --- | --- |
| InstructPix2Pix [2]+NeRF | 0.1669 | 0.8475 | 270.542 |
| Instruct-NeRF2NeRF | 0.2021 | 0.9828 | 148.021 |
| Ours w/o L_adv | 0.1920 | 0.9657 | 162.275 |
| Ours w/o L_contr | 0.2007 | 0.9749 | 156.524 |
| Ours | 0.2089 | 0.9864 | 137.740 |

Ablation

| Setting | CLIP Text-Image Direction Similarity ↑ | CLIP Direction Consistency ↑ | FID ↓ |
| --- | --- | --- | --- |
| M=1 | 0.2635 | 0.9610 | 123.505 |
| M=3 | 0.2807 | 0.9650 | 91.823 |
| M=5 | 0.2835 | 0.9638 | 86.377 |

Efficiency

| Method | Train Time (h) | Train Iterations | Memory (GB) | Inference FLOPs (G) | Inference Latency (s) |
| --- | --- | --- | --- | --- | --- |
| IN2N | | 20,000 | 18.32 | | |
| Ours | 3.47 | 10,000 | 20.92 | 131 | 0.35 |
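To make the contrastive objective above concrete, here is a small sketch of an InfoNCE-style loss over edit codes in which codes of the same edit style rendered from different views are treated as positives, and codes of other edit styles (including other edits of the same view) act as negatives; the poster does not give the exact formulation, so the cosine-similarity form, the temperature, and the `style_ids` grouping are assumptions.

```python
import torch
import torch.nn.functional as F

def edit_code_contrastive(codes: torch.Tensor, style_ids: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """codes: (N, D) edit codes extracted from edited/rendered views.
    style_ids: (N,) integer id of the edit style each code came from."""
    z = F.normalize(codes, dim=-1)
    sim = z @ z.t() / temperature                      # pairwise cosine similarities
    same_style = style_ids[:, None] == style_ids[None, :]
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = same_style & ~eye                       # same edit, other views: positives
    logits = sim.masked_fill(eye, float('-inf'))       # never contrast a code with itself
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average log-probability assigned to each code's positives
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -pos_log_prob.mean()
```

A training batch for this term would mix several edit styles rendered from several camera poses, so that both behaviours stated on the poster are exercised: codes of the same style from different views are pulled together, while codes of different styles are pushed apart.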