This image showcases a detailed academic poster titled "GenN2N: Generative NeRF2NeRF Translation," produced collaboratively by researchers from The Hong Kong University of Science and Technology and Tsinghua University. Key contributors include Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, and Li Yi. The poster is organized into several sections:

1. **Contributions**: Highlights the novel aspects of the work, including a generative NeRF-to-NeRF translation formulation, a 3D VAE-GAN framework for learning the distribution of 3D NeRF edits, a contrastive learning framework for disentangling 3D edits from 2D views, and claims of superior efficiency, quality, and diversity.
2. **Method**: Describes the training and inference workflow, involving an image-translation step, latent distillation of edit codes, and optimization terms such as the KL loss and L_contr, so that high-quality 3D scenes can be rendered from Gaussian-distributed latent vectors.
3. **Contrastive Loss**: Explains how the model increases the distance between edit codes of same-view, different-edit-style images and reduces the distance between codes of different-view, same-edit-style images, separating 3D edits from 2D views.
4. **Conditional Adversarial Loss**: Details the adversarial training mechanism, conditioned on different edit styles, that minimizes discrepancies between generated and real images.
5. **Training Loss**: Summarizes the loss components aggregated to optimize the model, including the reconstruction, adversarial, contrastive, and KL losses.
6. **Qualitative and Quantitative Results**: Presents visual and numerical comparisons across tasks such as NeRF colorization, super-resolution, inpainting, and text-driven NeRF editing, demonstrating the model's effectiveness in varied scenarios.

Images, charts, and numerical results illustrate how the model applies to different editing tasks, confirming its broad applicability. The poster serves as a comprehensive visual summary of advanced techniques in 3D NeRF editing and translation.

Text transcribed from the image:

香港科技大學 THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY · 清華大學 Tsinghua University

GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu1, Han Xue2, Kunming Luo1, Ping Tan1†, Li Yi2,3,4†
1HKUST, 2Tsinghua University, 3Shanghai AI Laboratory, 4Shanghai Qi Zhi Institute
CVPR, Seattle, WA, June 17-21
https://xiangyueliu.github.io/Ge

Contributions
• A generative NeRF-to-NeRF translation formulation for the universal NeRF editing task, together with a generic solution.
• A 3D VAE-GAN framework that can learn the distribution of all possible 3D NeRF edits corresponding to a set of input edited 2D images.
• A contrastive learning framework to disentangle the 3D edits and 2D camera views.
• Superior efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

Method
[Pipeline diagram — Training: the original scene is edited by a 2D image translation model (text-driven editing, colorization, super-resolution, or inpainting); an encoder distills each translated image into an edit code z, and an MLP conditions the translated NeRF on the pose and z. Inference: a latent code sampled from a Gaussian distribution is rendered by the translated NeRF into an edited 3D scene. Annotated losses: KL loss L_KL = Σ P(z_norm) log(P(z_norm) / P({z})) (constrain to Gaussian distribution), rendering loss L_recon (norm of the difference between the rendered image and the translated target image), adversarial loss L_adv (conditional GAN), and contrastive loss L_contr.]

Network pipeline:
• Latent distill: extract edit codes from the translated image set, which serve as the input of the translated NeRF (see the code sketch below).
• Optimize: a KL loss to constrain the latent vectors to a Gaussian distribution, and L_recon, L_adv, and L_contr to optimize the appearance and geometry of the translated NeRF.
• Inference: sample a latent vector from the Gaussian distribution and render a corresponding multi-view consistent 3D scene with high quality.
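To make the latent-distill and KL-constraint steps concrete, the following is a minimal, illustrative PyTorch-style sketch, not the poster's or the authors' released code. The names `EditEncoder` and `kl_to_standard_normal` are hypothetical, and the sketch uses the standard closed-form VAE KL term rather than the poster's distribution-matching formulation.

```python
import torch
import torch.nn as nn

class EditEncoder(nn.Module):
    """Hypothetical encoder that distills a translated 2D image into an edit code z."""
    def __init__(self, code_dim: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, code_dim)
        self.logvar = nn.Linear(64, code_dim)

    def forward(self, image: torch.Tensor):
        h = self.backbone(image)
        return self.mu(h), self.logvar(h)

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL divergence between N(mu, sigma^2) and N(0, I), as in a standard VAE."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1))

# Training-time latent distill: one edit code per translated 2D view.
encoder = EditEncoder(code_dim=32)
translated_images = torch.rand(8, 3, 64, 64)            # a batch of edited 2D views
mu, logvar = encoder(translated_images)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterized edit codes
loss_kl = kl_to_standard_normal(mu, logvar)              # constrain codes toward a Gaussian

# Inference-time sampling: draw an edit code from N(0, I) to condition the translated NeRF.
z_sample = torch.randn(1, 32)
```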
Edit Contrastive Loss
[Diagram: edit codes extracted from renderings of the original and edited scenes under different poses and edit styles, with arrows marking "increase style distance" and "reduce style distance".]
• Increase the distance between edit codes of same-view, different-edit-style images.
• Reduce the distance between edit codes of different-view, same-edit-style images.

Conditional Adversarial Loss
[Diagram: a novel-view rendering and the target image are each concatenated with views of different edit styles to form fake and real pairs for the conditional discriminator.]
Distinguishes artifacts (e.g. blur, distortion) in the novel-view rendered image compared with the target image.

Training Loss
L = L_KL + L_recon + L_adv-G + L_adv-D + L_contr
Reconstruction loss, adversarial loss, and contrastive loss on the translated NeRF; KL loss on the latent distill.

Qualitative and Quantitative Results
[Qualitative comparisons: for each task, renderings of the original NeRF, baselines (Instruct-NeRF2NeRF, PaletteNeRF, NeRF-SR, SPin-NeRF), and several diverse outputs of Ours (inference 1-3) are shown from multiple poses; inpainting examples show an input and Outputs 1-3.]

NeRF Colorization

| Method | CF↑ | FID↓ |
|---|---|---|
| DDColor [14]+NeRF | 40.435 | 148.957 |
| Instruct-NeRF2NeRF | 45.599 | 201.456 |
| PaletteNeRF [16] | 39.654 | — |
| Ours w/o L_adv | 35.031 | 137.740 |
| Ours w/o L_contr | 34.829 | 105.750 |
| Ours | 65.099 | 35.041 |

NeRF Super-resolution
[Table (PSNR↑ / SSIM↑): ResShift [40]+NeRF, Instruct-NeRF2NeRF, NeRF-SR [34], Ours w/o L_adv, Ours w/o L_contr, and Ours; the transcribed value pairs (19.978/0.535, 20.299/0.642, 27.957/0.897, 12.555/0.663, 15.372/0.662, 28.501/0.913) cannot be reliably matched to methods.]

NeRF Inpainting

| Method | PSNR↑ | SSIM↑ |
|---|---|---|
| LaMa [31]+NeRF | 18.983 | 0.3706 |
| Instruct-NeRF2NeRF | 16.734 | 0.3088 |
| SPin-NeRF [24] | 24.369 | 0.7217 |
| Ours | 26.868 | 0.8137 |

Text-driven NeRF Editing

| Method | CLIP Text-Image Direction Similarity↑ | CLIP Direction Consistency↑ | FID↓ |
|---|---|---|---|
| InstructPix2Pix [2]+NeRF | 0.1669 | 0.8475 | 270.542 |
| Instruct-NeRF2NeRF | 0.2021 | 0.9828 | 148.021 |
| Ours w/o L_adv | 0.1920 | 0.9657 | 162.275 |
| Ours w/o L_contr | 0.2007 | 0.9749 | 156.524 |
| Ours | 0.2089 | 0.9864 | 137.740 |

Ablation

| M | CLIP Text-Image Direction Similarity↑ | CLIP Direction Consistency↑ |
|---|---|---|
| M=1 | 0.2635 | 0.9610 |
| M=3 | 0.2807 | 0.9650 |
| M=5 | 0.2835 | 0.9638 |

[Training/inference cost table: Time (h), Iterations, Memory (GB), and FLOPS (G) for IN2N (Instruct-NeRF2NeRF) vs. Ours; individual values are garbled in the transcription.]
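As an illustration of the Edit Contrastive Loss described above, here is a small, hypothetical PyTorch sketch of one way such a loss could be implemented: edit codes from different views of the same edit style are pulled together, while codes from the same view under different edit styles are pushed apart. The margin-based form and the name `edit_contrastive_loss` are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def edit_contrastive_loss(z_anchor: torch.Tensor,
                          z_same_edit: torch.Tensor,
                          z_diff_edit: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    """Margin-based contrastive loss over edit codes (illustrative only).

    z_anchor:    codes from view i with edit style s
    z_same_edit: codes from a different view j with the same edit style s  -> pull together
    z_diff_edit: codes from the same view i with a different edit style s' -> push apart
    """
    d_pos = F.pairwise_distance(z_anchor, z_same_edit)   # reduce this distance
    d_neg = F.pairwise_distance(z_anchor, z_diff_edit)   # increase this distance
    return (d_pos.pow(2) + F.relu(margin - d_neg).pow(2)).mean()

# Example usage with random stand-in edit codes.
z_i_s  = torch.randn(4, 32)   # view i, edit style s
z_j_s  = torch.randn(4, 32)   # view j, edit style s   (positive pair)
z_i_s2 = torch.randn(4, 32)   # view i, edit style s'  (negative pair)
loss_contr = edit_contrastive_loss(z_i_s, z_j_s, z_i_s2)
```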