This image shows a research poster presented at CVPR (Seattle, WA, June 17-21). The poster, titled "GenN2N: Generative NeRF2NeRF Translation," comes from researchers affiliated with The Hong Kong University of Science and Technology and Tsinghua University. It presents a generative approach to NeRF (Neural Radiance Fields) to NeRF translation, with applications in 3D editing and rendering. Key contributions include: 1. a novel generative NeRF-to-NeRF translation framework; 2. a 3D VAE-GAN framework for capturing the distribution of 3D edits; 3. a contrastive learning technique to disentangle 3D edits from camera views; 4. improved efficiency, quality, and diversity in NeRF translation tasks. The methodology section diagrams the network pipeline and the training and inference procedures. The poster also reports qualitative and quantitative results on NeRF colorization, super-resolution, inpainting, and text-driven NeRF editing, with tables and images comparing output quality against existing methods using metrics and visual examples.

Text transcribed from the image (reordered and reconstructed from the poster layout):

香港科技大學 THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY · 清華大學 Tsinghua University

GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu¹, Han Xue², Kunming Luo¹, Ping Tan¹†, Li Yi²,³,⁴†
¹HKUST, ²Tsinghua University, ³Shanghai AI Laboratory, ⁴Shanghai Qi Zhi Institute
CVPR, Seattle, WA, June 17-21
https://xiangyueliu.github.io/Ge [URL truncated in the image]

Contributions
• A generative NeRF-to-NeRF translation formulation for the universal NeRF editing task, together with a generic solution.
• A 3D VAE-GAN framework that can learn the distribution of all possible 3D NeRF edits corresponding to a set of input edited 2D images.
• A contrastive learning framework to disentangle the 3D edits and 2D camera views.
• Superior efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

Method

Network pipeline: a 2D image translator (text-driven editing, colorization, super-resolution, or inpainting) converts renderings of the original scene S into a translated image set {S̃_i^m}, m = 0, ..., M-1, with M translations per view; an encoder distills each translated image into an edit code z; an MLP-based translated NeRF, conditioned on z and the camera pose, renders the edited 3D scene.

Latent distill: extract edit codes from the translated image set, which serve as the input of the translated NeRF.

Optimize: a KL loss to constrain the latent vectors to a Gaussian distribution, and L_recon, L_adv, and L_contr to optimize the appearance and geometry of the translated NeRF:
• KL loss (on the latent distill): L_KL = Σ P(z_norm) log(P(z_norm) / P({z})), constraining the edit codes {z} to a Gaussian distribution.
• Rendering loss: L_recon = ||Î − I||, between the novel-view rendering Î of the translated NeRF and the translated target image I.
• Conditional adversarial loss (L_adv, GAN): renderings are concatenated with views of different edit styles to form real and fake pairs; the discriminator distinguishes artifacts (e.g., blur, distortion) in the novel-view rendered image compared with the target image.
• Edit contrastive loss (L_contr): increase the distance between edit codes of same-view, different-edit-style images; reduce the distance between edit codes of different-view, same-edit-style images (see the sketch after this list).
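The poster states only the two distance objectives above, not the exact contrastive formulation. Below is a minimal triplet-style sketch in PyTorch under that reading; the triplet form, the Euclidean distance, and the margin value are assumptions, and edit_contrastive_loss is a hypothetical name:

```python
import torch
import torch.nn.functional as F

def edit_contrastive_loss(z_anchor: torch.Tensor,
                          z_pos: torch.Tensor,
                          z_neg: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    """Triplet-style contrastive loss over edit codes of shape (batch, dim).

    z_anchor: codes of edited images at view i
    z_pos:    codes of the SAME edit style rendered at a DIFFERENT view
    z_neg:    codes of a DIFFERENT edit style at the SAME view i
    """
    d_pos = F.pairwise_distance(z_anchor, z_pos)  # distance to be reduced
    d_neg = F.pairwise_distance(z_anchor, z_neg)  # distance to be increased
    # Hinge: penalize whenever the same-edit pair is not closer than the
    # different-edit pair by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```

Pulling same-edit codes together across views while pushing different-edit codes apart at a fixed view forces the code to describe the edit rather than the viewpoint, which is the disentanglement the poster claims.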
Training loss: L = L_KL + L_recon + L_adv-G + L_adv-D + L_contr, with the KL loss applied on the latent distill and the reconstruction, adversarial, and contrastive losses applied on the translated NeRF.

Inference: sample a latent vector from the Gaussian distribution and render a corresponding multi-view consistent 3D scene with high quality (a sketch of this sampling step follows the results below).

Qualitative and Quantitative Results

[Qualitative figures: for each task, the original NeRF and baselines such as Instruct-NeRF2NeRF, PaletteNeRF, NeRF-SR, and SPIn-NeRF are compared with three diverse samples, "Ours inference 1/2/3", rendered at multiple poses; the inpainting and text-driven editing figures show one input alongside three outputs.]

NeRF Colorization (CF↑ / FID↓):
DDColor [14]+NeRF: 40.435 / 148.957
Instruct-NeRF2NeRF: 45.599 / 201.456
PaletteNeRF [16]: 39.654 / [illegible]
Ours w/o L_adv: 35.031 / 137.740
Ours w/o L_contr: 34.829 / 105.750
Ours: 65.099 / 35.041

NeRF Super-resolution (PSNR↑ / SSIM↑):
ResShift [40]+NeRF: 19.978 / 0.535
Instruct-NeRF2NeRF: 20.299 / 0.642
NeRF-SR [34]: 27.957 / 0.897
Ours w/o L_adv: 12.555 / 0.663
Ours w/o L_contr: 15.372 / 0.662
Ours: 28.501 / 0.913

NeRF Inpainting (PSNR↑ / SSIM↑):
LaMa [31]+NeRF: 18.983 / 0.3706
Instruct-NeRF2NeRF: 16.734 / 0.3088
SPIn-NeRF [24]: 24.369 / 0.7217
Ours: 26.868 / 0.8137

Text-driven NeRF Editing (CLIP Text-Image Direction Similarity↑ / CLIP Direction Consistency↑ / FID↓):
InstructPix2Pix [2]+NeRF: 0.1669 / 0.8475 / 270.542
Instruct-NeRF2NeRF: 0.2021 / 0.9828 / 148.021
Ours w/o L_adv: 0.1920 / 0.9657 / 162.275
Ours w/o L_contr: 0.2007 / 0.9749 / 156.524
Ours: 0.2089 / 0.9864 / 137.740

Ablation on M, the number of translations per view (CLIP Text-Image Direction Similarity↑ / CLIP Direction Consistency↑):
M=1: 0.2635 / 0.9610
M=3: 0.2807 / 0.9650
M=5: 0.2835 / 0.9638

Training and inference cost vs. Instruct-NeRF2NeRF (IN2N), as Time (h) / Iterations / Memory (GB) / FLOPS (G); the IN2N row is only partially legible, and one further value (131) cannot be confidently attributed:
IN2N: 2.67 / 20000 / [illegible] / [illegible]
Ours: 3.47 / 10000 / 18.32 / 20.92
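The inference step described in the Method section amounts to drawing one edit code and reusing it for every view. Here is a minimal sketch, assuming the translated NeRF is exposed as a callable that renders one image given a camera pose and an edit code; translated_nerf, camera_poses, and z_dim are hypothetical names, and the latent size is not stated on the poster:

```python
import torch

@torch.no_grad()
def sample_edited_scene(translated_nerf, camera_poses, z_dim: int = 64):
    """Draw one 3D edit and render it from every camera pose.

    translated_nerf: edit-conditioned NeRF, assumed callable as
                     translated_nerf(pose, z) -> image (hypothetical API)
    z_dim:           edit-code size (assumed; not stated on the poster)
    """
    z = torch.randn(z_dim)  # one edit code from the Gaussian prior
    # Reusing the same z at every pose is what makes the rendered edit
    # multi-view consistent; resampling z produces a different, diverse edit.
    return [translated_nerf(pose, z) for pose in camera_poses]
```

Because the KL loss pushes the training-time edit codes toward a Gaussian, codes drawn from torch.randn at test time fall in the region the translated NeRF was trained on, which is why sampling yields diverse yet valid edits.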