The image shows a large computer monitor displaying varied content: pictures, diagrams, graphs, charts, and text. The monitor sits on a table in a public space, likely a conference room or meeting area, which suggests it is used for presentations, training, or research. Additional monitors, equipment, and people are visible in the background.
Text transcribed from the image:
The Hong Kong University of Science and Technology (香港科技大學)
Tsinghua University (清華大學)
Contributions
GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu¹, Han Xue², Kunming Luo¹, Ping Tan¹†, Li Yi²,³,⁴†
¹HKUST, ²Tsinghua University, ³Shanghai AI Laboratory, ⁴Shanghai Qi Zhi Institute
Contrastive Loss
Original Scene
Edit
CVPR, Seattle, WA, June 17-21, 2024
https://xiangyueliu.github.io/GenN2N/
Qualitative and Quantitative Results
NeRF Colorization
[Figure labels: Input; Output 1, Output 2, Output 3; Original NeRF, Instruct-NeRF2NeRF, PaletteNeRF]
Method | CF ↑ | FID
DDColor [14]+NeRF | 40.435 | 148.957
Instruct-NeRF2NeRF | 45.599 | 201.456
PaletteNeRF [16] | 39.654 | -
Ours w/o $L_{adv}$ | 35.031 | 137.740
Ours w/o $L_{contr}$ | 34.829 | 105.750
Ours | 65.099 | 35.041

NeRF Super-resolution
[Figure labels: Output 1, Output 2, Output 3; Original NeRF, Instruct-NeRF2NeRF, NeRF-SR, Ours]
Method | PSNR ↑ | SSIM | LPIPS
ResShift [40]+NeRF | 19.978 | 0.535 | 0.1156
Instruct-NeRF2NeRF | 20.299 | 0.647 | 0.2732
NeRF-SR [4] | 27.957 | 0.897 | 0.0997
Ours w/o [...] | 12.555 | 0.663 | 0.2001
Ours w/o [...] | 15.372 | 0.662 | 0.1834
Ours | 28.501 | 0.913 | 0.074

[Diagram labels: pose i, Edit, S, Translated NeRF, Render; $L_{contr}$, Reduce style distance]
[Figure labels: Ours inference 1, Ours inference 2, Ours inference 3]

NeRF Inpainting
[Figure labels: Original NeRF, Instruct-NeRF2NeRF]
[...] [genera]tive NeRF-to-NeRF translation formulation for the universal NeRF editing [...] [toget]her with a generic solution.
[...] [VA]E-GAN framework that can learn the distribution of all possible 3D NeRF [...] [corr]esponding to a set of input edited 2D images.
[...] [contra]stive learning framework to disentangle the 3D edits and 2D camera views.
[...] efficiency, quality, and diversity of the NeRF-to-NeRF translation results.
$L_{contr}$
Increase style distance: increase the distance between edit codes of same-view, different-edit style images.
Reduce the distance between edit codes of different-view, same-edit style images.
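The pull/push behaviour described for the contrastive loss can be illustrated with a small sketch. The poster does not give the exact formula, so the hinge form, the Euclidean distance, the `margin` parameter, and all function names below are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length edit codes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(anchor, same_edit_other_view, same_view_other_edit, margin=1.0):
    """Hinge-style contrastive loss on edit codes (assumed form).

    pull: distance to the code of a different-view, same-edit image;
          minimizing the loss reduces it.
    push: distance to the code of a same-view, different-edit image;
          minimizing the loss increases it until it exceeds `margin`.
    """
    pull = l2_distance(anchor, same_edit_other_view)
    push = l2_distance(anchor, same_view_other_edit)
    return pull + max(0.0, margin - push)


# Codes already well separated by edit style: no penalty.
good = contrastive_loss([0.0, 1.0], [0.0, 1.0], [3.0, 1.0])  # 0.0
# Codes that follow the camera view instead of the edit: penalized.
bad = contrastive_loss([0.0, 1.0], [3.0, 1.0], [0.0, 1.0])   # 4.0
```

In this toy setup the loss is zero only when the edit code ignores the camera view and still separates different edit styles, which matches the disentanglement goal stated above.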
Conditional Adversarial Loss
View I with different edit styles
Novel view!
Method
[Pipeline diagram labels: Image translation (Text-driven editing, Colorization, Super-resolution, Inpainting); Latent distill (Encoder, MLP, edit code z); Pose; Translated NeRF; Edited 3D scene]
KL loss: constrain to Gaussian distribution
[KL loss formula, only partly legible: terms $P(z_{norm})$ and $\log P(z)$, summed over the edit codes $\{z\}$]
[Pi]peline:
[Diagram labels: Translated NeRF; Rendering loss]
Loss terms on the diagram (arguments only partly legible): $L_{recon}$ (rendering loss), $L_{adv} = \mathrm{GAN}(\cdot)$, $L_{contr} = \mathrm{Contrastive}(\cdot)$
[Latent disti]ll: extract edit codes from the translated image set, which serve as the input of the [...] NeRF.
[...] a KL loss to constrain the latent vectors to a Gaussian distribution; and $L_{recon}$, $L_{adv}$ [...] to optimize the appearance and geometry of the translated NeRF.
[...] sample a latent vector from a Gaussian distribution and render a corresponding multi-[view cons]istent 3D scene with high quality.
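The two roles of the Gaussian prior mentioned above (a KL term during training, sampling at inference) can be sketched as follows. This uses the standard closed-form KL divergence between a diagonal Gaussian and a standard normal, as in a typical VAE; the poster's own KL formula is only partly legible and may differ. The code dimension of 64, the function names, and the seed are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), closed form (assumed VAE-style term)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def sample_edit_code(dim, rng):
    """Draw one latent edit code from a standard Gaussian, as done at inference time."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Three sampled codes; each one would condition the translated NeRF
# to render one distinct, multi-view consistent edited scene.
codes = [sample_edit_code(64, rng) for _ in range(3)]
```

A code distribution that already matches the prior incurs no penalty, for example `kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])` evaluates to `0.0`, which is why codes sampled from the prior at inference stay within the range the translated NeRF saw during training.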
Concat
Concat
Text-driven NeRF Editing
[Figure labels: SPin-NeRF, Ours, Original NeRF, Instruct-NeRF2NeRF 1, Instruct-NeRF2NeRF 2]

NeRF inpainting, quantitative comparison:
Method | PSNR ↑ | SSIM | LPIPS
LaMa [...]+NeRF | 18.983 | 0.3706 | 0.1730
Instruct-NeRF2NeRF | 16.734 | 0.3008 | 0.2750
SPin-NeRF [24] | 24.369 | 0.7217 | 0.1754
Ours | 26.868 | 0.8137 | 0.1284

$L_{AD}$ (real pair vs. fake pair): distinguish artifacts (e.g. blur, distortion) in the novel-view rendered image compared with the target image.
[Figure labels: Ours inference 1, Ours inference 2, Ours inference 3]

Text-driven editing, quantitative comparison:
Method | CLIP Text-Image Direction Similarity | CLIP Direction Consistency ↑ | FID
InstructPix2Pix [2]+NeRF | 0.1669 | 0.8475 | 270.542
Instruct-NeRF2NeRF | 0.2021 | 0.9828 | 148.021
Ours w/o $L_{adv}$ | 0.1920 | 0.9657 | 162.275
Ours w/o $L_{contr}$ | 0.2007 | 0.9749 | 156.524
Ours | 0.2089 | 0.9864 | 137.740

Ablation
M | CLIP Text-Image Direction Similarity | CLIP Direction Consistency ↑ | FID
1 | 0.2635 | 0.9610 | 123.505
3 | 0.2807 | 0.9650 | 91.823
5 | 0.2835 | 0.9638 | 86.377

Efficiency (Train / Inference):
Method | Time (h) | Iteration | Memory (GB) | FLOPS (G) | Latency (s)
IN2N | 2.67 | 20000 | 18.32 | - | -
Ours | 3.47 | 10000 | 20.92 | 131 | 0.35
Training Loss
$L = L_{KL} + L_{recon} + L_{AD\text{-}G} + L_{AD\text{-}D} + L_{contr}$
KL loss on Latent Distill.
Reconstruction loss, Adversarial loss, and
Contrastive loss on Translated NeRF.
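As a reading aid, the training loss above can be written as a plain sum of its five terms. The poster shows no per-term weights, so the optional `weights` argument (defaulting to 1.0 for every term) and the function name are assumptions added for illustration.

```python
def total_loss(l_kl, l_recon, l_ad_g, l_ad_d, l_contr, weights=None):
    """Sum the five loss terms; any term without an explicit weight uses 1.0 (assumption)."""
    terms = {
        "kl": l_kl,        # KL loss on Latent Distill
        "recon": l_recon,  # reconstruction loss on Translated NeRF
        "ad_g": l_ad_g,    # adversarial loss, generator part
        "ad_d": l_ad_d,    # adversarial loss, discriminator part
        "contr": l_contr,  # contrastive loss on Translated NeRF
    }
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


unweighted = total_loss(1.0, 2.0, 3.0, 4.0, 5.0)              # 15.0
no_kl = total_loss(1.0, 2.0, 3.0, 4.0, 5.0, {"kl": 0.0})      # 14.0
```

In GAN-style training generally, the generator and discriminator terms are usually minimized in alternating steps by separate optimizers rather than as one scalar; whether that applies here is not stated on the poster.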