The image shows a large computer monitor set up on a table in a public space, likely a conference room or meeting area. The screen is filled with graphs, charts, pictures, diagrams, and text, which suggests it is used for presentations, training, or research. Additional monitors, equipment, and people are visible in the background, so the setting appears to be a shared workspace. Overall, the scene conveys professionalism and productivity.

Text transcribed from the image:

The Hong Kong University of Science and Technology; Tsinghua University
CVPR, Seattle, WA, June 17-21, 2024

GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu¹, Han Xue², Kunming Luo¹, Ping Tan¹†, Li Yi²,³,⁴†
¹HKUST, ²Tsinghua University, ³Shanghai AI Laboratory, ⁴Shanghai Qi Zhi Institute
https://xiangyueliu.github.io/GenN2N/

Poster sections: Contributions; 2 Qualitative and Quantitative Results.

[Figure panels: NeRF Colorization and NeRF Super-resolution, with columns labeled Input and Output 1 to Output 3; rows labeled Original NeRF, Instruct-NeRF2NeRF, PaletteNeRF. Diagram labels: Original Scene, Edit, pose i, Translated NeRF, Render, Contrastive Loss.]
[Figure labels: Render, pose i; NeRF-SR, Ours; Ours inference 1, Ours inference 2, Ours inference 3; NeRF Inpainting; Original NeRF, Instruct-NeRF2NeRF.]

NeRF Super-resolution

Method                  PSNR ↑    SSIM ↑    LPIPS ↓
ResShift [40]+NeRF      19.978    0.535     0.1156
Instruct-NeRF2NeRF      20.299    0.647     0.2732
NeRF-SR [4]             27.957    0.897     0.0997
Ours w/o L_adv          12.555    0.663     0.2001
Ours w/o L_contr        15.372    0.662     0.1834
Ours                    28.501    0.913     0.074

NeRF Colorization

Method                  CF ↑      FID ↓
DDColor [14]+NeRF       40.435    148.957
Instruct-NeRF2NeRF      45.599    201.456
PaletteNeRF [16]        39.654    [...]
Ours w/o L_adv          35.031    137.740
Ours w/o L_contr        34.829    105.750
Ours                    65.099    35.041

Contributions (the left edge of each line is cut off in the transcription):
- [...] generative NeRF-to-NeRF translation formulation for the universal NeRF editing [...] together with a generic solution.
- [...]E-GAN framework that can learn the distribution of all possible 3D NeRF [...] corresponding to a set of input edited 2D images.
- [...] contrastive learning framework to disentangle the 3D edits and 2D camera views.
- [...] efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

Contrastive loss (L_contr):
- Reduce style distance: reduce the distance between edit codes of different-view, same-edit-style images.
- Increase style distance: increase the distance between edit codes of same-view, different-edit-style images.

[Method diagram labels: Image translation; Latent distill (Encoder, MLP, latent code z, KL loss, "Constrain to Gaussian distribution"); Pose; Translated NeRF; Edited 3D scene; Rendering loss; Conditional Adversarial Loss (view i with different edit styles, novel view). Tasks listed: Text-driven editing, Colorization, Super-resolution, Inpainting. Loss annotations, with arguments partly illegible: a KL term over P(z) and P(z_norm); L_recon, a norm of C - S (expression truncated); L_adv = GAN(...); L_contr = Contrastive(...).]

Pipeline (step labels are cut off in the transcription):
- Latent distill: extract edit codes from the translated image set, which serve as the input of the NeRF.
- [...] a KL loss to constrain the latent vectors to a Gaussian distribution, and L_recon, L_adv to optimize the appearance and geometry of the translated NeRF.
- [...] sample a latent vector from a Gaussian distribution and render a corresponding multi-view consistent 3D scene with high quality.
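The contrastive objective described above (pull together edit codes of different-view, same-edit images; push apart edit codes of same-view, different-edit images) can be illustrated with a small sketch. The poster does not give the exact formula, so the margin-based form below is an assumption, and the function name, argument names, and the `margin` parameter are all hypothetical.

```python
import numpy as np


def contrastive_edit_code_loss(z_anchor, z_same_edit, z_diff_edit, margin=1.0):
    """Margin-based contrastive loss on edit codes (illustrative form only).

    z_anchor:    edit code of an edited image (some view, some edit style)
    z_same_edit: edit code of a different view with the same edit style
    z_diff_edit: edit code of the same view with a different edit style
    margin:      hypothetical hyperparameter, not taken from the poster
    """
    z_anchor = np.asarray(z_anchor, dtype=np.float64)
    z_same_edit = np.asarray(z_same_edit, dtype=np.float64)
    z_diff_edit = np.asarray(z_diff_edit, dtype=np.float64)

    # Reduce style distance: different view, same edit style.
    pull = np.sum((z_anchor - z_same_edit) ** 2)

    # Increase style distance: same view, different edit style.
    # The penalty vanishes once the codes are at least `margin` apart.
    dist = np.linalg.norm(z_anchor - z_diff_edit)
    push = max(0.0, margin - dist) ** 2

    return float(pull + push)
```

With this form, the loss is zero when same-edit codes coincide and different-edit codes are separated by at least the margin, which matches the disentangling goal stated in the contributions: the edit code should depend on the edit style and not on the camera view.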
[Figure labels: Concat, Concat; Text-driven NeRF Editing; Original NeRF, SPin-NeRF, Ours; Instruct-NeRF2NeRF 1, Instruct-NeRF2NeRF 2; Ours inference 1, Ours inference 2, Ours inference 3.]

Adversarial loss (L_AD): distinguishes artifacts (e.g. blur, distortion) in the novel-view rendered image compared with the target image, using real pairs and fake pairs.

NeRF Inpainting

Method                  PSNR ↑    SSIM ↑    LPIPS ↓
LaMa [31]+NeRF          18.983    0.3706    0.1730
Instruct-NeRF2NeRF      16.734    0.3008    0.2750
SPin-NeRF [24]          24.369    0.7217    0.1754
Ours                    26.868    0.8137    0.1284

Text-driven NeRF Editing

Method                      CLIP Text-Image Direction Similarity ↑    CLIP Direction Consistency ↑    FID ↓
InstructPix2Pix [2]+NeRF    0.1669                                    0.8475                          270.542
Instruct-NeRF2NeRF          0.2021                                    0.9828                          148.021
Ours w/o L_adv              0.1920                                    0.9657                          162.275
Ours w/o L_contr            0.2007                                    0.9749                          156.524
Ours                        0.2089                                    0.9864                          137.740

Ablation

M        CLIP Text-Image Direction Similarity ↑    CLIP Direction Consistency ↑    FID ↓
M=1      0.2635                                    0.9610                          123.505
M=3      0.2807                                    0.9650                          91.823
M=5      0.2835                                    0.9638                          86.377

Efficiency (columns grouped on the poster under Train and Inference)

Method   Time (h)   Iterations   Memory (GB)   FLOPS (G)   Latency (s)
IN2N     2.67       20000        18.32         [...]       [...]
Ours     3.47       10000        20.92         131         0.35

Training Loss

$\mathcal{L} = \mathcal{L}_{\mathrm{KL}} + \mathcal{L}_{\mathrm{recon}} + \mathcal{L}_{\mathrm{AD\text{-}G}} + \mathcal{L}_{\mathrm{AD\text{-}D}} + \mathcal{L}_{\mathrm{contr}}$

The KL loss is applied to the latent distill module; the reconstruction, adversarial, and contrastive losses are applied to the translated NeRF.
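As a rough illustration of how the five terms in the training loss combine, here is a minimal sketch. Two things are assumptions rather than details from the poster: the KL term uses the standard closed form for a diagonal Gaussian against a unit Gaussian (the poster's own KL expression is not legible), and the terms are summed with equal weight because the formula shows no coefficients. The function names are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    Assumption: a VAE-style encoder that outputs a mean and a
    log-variance per latent dimension. This is a common stand-in,
    not the expression printed on the poster.
    """
    mu = np.asarray(mu, dtype=np.float64)
    log_var = np.asarray(log_var, dtype=np.float64)
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))


def total_training_loss(l_kl, l_recon, l_ad_g, l_ad_d, l_contr):
    """Unweighted sum L = L_KL + L_recon + L_AD-G + L_AD-D + L_contr.

    Equal weights are assumed because the formula lists no coefficients.
    """
    return l_kl + l_recon + l_ad_g + l_ad_d + l_contr
```

In typical GAN training the generator and discriminator terms are minimized in alternating steps over different parameter sets, so a single scalar sum like this is mainly useful for logging; how the authors schedule the updates is not stated on the poster.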