This poster, titled "GenN2N: Generative NeRF2NeRF Translation," presents a unified solution for universal NeRF (Neural Radiance Fields) editing, contributed by Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, and Li Yi from HKUST, Tsinghua University, Shanghai AI Laboratory, and Shanghai Qi Zhi Institute.

**Contributions:** The work introduces a generative NeRF-to-NeRF translation formulation for universal NeRF editing and proposes a generative adversarial framework that learns the distribution of all plausible edited 3D NeRFs corresponding to a set of edited 2D input images. A contrastive learning framework is also proposed to disentangle the 3D edits from the 2D camera views, significantly improving the efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

**Method:** The pipeline first translates the input views with a 2D image editor, then performs latent distillation: an encoder extracts an edit code from each translated image, and these codes condition the translated NeRF (an MLP), which is optimized with a rendering loss to produce high-quality, multi-view-consistent 3D scenes. The edit codes are constrained to a Gaussian distribution, so at inference time a new code can simply be sampled and rendered.

**Loss Functions:** The framework combines a contrastive loss that separates different edit styles, a conditional adversarial loss that penalizes artifacts (e.g., blur, distortion) in novel-view renderings, a KL loss that keeps the edit codes Gaussian, and a rendering (reconstruction) loss; the sketches below illustrate the pipeline and the combined objective.
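The following is a minimal PyTorch sketch of the latent-distillation idea described above, not the authors' released implementation: the module names (`EditEncoder`, `TranslatedNeRF`), layer sizes, and the 32-dimensional edit code are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EditEncoder(nn.Module):
    """Maps one translated (edited) 2D image to a compact edit code z."""
    def __init__(self, code_dim: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Predict mean and log-variance so a KL term can pull z toward N(0, I).
        self.to_mu = nn.Linear(64, code_dim)
        self.to_logvar = nn.Linear(64, code_dim)

    def forward(self, image):
        h = self.backbone(image)                               # (B, 64)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return z, mu, logvar


class TranslatedNeRF(nn.Module):
    """Toy edit-conditioned radiance field: (position, view direction, z) -> (rgb, density)."""
    def __init__(self, code_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                               # rgb (3) + density (1)
        )

    def forward(self, xyz, view_dir, z):
        z = z.expand(xyz.shape[0], -1)                          # broadcast one code to all samples
        out = self.mlp(torch.cat([xyz, view_dir, z], dim=-1))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3:])


encoder, nerf = EditEncoder(), TranslatedNeRF()

# Training-time latent distillation: an edit code is extracted from a translated view.
z_train, mu, logvar = encoder(torch.rand(1, 3, 64, 64))
rgb, density = nerf(torch.rand(4096, 3), torch.rand(4096, 3), z_train)

# Inference as described on the poster: sample an edit code from the Gaussian prior
# and query the translated NeRF for any camera pose (volume rendering omitted here).
z_sample = torch.randn(1, 32)
rgb_new, density_new = nerf(torch.rand(4096, 3), torch.rand(4096, 3), z_sample)
```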
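The combined objective (KL + reconstruction + adversarial + contrastive) could be assembled roughly as below. The loss weights, the MSE choice for the rendering term, and the helper names are placeholders, not values from the paper; the discriminator's own adversarial term is kept separate, as on the poster.

```python
import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    """KL(q(z | image) || N(0, I)): keeps the edit codes close to a Gaussian prior."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1))

def generator_loss(rendered, target, mu, logvar, d_fake_logits, contrastive_term,
                   w_kl=1e-3, w_adv=0.1, w_contr=0.1):
    """Generator/NeRF-side objective: L = L_recon + L_KL + L_adv + L_contr.
    All weights here are illustrative placeholders."""
    l_recon = F.mse_loss(rendered, target)                       # rendering loss
    l_kl = kl_to_standard_normal(mu, logvar)                     # latent-distill KL
    l_adv = F.binary_cross_entropy_with_logits(                  # fool the conditional D
        d_fake_logits, torch.ones_like(d_fake_logits))
    return l_recon + w_kl * l_kl + w_adv * l_adv + w_contr * contrastive_term

# Toy call with random tensors, just to show the expected shapes.
loss = generator_loss(rendered=torch.rand(2, 3, 64, 64), target=torch.rand(2, 3, 64, 64),
                      mu=torch.randn(2, 32), logvar=torch.randn(2, 32),
                      d_fake_logits=torch.randn(2, 1),
                      contrastive_term=torch.tensor(0.0))
```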
**Qualitative and Quantitative Results:** The results showcase a wide array of applications:
- **NeRF Colorization**: transforming grayscale NeRFs into colorized versions.
- **NeRF Super-resolution**: enhancing the resolution of NeRF models.
- **NeRF Inpainting**: filling in missing parts of NeRF scenes.
- **Text-driven NeRF Editing**: modifying NeRF scenes based on textual descriptions.

Performance is quantified with standard metrics, on which the method outperforms existing baselines. The poster, presented at CVPR 2024, marks a significant step forward in NeRF editing.

Text transcribed from the image:

The Hong Kong University of Science and Technology (香港科技大學) · Tsinghua University (清華大學)

**GenN2N: Generative NeRF2NeRF Translation**
Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, Li Yi
HKUST, Tsinghua University, Shanghai AI Laboratory, Shanghai Qi Zhi Institute
CVPR, Seattle, WA, June 17-21, 2024
https://xiangyueliu.github.io/GenN2N/

**Contributions**
- A generative NeRF-to-NeRF translation formulation for the universal NeRF editing task, together with a generic solution.
- A VAE-GAN framework that can learn the distribution of all possible 3D NeRFs corresponding to a set of input edited 2D images.
- A contrastive learning framework to disentangle the 3D edits and 2D camera views, improving the efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

**Method**
Pipeline: image translation → latent distill → translated NeRF. A 2D editor performs the edit (text-driven editing, colorization, super-resolution, or inpainting) on each input view. Latent distill: an encoder with an MLP head extracts edit codes from the translated image set, which serve as the input of the translated NeRF; a KL loss constrains the latent vectors to a Gaussian distribution. The translated NeRF is optimized with the rendering loss L_recon, the adversarial loss L_adv, and the contrastive loss L_contr to refine its appearance and geometry. At inference, a latent vector is sampled from the Gaussian distribution and a corresponding multi-view-consistent 3D scene is rendered with high quality.

**Contrastive Loss**
Increase the distance between edit codes of same-view, different-edit-style images; reduce the distance between edit codes of different-view, same-edit-style images.

**Conditional Adversarial Loss**
Distinguishes artifacts (e.g., blur, distortion) in novel-view rendered images compared with the target image, using real and fake image pairs at novel views.

**Training Loss**
L = L_KL + L_recon + L_adv-G + L_adv-D + L_contr: the KL loss acts on the latent distill stage; the reconstruction, adversarial, and contrastive losses act on the translated NeRF.

**Qualitative and Quantitative Results**
For each task, several outputs ("Output 1-3", "Ours inference 1-3") are rendered from different sampled latent codes to show the diversity of the edits.

NeRF Colorization: compared against DDColor [14]+NeRF, Instruct-NeRF2NeRF, and PaletteNeRF [16], plus ablations without the adversarial and contrastive losses, using CF↑ and FID↓.

NeRF Super-resolution:

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| ResShift [40]+NeRF | 19.978 | 0.535 | 0.1156 |
| Instruct-NeRF2NeRF | 20.299 | 0.647 | 0.2732 |
| NeRF-SR [4] | 27.957 | 0.897 | 0.0997 |
| Ours | 28.501 | 0.913 | 0.074 |

NeRF Inpainting:

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| LaMa+NeRF | 18.983 | 0.3706 | 0.1730 |
| Instruct-NeRF2NeRF | 16.734 | 0.3008 | 0.2750 |
| SPin-NeRF [24] | 24.369 | 0.7217 | 0.1754 |
| Ours | 26.868 | 0.8137 | 0.1284 |

Text-driven NeRF Editing: compared against InstructPix2Pix [2]+NeRF and Instruct-NeRF2NeRF using CLIP Text-Image Direction Similarity↑, CLIP Direction Consistency↑, and FID↓, with an ablation over M (M = 1, 3, 5) and over the adversarial and contrastive losses.

Efficiency: training time (h), training iterations, memory (GB), FLOPS (G), and inference latency (s) are compared against IN2N (Instruct-NeRF2NeRF).
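The contrastive objective transcribed above (pull together edit codes of different-view, same-edit-style images; push apart edit codes of same-view, different-edit-style images) could look roughly like this minimal triplet-style sketch. The margin value and the helper name are assumptions, not taken from the poster.

```python
import torch
import torch.nn.functional as F

def edit_code_contrastive(z_anchor, z_pos, z_neg, margin: float = 1.0):
    """Contrastive term on edit codes:
    z_pos: codes from different views of the SAME edit style (pulled closer);
    z_neg: codes from the same view with a DIFFERENT edit style (pushed apart,
    up to a margin). The margin is an arbitrary placeholder."""
    pos_dist = F.pairwise_distance(z_anchor, z_pos)
    neg_dist = F.pairwise_distance(z_anchor, z_neg)
    return (pos_dist + F.relu(margin - neg_dist)).mean()

# Toy usage with 32-dimensional edit codes for a batch of 8 anchors.
z_anchor = torch.randn(8, 32)
z_pos = z_anchor + 0.05 * torch.randn(8, 32)   # same edit style, different camera view
z_neg = torch.randn(8, 32)                     # different edit style, same camera view
loss_contr = edit_code_contrastive(z_anchor, z_pos, z_neg)
```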