The poster, presented by researchers from the Hong Kong University of Science and Technology (HKUST), Tsinghua University, Shanghai AI Laboratory, and Shanghai Qi Zhi Institute, showcases the project "GenN2N: Generative NeRF2NeRF Translation" by Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, and Li Yi. The work proposes a generative NeRF-to-NeRF translation framework that enables universal NeRF editing through a conditional adversarial (VAE-GAN) formulation.

**Contributions:**

- Demonstration of diverse edited 3D NeRF outputs, including transformed facial expressions and object styles, all produced by a single generic framework.
- A contrastive learning scheme that disentangles 3D edits from 2D camera views, improving the efficiency, quality, and diversity of NeRF-to-NeRF translation results.

**Method:**

- A framework combining a 2D image-translation component, latent distillation of edit codes, and a conditional adversarial loss to handle diverse NeRF editing tasks (text-driven editing, colorization, super-resolution, and inpainting).
- A schematic showing how input views are edited in 2D, encoded into latent edit codes, and decoded through a translated NeRF to render the edited 3D scene.

**Loss Functions:**

- Contrastive Loss: pushes apart edit codes of same-view, different-edit images and pulls together edit codes of different-view, same-edit images, keeping the edit style consistent across views.
- Conditional Adversarial Loss: a discriminator distinguishes artifacts (e.g., blur, distortion) in novel-view renderings from the target edited images, pushing the model toward realistic edits across styles.
- Training Loss: combines a KL-divergence term that regularizes the latent edit codes toward a Gaussian with the reconstruction, adversarial, and contrastive terms (a minimal sketch follows this list).
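The poster writes the overall objective as a sum of these terms. The following is a minimal PyTorch-style sketch of how such a combination could be assembled; the tensor names (`render`, `target`, `fake_logits`), the loss weights, the L1 reconstruction, and the non-saturating adversarial form are illustrative assumptions rather than details taken from the poster.

```python
import torch
import torch.nn.functional as F

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ) for diagonal-Gaussian edit codes."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def generator_loss(render, target, fake_logits, mu, logvar, contrastive_term,
                   w_kl=0.01, w_rec=1.0, w_adv=0.1, w_contr=0.1):
    """L = L_KL + L_recon + L_adv + L_contr (generator / NeRF side).

    render, target: rendered vs. edited 2D views; fake_logits: discriminator
    scores on the renderings; mu, logvar: parameters of the latent edit codes.
    The weights are placeholders, not values reported by the authors.
    """
    l_kl = kl_to_standard_normal(mu, logvar)       # regularize latent edit codes
    l_rec = F.l1_loss(render, target)              # reconstruct the edited views
    l_adv = F.softplus(-fake_logits).mean()        # non-saturating adversarial term
    return w_kl * l_kl + w_rec * l_rec + w_adv * l_adv + w_contr * contrastive_term
```

The discriminator would be trained with the complementary objective on real (target) versus rendered pairs, which corresponds to the separate L_adv-G and L_adv-D terms listed in the transcription below.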
**Qualitative and Quantitative Results:**

- NeRF Colorization: comparisons against DDColor+NeRF, Instruct-NeRF2NeRF, and PaletteNeRF, with the proposed method achieving the highest colorfulness (CF) and the lowest FID.
- NeRF Super-resolution: sharper detail in rendered images and better PSNR/SSIM/LPIPS than NeRF-SR and per-image 2D super-resolution baselines.
- NeRF Inpainting: effective completion of removed regions, outperforming LaMa+NeRF, Instruct-NeRF2NeRF, and SPIn-NeRF.
- Text-driven NeRF Editing: scene edits driven by textual instructions, evaluated with CLIP text-image direction similarity, CLIP direction consistency, and FID against InstructPix2Pix+NeRF and Instruct-NeRF2NeRF.

**Ablation Studies:**

- Analysis of the contribution of the adversarial and contrastive losses and of the hyperparameter M, together with a training/inference efficiency comparison against Instruct-NeRF2NeRF (IN2N).

This poster was presented at CVPR 2024 in Seattle (June 17-21), attracting attention for its approach to neural radiance field translation and editing.

Text transcribed from the image:

The Hong Kong University of Science and Technology · Tsinghua University

GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu¹, Han Xue², Kunming Luo¹, Ping Tan¹†, Li Yi²,³,⁴†
¹HKUST, ²Tsinghua University, ³Shanghai AI Laboratory, ⁴Shanghai Qi Zhi Institute
CVPR, Seattle, WA, June 17-21, 2024 · https://xiangyueliu.github.io/GenN2N/

Contributions
- A generative NeRF-to-NeRF translation formulation for the universal NeRF editing task, together with a generic solution.
- A VAE-GAN framework that can learn the distribution of all possible 3D NeRF edits corresponding to a set of input edited 2D images.
- A contrastive learning framework to disentangle the 3D edits and 2D camera views, improving the efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

Method
- Pipeline: image translation → latent distill → translated NeRF, covering text-driven editing, colorization, super-resolution, and inpainting.
- Latent Distill: an encoder extracts edit codes {z} from the translated image set, which serve as input to the translated NeRF; a KL loss between the edit-code distribution and a standard Gaussian constrains the latent space.
- Translated NeRF: conditioned on camera pose and the edit code; a rendering (reconstruction) loss L_recon, an adversarial loss L_adv, and a contrastive loss L_contr optimize the appearance and geometry of the translated NeRF.
- Inference: sample a latent vector from the Gaussian distribution and render a corresponding multi-view-consistent 3D scene with high quality.

Contrastive Loss
- Increase the distance between edit codes of same-view, different-edit-style images.
- Reduce the distance between edit codes of different-view, same-edit-style images.

Conditional Adversarial Loss
- Real/fake pairs of novel-view renderings and target images; the discriminator distinguishes artifacts (e.g., blur, distortion) in novel-view rendered images compared with the target images.

Training Loss
- L = L_KL + L_recon + L_adv-G + L_adv-D + L_contr: the KL loss acts on the Latent Distill; the reconstruction, adversarial, and contrastive losses act on the Translated NeRF.

Qualitative and Quantitative Results

NeRF Colorization (figure: Original NeRF, Instruct-NeRF2NeRF, PaletteNeRF, and three sampled outputs of our method)

| Method | CF ↑ | FID ↓ |
| --- | --- | --- |
| DDColor [14]+NeRF | 40.435 | 148.957 |
| Instruct-NeRF2NeRF | 45.599 | 201.456 |
| PaletteNeRF [16] | 39.654 | |
| Ours w/o L_adv | 35.031 | 137.740 |
| Ours w/o L_contr | 34.829 | 105.750 |
| Ours | 65.099 | 35.041 |

NeRF Super-resolution (figure: input, Original NeRF, Instruct-NeRF2NeRF, NeRF-SR, and our method)

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
| --- | --- | --- | --- |
| ResShift [40]+NeRF | 19.978 | 0.535 | 0.1156 |
| Instruct-NeRF2NeRF | 20.299 | 0.647 | 0.2732 |
| NeRF-SR [4] | 27.957 | 0.897 | 0.0997 |
| Ours w/o L_adv | 12.555 | 0.663 | 0.2001 |
| Ours w/o L_contr | 15.372 | 0.662 | 0.1834 |
| Ours | 28.501 | 0.913 | 0.074 |

NeRF Inpainting (figure: Original NeRF, SPIn-NeRF, and our method)

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
| --- | --- | --- | --- |
| LaMa+NeRF | 18.983 | 0.3706 | 0.1730 |
| Instruct-NeRF2NeRF | 16.734 | 0.3008 | 0.2750 |
| SPIn-NeRF [24] | 24.369 | 0.7217 | 0.1754 |
| Ours | 26.868 | 0.8137 | 0.1284 |

Text-driven NeRF Editing (figure: Original NeRF, two Instruct-NeRF2NeRF results, and three sampled inferences of our method)

| Method | CLIP Text-Image Direction Similarity ↑ | CLIP Direction Consistency ↑ | FID ↓ |
| --- | --- | --- | --- |
| InstructPix2Pix [2]+NeRF | 0.1669 | 0.8475 | 270.542 |
| Instruct-NeRF2NeRF | 0.2021 | 0.9828 | 148.021 |
| Ours w/o L_adv | 0.1920 | 0.9657 | 162.275 |
| Ours w/o L_contr | 0.2007 | 0.9749 | 156.524 |
| Ours | 0.2089 | 0.9864 | 137.740 |

Ablation

| Setting | CLIP Text-Image Direction Similarity ↑ | CLIP Direction Consistency ↑ | FID ↓ |
| --- | --- | --- | --- |
| M=1 | 0.2635 | 0.9610 | 123.505 |
| M=3 | 0.2807 | 0.9650 | 91.823 |
| M=5 | 0.2835 | 0.9638 | 86.377 |

Efficiency

| Method | Train Time (h) | Train Iterations | Memory (GB) | Inference FLOPs (G) | Inference Latency (s) |
| --- | --- | --- | --- | --- | --- |
| IN2N | | 20,000 | 18.32 | | |
| Ours | 3.47 | 10,000 | 20.92 | 131 | 0.35 |
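To make the contrastive objective above concrete, here is a small sketch of an InfoNCE-style loss over edit codes in which codes of the same edit style rendered from different views are treated as positives, and codes of other edit styles (including other edits of the same view) act as negatives; the poster does not give the exact formulation, so the cosine-similarity form, the temperature, and the `style_ids` grouping are assumptions.

```python
import torch
import torch.nn.functional as F

def edit_code_contrastive(codes: torch.Tensor, style_ids: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """codes: (N, D) edit codes extracted from edited/rendered views.
    style_ids: (N,) integer id of the edit style each code came from."""
    z = F.normalize(codes, dim=-1)
    sim = z @ z.t() / temperature                      # pairwise cosine similarities
    same_style = style_ids[:, None] == style_ids[None, :]
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = same_style & ~eye                       # same edit, other views: positives
    logits = sim.masked_fill(eye, float('-inf'))       # never contrast a code with itself
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average log-probability assigned to each code's positives
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -pos_log_prob.mean()
```

A training batch for this term would mix several edit styles rendered from several camera poses, so that both behaviours stated on the poster are exercised: codes of the same style from different views are pulled together, while codes of different styles are pushed apart.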