This image showcases a detailed academic poster titled "GenN2N: Generative NeRF2NeRF Translation," produced collaboratively by researchers from The Hong Kong University of Science and Technology and Tsinghua University. Key contributors include Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, and Li Yi. The poster is organized into several sections:

1. **Contributions**: Highlights the novel aspects of the work, including a generative NeRF-to-NeRF translation formulation, a 3D VAE-GAN framework for learning the distribution of 3D NeRF edits, a contrastive learning framework for disentangling 3D edits from 2D views, and claims of superior efficiency, quality, and diversity.
2. **Method**: Describes the training and inference workflow, involving an image-translation step, latent distillation of edit codes, and optimization terms such as the KL loss and L_contr, so that high-quality 3D scenes can be rendered from Gaussian-distributed latent vectors.
3. **Contrastive Loss**: Explains how the model increases the distance between edit codes of same-view, different-edit-style images and reduces the distance between codes of different-view, same-edit-style images, separating 3D edits from 2D views.
4. **Conditional Adversarial Loss**: Details the adversarial training mechanism, conditioned on different edit styles, that minimizes discrepancies between generated and real images.
5. **Training Loss**: Summarizes the loss components aggregated to optimize the model, including the reconstruction, adversarial, contrastive, and KL losses.
6. **Qualitative and Quantitative Results**: Presents visual and numerical comparisons across tasks such as NeRF colorization, super-resolution, inpainting, and text-driven NeRF editing, demonstrating the model's effectiveness in varied scenarios.

Images, charts, and numerical results illustrate how the model applies to different editing tasks, confirming its broad applicability. The poster serves as a comprehensive visual summary of advanced techniques in 3D NeRF editing and translation.

Text transcribed from the image:

香港科技大學 THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY · 清華大學 Tsinghua University

GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu1, Han Xue2, Kunming Luo1, Ping Tan1†, Li Yi2,3,4†
1HKUST, 2Tsinghua University, 3Shanghai AI Laboratory, 4Shanghai Qi Zhi Institute
CVPR, Seattle, WA, June 17-21
https://xiangyueliu.github.io/Ge

Contributions
• A generative NeRF-to-NeRF translation formulation for the universal NeRF editing task, together with a generic solution.
• A 3D VAE-GAN framework that can learn the distribution of all possible 3D NeRF edits corresponding to a set of input edited 2D images.
• A contrastive learning framework to disentangle the 3D edits and 2D camera views.
• Superior efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

Method
[Pipeline diagram — Training: the original scene is edited by a 2D image translation model (text-driven editing, colorization, super-resolution, or inpainting); an encoder distills each translated image into an edit code z, and an MLP conditions the translated NeRF on the pose and z. Inference: a latent code sampled from a Gaussian distribution is rendered by the translated NeRF into an edited 3D scene. Annotated losses: KL loss L_KL = Σ P(z_norm) log(P(z_norm) / P({z})) (constrain to Gaussian distribution), rendering loss L_recon (norm of the difference between the rendered image and the translated target image), adversarial loss L_adv (conditional GAN), and contrastive loss L_contr.]

Network pipeline:
• Latent distill: extract edit codes from the translated image set, which serve as the input of the translated NeRF (see the code sketch below).
• Optimize: a KL loss to constrain the latent vectors to a Gaussian distribution, and L_recon, L_adv, and L_contr to optimize the appearance and geometry of the translated NeRF.
• Inference: sample a latent vector from the Gaussian distribution and render a corresponding multi-view consistent 3D scene with high quality.
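To make the latent-distill and KL-constraint steps concrete, the following is a minimal, illustrative PyTorch-style sketch, not the poster's or the authors' released code. The names `EditEncoder` and `kl_to_standard_normal` are hypothetical, and the sketch uses the standard closed-form VAE KL term rather than the poster's distribution-matching formulation.

```python
import torch
import torch.nn as nn

class EditEncoder(nn.Module):
    """Hypothetical encoder that distills a translated 2D image into an edit code z."""
    def __init__(self, code_dim: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, code_dim)
        self.logvar = nn.Linear(64, code_dim)

    def forward(self, image: torch.Tensor):
        h = self.backbone(image)
        return self.mu(h), self.logvar(h)

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL divergence between N(mu, sigma^2) and N(0, I), as in a standard VAE."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1))

# Training-time latent distill: one edit code per translated 2D view.
encoder = EditEncoder(code_dim=32)
translated_images = torch.rand(8, 3, 64, 64)            # a batch of edited 2D views
mu, logvar = encoder(translated_images)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterized edit codes
loss_kl = kl_to_standard_normal(mu, logvar)              # constrain codes toward a Gaussian

# Inference-time sampling: draw an edit code from N(0, I) to condition the translated NeRF.
z_sample = torch.randn(1, 32)
```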
Edit Contrastive Loss
[Diagram: edit codes extracted from renderings of the original and edited scenes under different poses and edit styles, with arrows marking "increase style distance" and "reduce style distance".]
• Increase the distance between edit codes of same-view, different-edit-style images.
• Reduce the distance between edit codes of different-view, same-edit-style images.

Conditional Adversarial Loss
[Diagram: a novel-view rendering and the target image are each concatenated with views of different edit styles to form fake and real pairs for the conditional discriminator.]
Distinguishes artifacts (e.g. blur, distortion) in the novel-view rendered image compared with the target image.

Training Loss
L = L_KL + L_recon + L_adv-G + L_adv-D + L_contr
Reconstruction loss, adversarial loss, and contrastive loss on the translated NeRF; KL loss on the latent distill.

Qualitative and Quantitative Results
[Qualitative comparisons: for each task, renderings of the original NeRF, baselines (Instruct-NeRF2NeRF, PaletteNeRF, NeRF-SR, SPin-NeRF), and several diverse outputs of Ours (inference 1-3) are shown from multiple poses; inpainting examples show an input and Outputs 1-3.]

NeRF Colorization

| Method | CF↑ | FID↓ |
|---|---|---|
| DDColor [14]+NeRF | 40.435 | 148.957 |
| Instruct-NeRF2NeRF | 45.599 | 201.456 |
| PaletteNeRF [16] | 39.654 | — |
| Ours w/o L_adv | 35.031 | 137.740 |
| Ours w/o L_contr | 34.829 | 105.750 |
| Ours | 65.099 | 35.041 |

NeRF Super-resolution
[Table (PSNR↑ / SSIM↑): ResShift [40]+NeRF, Instruct-NeRF2NeRF, NeRF-SR [34], Ours w/o L_adv, Ours w/o L_contr, and Ours; the transcribed value pairs (19.978/0.535, 20.299/0.642, 27.957/0.897, 12.555/0.663, 15.372/0.662, 28.501/0.913) cannot be reliably matched to methods.]

NeRF Inpainting

| Method | PSNR↑ | SSIM↑ |
|---|---|---|
| LaMa [31]+NeRF | 18.983 | 0.3706 |
| Instruct-NeRF2NeRF | 16.734 | 0.3088 |
| SPin-NeRF [24] | 24.369 | 0.7217 |
| Ours | 26.868 | 0.8137 |

Text-driven NeRF Editing

| Method | CLIP Text-Image Direction Similarity↑ | CLIP Direction Consistency↑ | FID↓ |
|---|---|---|---|
| InstructPix2Pix [2]+NeRF | 0.1669 | 0.8475 | 270.542 |
| Instruct-NeRF2NeRF | 0.2021 | 0.9828 | 148.021 |
| Ours w/o L_adv | 0.1920 | 0.9657 | 162.275 |
| Ours w/o L_contr | 0.2007 | 0.9749 | 156.524 |
| Ours | 0.2089 | 0.9864 | 137.740 |

Ablation

| M | CLIP Text-Image Direction Similarity↑ | CLIP Direction Consistency↑ |
|---|---|---|
| M=1 | 0.2635 | 0.9610 |
| M=3 | 0.2807 | 0.9650 |
| M=5 | 0.2835 | 0.9638 |

[Training/inference cost table: Time (h), Iterations, Memory (GB), and FLOPS (G) for IN2N (Instruct-NeRF2NeRF) vs. Ours; individual values are garbled in the transcription.]
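As an illustration of the Edit Contrastive Loss described above, here is a small, hypothetical PyTorch sketch of one way such a loss could be implemented: edit codes from different views of the same edit style are pulled together, while codes from the same view under different edit styles are pushed apart. The margin-based form and the name `edit_contrastive_loss` are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def edit_contrastive_loss(z_anchor: torch.Tensor,
                          z_same_edit: torch.Tensor,
                          z_diff_edit: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    """Margin-based contrastive loss over edit codes (illustrative only).

    z_anchor:    codes from view i with edit style s
    z_same_edit: codes from a different view j with the same edit style s  -> pull together
    z_diff_edit: codes from the same view i with a different edit style s' -> push apart
    """
    d_pos = F.pairwise_distance(z_anchor, z_same_edit)   # reduce this distance
    d_neg = F.pairwise_distance(z_anchor, z_diff_edit)   # increase this distance
    return (d_pos.pow(2) + F.relu(margin - d_neg).pow(2)).mean()

# Example usage with random stand-in edit codes.
z_i_s  = torch.randn(4, 32)   # view i, edit style s
z_j_s  = torch.randn(4, 32)   # view j, edit style s   (positive pair)
z_i_s2 = torch.randn(4, 32)   # view i, edit style s'  (negative pair)
loss_contr = edit_contrastive_loss(z_i_s, z_j_s, z_i_s2)
```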