This image shows a research poster presented at CVPR (Seattle, WA, June 17-21). The poster, titled "GenN2N: Generative NeRF2NeRF Translation," comes from researchers affiliated with The Hong Kong University of Science and Technology and Tsinghua University. It presents a generative approach to NeRF (Neural Radiance Fields) to NeRF translation, with applications in 3D editing and rendering. Key contributions include: 1. a novel generative NeRF-to-NeRF translation framework; 2. a 3D VAE-GAN framework for capturing the distribution of 3D edits; 3. a contrastive learning technique to disentangle 3D edits from camera views; 4. improved efficiency, quality, and diversity in NeRF translation tasks. The methodology section diagrams the network pipeline and the training and inference procedures. The poster also reports qualitative and quantitative results on NeRF colorization, super-resolution, inpainting, and text-driven NeRF editing, with tables and images comparing output quality against existing methods using metrics and visual examples.

Text transcribed from the image (reordered and reconstructed from the poster layout):

香港科技大學 THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY · 清華大學 Tsinghua University

GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu¹, Han Xue², Kunming Luo¹, Ping Tan¹†, Li Yi²,³,⁴†
¹HKUST, ²Tsinghua University, ³Shanghai AI Laboratory, ⁴Shanghai Qi Zhi Institute
CVPR, Seattle, WA, June 17-21
https://xiangyueliu.github.io/Ge [URL truncated in the image]

Contributions
• A generative NeRF-to-NeRF translation formulation for the universal NeRF editing task, together with a generic solution.
• A 3D VAE-GAN framework that can learn the distribution of all possible 3D NeRF edits corresponding to a set of input edited 2D images.
• A contrastive learning framework to disentangle the 3D edits and 2D camera views.
• Superior efficiency, quality, and diversity of the NeRF-to-NeRF translation results.

Method

Network pipeline: a 2D image translator (text-driven editing, colorization, super-resolution, or inpainting) converts renderings of the original scene S into a translated image set {S̃_i^m}, m = 0, ..., M-1, with M translations per view; an encoder distills each translated image into an edit code z; an MLP-based translated NeRF, conditioned on z and the camera pose, renders the edited 3D scene.

Latent distill: extract edit codes from the translated image set, which serve as the input of the translated NeRF.

Optimize: a KL loss to constrain the latent vectors to a Gaussian distribution, and L_recon, L_adv, and L_contr to optimize the appearance and geometry of the translated NeRF:
• KL loss (on the latent distill): L_KL = Σ P(z_norm) log(P(z_norm) / P({z})), constraining the edit codes {z} to a Gaussian distribution.
• Rendering loss: L_recon = ||Î − I||, between the novel-view rendering Î of the translated NeRF and the translated target image I.
• Conditional adversarial loss (L_adv, GAN): renderings are concatenated with views of different edit styles to form real and fake pairs; the discriminator distinguishes artifacts (e.g., blur, distortion) in the novel-view rendered image compared with the target image.
• Edit contrastive loss (L_contr): increase the distance between edit codes of same-view, different-edit-style images; reduce the distance between edit codes of different-view, same-edit-style images (see the sketch after this list).
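The poster states only the two distance objectives above, not the exact contrastive formulation. Below is a minimal triplet-style sketch in PyTorch under that reading; the triplet form, the Euclidean distance, and the margin value are assumptions, and edit_contrastive_loss is a hypothetical name:

```python
import torch
import torch.nn.functional as F

def edit_contrastive_loss(z_anchor: torch.Tensor,
                          z_pos: torch.Tensor,
                          z_neg: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    """Triplet-style contrastive loss over edit codes of shape (batch, dim).

    z_anchor: codes of edited images at view i
    z_pos:    codes of the SAME edit style rendered at a DIFFERENT view
    z_neg:    codes of a DIFFERENT edit style at the SAME view i
    """
    d_pos = F.pairwise_distance(z_anchor, z_pos)  # distance to be reduced
    d_neg = F.pairwise_distance(z_anchor, z_neg)  # distance to be increased
    # Hinge: penalize whenever the same-edit pair is not closer than the
    # different-edit pair by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```

Pulling same-edit codes together across views while pushing different-edit codes apart at a fixed view forces the code to describe the edit rather than the viewpoint, which is the disentanglement the poster claims.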
Training loss: L = L_KL + L_recon + L_adv-G + L_adv-D + L_contr, with the KL loss applied on the latent distill and the reconstruction, adversarial, and contrastive losses applied on the translated NeRF.

Inference: sample a latent vector from the Gaussian distribution and render a corresponding multi-view consistent 3D scene with high quality (a sketch of this sampling step follows the results below).

Qualitative and Quantitative Results

[Qualitative figures: for each task, the original NeRF and baselines such as Instruct-NeRF2NeRF, PaletteNeRF, NeRF-SR, and SPIn-NeRF are compared with three diverse samples, "Ours inference 1/2/3", rendered at multiple poses; the inpainting and text-driven editing figures show one input alongside three outputs.]

NeRF Colorization (CF↑ / FID↓):
DDColor [14]+NeRF: 40.435 / 148.957
Instruct-NeRF2NeRF: 45.599 / 201.456
PaletteNeRF [16]: 39.654 / [illegible]
Ours w/o L_adv: 35.031 / 137.740
Ours w/o L_contr: 34.829 / 105.750
Ours: 65.099 / 35.041

NeRF Super-resolution (PSNR↑ / SSIM↑):
ResShift [40]+NeRF: 19.978 / 0.535
Instruct-NeRF2NeRF: 20.299 / 0.642
NeRF-SR [34]: 27.957 / 0.897
Ours w/o L_adv: 12.555 / 0.663
Ours w/o L_contr: 15.372 / 0.662
Ours: 28.501 / 0.913

NeRF Inpainting (PSNR↑ / SSIM↑):
LaMa [31]+NeRF: 18.983 / 0.3706
Instruct-NeRF2NeRF: 16.734 / 0.3088
SPIn-NeRF [24]: 24.369 / 0.7217
Ours: 26.868 / 0.8137

Text-driven NeRF Editing (CLIP Text-Image Direction Similarity↑ / CLIP Direction Consistency↑ / FID↓):
InstructPix2Pix [2]+NeRF: 0.1669 / 0.8475 / 270.542
Instruct-NeRF2NeRF: 0.2021 / 0.9828 / 148.021
Ours w/o L_adv: 0.1920 / 0.9657 / 162.275
Ours w/o L_contr: 0.2007 / 0.9749 / 156.524
Ours: 0.2089 / 0.9864 / 137.740

Ablation on M, the number of translations per view (CLIP Text-Image Direction Similarity↑ / CLIP Direction Consistency↑):
M=1: 0.2635 / 0.9610
M=3: 0.2807 / 0.9650
M=5: 0.2835 / 0.9638

Training and inference cost vs. Instruct-NeRF2NeRF (IN2N), as Time (h) / Iterations / Memory (GB) / FLOPS (G); the IN2N row is only partially legible, and one further value (131) cannot be confidently attributed:
IN2N: 2.67 / 20000 / [illegible] / [illegible]
Ours: 3.47 / 10000 / 18.32 / 20.92
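The inference step described in the Method section amounts to drawing one edit code and reusing it for every view. Here is a minimal sketch, assuming the translated NeRF is exposed as a callable that renders one image given a camera pose and an edit code; translated_nerf, camera_poses, and z_dim are hypothetical names, and the latent size is not stated on the poster:

```python
import torch

@torch.no_grad()
def sample_edited_scene(translated_nerf, camera_poses, z_dim: int = 64):
    """Draw one 3D edit and render it from every camera pose.

    translated_nerf: edit-conditioned NeRF, assumed callable as
                     translated_nerf(pose, z) -> image (hypothetical API)
    z_dim:           edit-code size (assumed; not stated on the poster)
    """
    z = torch.randn(z_dim)  # one edit code from the Gaussian prior
    # Reusing the same z at every pose is what makes the rendered edit
    # multi-view consistent; resampling z produces a different, diverse edit.
    return [translated_nerf(pose, z) for pose in camera_poses]
```

Because the KL loss pushes the training-time edit codes toward a Gaussian, codes drawn from torch.randn at test time fall in the region the translated NeRF was trained on, which is why sampling yields diverse yet valid edits.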