The image shows an academic poster titled "BiDiff: Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors." The listed authors are Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, and Tianfan Xue. Key elements of the poster include:
- **Motivation of the Bidirectional Diffusion (BiDiff):** This section contrasts 2D image diffusion with 3D shape diffusion. It states that 2D image diffusion offers better texture and more diversity, while 3D shape diffusion provides better 3D consistency and geometry. The combined approach leverages priors from pre-trained foundation 2D and 3D models and uses bidirectional guidance to synchronize the 2D and 3D denoising directions.
- **Bidirectional Diffusion (BiDiff) Framework:** Intermediate outputs of the 3D diffusion are rendered into 2D images, which guide the denoising of the 2D diffusion model. At the same time, the intermediate multi-view outputs of the 2D diffusion are re-projected to assist 3D denoising. The BiDiff sampling outcomes then serve as a strong starting point for optimization methods.
- **Examples and Results:** Several sections show what BiDiff can do, including generating diverse 3D objects in about 40 seconds and letting users choose a favorite for further refinement (labeled 20 minutes). Final results show detailed geometry and texture in 3D objects such as animals, buildings, and shoes. Another section demonstrates decoupled texture and geometry control, with the variants shown side by side.
- **Visual Details:** The poster contains diagrams and sample images that explain the concepts and results. A small can of soda is visible at the bottom left corner, suggesting an informal, in-person presentation setting.
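The bidirectional guidance described in the framework section can be sketched as a toy sampling loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the functions `render_views`, `reproject`, `toy_denoise`, and `bidirectional_sample` are hypothetical names, the "renderer" is a plain axis-wise mean, and the "denoisers" are simple damping steps standing in for the pre-trained 2D and 3D diffusion models.

```python
import numpy as np

def render_views(volume):
    """Toy 'renderer' (assumption): project a cubic volume along each axis."""
    return np.stack([volume.mean(axis=a) for a in range(3)])

def reproject(views, size):
    """Toy 're-projection' (assumption): smear each view back along its axis."""
    vol = np.zeros((size, size, size))
    for axis in range(3):
        # Re-insert the axis removed by render_views and broadcast along it.
        vol += np.expand_dims(views[axis], axis=axis)
    return vol / 3.0

def toy_denoise(x, guide, strength=0.5, shrink=0.9):
    """Stand-in for one denoising step: damp the sample, then blend in the guide."""
    return (1.0 - strength) * (shrink * x) + strength * guide

def bidirectional_sample(size=8, steps=10, seed=0):
    """Run both toy pipelines side by side, exchanging guidance at every step."""
    rng = np.random.default_rng(seed)
    volume = rng.standard_normal((size, size, size))  # 3D noise
    views = rng.standard_normal((3, size, size))      # 2D multi-view noise
    for _ in range(steps):
        rendered = render_views(volume)   # 3D intermediate output -> 2D guidance
        lifted = reproject(views, size)   # 2D intermediate output -> 3D guidance
        views = toy_denoise(views, rendered)
        volume = toy_denoise(volume, lifted)
    return volume, views
```

Calling `bidirectional_sample()` returns one volume and three views. The point of the sketch is only the control flow: both guides are computed from the previous step's states, so each branch steers the other at every step rather than running one pipeline to completion first.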
The poster emphasizes the efficacy of combining 2D and 3D priors in generating high-quality, intricate 3D models, showcasing a novel approach in the field of computer graphics and AI.

Text transcribed from the image:

商汤 sensetime

BiDiff: Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Lihe Ding*, Shaocong Dong*, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue

Motivation of the Bidirectional Diffusion (BiDiff)
- 2D image diffusion has better texture and is more diverse.
- 3D shape diffusion has better 3D consistency and geometry.
- Leverage priors from pre-trained foundation 2D & 3D models.
- Bidirectional guidance to synchronize 2D&3D denoise direction.

Diagram labels: Noise, 2D diffusion model, 2D Denoising, Bidirectional Guidance, Noise, 3D diffusion model, 3D Denoising

What can BiDiff do?
- (i) Efficiently generate diverse 3D objects in 40 seconds.
- (ii) Users choose the favorite one for further refinement.

Figure labels: bear, dancing, A bear dancing kick dance, A bear dancing ballet, Random seed 27, step=960, step=920, step=800, step=500, step=200, step=0, feedforward results (40s), refined results (20min), texture, mesh

Final results with detailed geometry and texture

Example prompts: A bear dressed as a lumberjack. A car made out of pizza. A silver platter piled high with fruits. A llama wearing a suit. An orangutan playing accordion with its hands spread wide.

Bidirectional Diffusion (BiDiff) framework
- (a) we render the 3D diffusion's intermediate outputs into 2D images, which then guide the denoising of the 2D diffusion model. Simultaneously, the intermediate multi-view outputs from the 2D diffusion are re-projected to assist 3D denoising.
- (b) BiDiff sampling outcomes as a strong starting point for optimization methods.

Framework diagram labels: (a) 3D Noise, 2D Noise, step t+1, step t, step t-1, 3D Pipeline, 3D Denoising, Noisy Input, 3D Foundation Model, 2D-3D Control, 3D SDF Volume, Feature Volume, Rendering, (t-1 step), 2D Pipeline, Volume Encoding; (b) step 0
Diagram labels: 3D-2D Control, Feature Volume, Noisy Input, sos loss, 2D Denoising, Multi-view images (t-1 step)

Example prompts: A pig wearing a backpack. An eagle head. A Van Gogh style cabin. A GUNDAM robot. A Nike sport shoe.

Decouple Texture and Geometry Control

Example prompts (some captions are cut off or illegible in the transcription):
- An ancient Chinese
- An ancient Gothic tower.
- A golden skull.
- A crystal skull
- A strong
- A strong muscular
- A blue and white porcelain teapot
- A blue and white porcelain burger.
- A blue and red Superman clothes
- A blue and red Superman clothes style car
- A [illegible] with van Gogh's starry sky style painting on it
- A house in van Gogh starry sky style

Control labels: Red, Metal, Golden, A (*) Dog., Astronaut, Astronaut, Astronaut, (*) in weste

Timeline labels: 2D Diffusion Sampling (40s), 3D Diffusion Sampling (40s), Refinement (20min), Share 3D Feature Volume, Yo -15 Y 7.5