The image shows a poster presentation titled "BiDiff: Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors," authored by Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, and Tianfan Xue. The poster carries university logos and the SenseTime (商汤) logo. Key sections of the poster include: 1. Motivation of the Bidirectional Diffusion (BiDiff), which highlights the benefits of combining 2D image and 3D shape diffusion for improved texture diversity, 3D consistency, and geometry. 2. An explanation of the **Bidirectional Diffusion (BiDiff) framework**, which synchronizes the denoising directions of the 2D and 3D diffusion processes: intermediate 3D outputs are rendered into 2D images to guide the 2D diffusion, while intermediate multi-view 2D outputs are re-projected back to assist 3D denoising. Several panels show: - The capability of BiDiff to efficiently generate diverse 3D objects. - **Final results** showcasing detailed geometry and texture for various 3D models. - Methods to control and decouple the texture and geometry of generated models. The poster layout includes diagrams, examples of generated 3D models, and technical workflows. The bottom left of the image also shows an empty soda can placed on the table in front of the poster.

Text transcribed from the image:
• 商汤 SenseTime
• BiDiff: Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
• Lihe Ding*, Shaocong Dong*, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue
• Motivation of the Bidirectional Diffusion (BiDiff):
  - 2D image diffusion has better texture and is more diverse.
  - 3D shape diffusion has better 3D consistency and geometry.
  - Leverage priors from pre-trained foundation 2D & 3D models.
  - Bidirectional guidance to synchronize 2D & 3D denoise direction.
• What can BiDiff do?
  - Prompts: "A bear dancing kick dance" / "A bear dancing ballet" (random seed 27)
  - Final results with detailed geometry and texture
  - [Diagram: Noise → 2D diffusion model → 2D Denoising; Bidirectional Guidance; Noise → 3D diffusion model → 3D Denoising; intermediate steps labeled step=960, step=920, step=800, step=500, step=200, step=0]
  - "A bear dressed as a lumberjack": feedforward results (40s), refined results (20min), textured mesh
  - (i) Efficiently generate diverse 3D objects in 40 seconds. (ii) Users choose the favorite one for further refinement.
  - "A car made out of pizza"
• Bidirectional Diffusion (BiDiff) framework:
  - (a) We render the 3D diffusion's intermediate outputs into 2D images, which then guide the denoising of the 2D diffusion model. Simultaneously, the intermediate multi-view outputs from the 2D diffusion are re-projected to assist 3D denoising. (b) BiDiff sampling outcomes serve as a strong starting point for optimization methods.
  - [Diagram: step t+1 → step t → step t−1; 3D Pipeline: 3D Noise, Noisy Input, 3D Foundation Model, 3D Denoising, 2D→3D Control, 3D SDF Volume, Feature Volume, Volume Rendering (t−1 step); 2D Pipeline: 2D Noise, Volume Encoding, 3D→2D Control, Noisy Input, 2D Denoising, Multi-view images (t−1 step)]
  - Example prompts: "A silver platter piled high with fruits." "A llama wearing a suit." "An orangutan playing accordion with its hands spread wide." "A pig wearing a backpack" "An eagle head." "A Van Gogh style cabin" "A GUNDAM robot" "A Nike sport shoe"
• Decouple Texture and Geometry Control:
  - Paired prompts: "A golden skull." / "A crystal skull"; "An ancient Chinese …" / "An ancient Gothic tower."; "A strong muscular …"; "A blue and white porcelain teapot" / "A blue and white porcelain burger."; "A blue and red Superman clothes" / "A blue and red Superman clothes style car"; "A house in Van Gogh's starry sky style" / "A teapot with Van Gogh's starry sky style painting on it"
  - [Additional prompt variations only partially legible: Red / Metal / Golden; Dog; Astronaut]
• Pipeline: 2D Diffusion Sampling (40s) and 3D Diffusion Sampling (40s) share a 3D feature volume, followed by Refinement (20min).
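The bidirectional guidance described in the framework section, where each diffusion branch's intermediate output steers the other at every denoising step, can be illustrated with a minimal toy sketch. This is not the authors' implementation: `denoise_2d`, `denoise_3d`, `render_to_2d`, `reproject_to_3d`, and the blending weight `guidance` are all hypothetical stand-ins for BiDiff's pre-trained foundation models, volume rendering, and multi-view re-projection.

```python
import numpy as np

# Toy stand-ins for the components named on the poster. In BiDiff these
# are pre-trained 2D/3D foundation diffusion models plus differentiable
# volume rendering and multi-view re-projection; here they are trivial
# placeholders so the control flow of the sampling loop is visible.

def denoise_2d(x2d, t):
    """One denoising step of the 2D image diffusion (toy: shrink noise)."""
    return x2d * 0.9

def denoise_3d(x3d, t):
    """One denoising step of the 3D shape diffusion (toy: shrink noise)."""
    return x3d * 0.9

def render_to_2d(x3d):
    """Render the intermediate 3D volume into a 2D view (toy: mean over depth)."""
    return x3d.mean(axis=0)

def reproject_to_3d(x2d, depth):
    """Re-project 2D features back into a 3D volume (toy: broadcast along depth)."""
    return np.broadcast_to(x2d, (depth,) + x2d.shape).copy()

def bidiff_sample(steps=50, guidance=0.3, size=8, depth=8, seed=27):
    rng = np.random.default_rng(seed)
    x2d = rng.standard_normal((size, size))         # 2D noise
    x3d = rng.standard_normal((depth, size, size))  # 3D noise
    for t in range(steps, 0, -1):
        # Each pre-trained model proposes its own denoising direction.
        x2d = denoise_2d(x2d, t)
        x3d = denoise_3d(x3d, t)
        # Bidirectional guidance: blend each branch toward the other's
        # intermediate output so the two processes stay synchronized.
        x2d = (1 - guidance) * x2d + guidance * render_to_2d(x3d)
        x3d = (1 - guidance) * x3d + guidance * reproject_to_3d(x2d, depth)
    return x2d, x3d
```

The point of the sketch is the coupling inside the loop: neither branch denoises in isolation, which is how BiDiff keeps 2D texture quality and 3D geometric consistency aligned; the fast feed-forward sample can then seed a longer optimization-based refinement.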