The image depicts a trade show with several people, including a man wearing glasses, standing in front of a table displaying a photo shoot of cars. The man is talking with people next to the poster, and there are several books and a bottle visible in the background. The chair in the foreground is covered in a colorful pattern.
Text transcribed from the image:
CVPR
JUNE 17-21, 2024
TUTI CO
Introduction
Tex
ViewDiff: 3D-Consistent Image Generation w
Lukas Höllein¹,², Aljaž Božič², Norman Müller, David
Christian Richardt², Michael Zollhöfer, Matthia
Work done during Lukas' internship
¹Technical University of Munich, ²Meta
Generated 3D-Consistent Images From Text
creates high-quality, multi-view consistent images of a real-world
in authentic surroundings
a teddy bear sitting on the ground in the dark
a red and white fire hydrant on a brick floor
Finetune With Multi-View Supervision
We turn pretrained text-to-image models into 3D-consistent image generators
by finetuning them with multi-view supervision (CO3D dataset)
augment the U-Net architecture with new layers in every U-Net block
a glazed donut sitting on a marble counter
a red apple on a blue and white patterned
Optimized 3D Object (N
VOXEL51
Single Image Reconstruction
Real Image
Ours
SEATTLE, WA
Given a single p
de
and
image
output poses as
we create multi
consistent images c
ame object in a =
oising forward p
Apple
PSNR SSIM↑
19.54 0.64
5.79 0.91
5.94 0.91
First, we replace self-attention with cross-frame
attention (yellow) conditioned on pose (RT),
intrinsics (K), and intensity
NeRF optimization from 100 genera
Second, we add a projection layer (green)
into the inner blocks of the U-Net. It creates a
3D representation from multi-view features
and renders them into 3D-consistent features
CVPR
op
SEATTLE WA
Lukas
Lukas Hoellein
Technical University of
B
Munich
CVPR 2024
create a
ages c
D obje
ng the
pass
ge
in a
shion.
Attent