The image shows a trade show with several people, including a man wearing glasses, standing in front of a table displaying photographs of cars. The man is talking with people next to a poster, and several books and a bottle are visible in the background. A chair in the foreground is covered in a colorful pattern.

Text transcribed from the image (illegible or cut-off passages are marked [...]):

CVPR, June 17-21, 2024, Seattle, WA

ViewDiff: 3D-Consistent Image Generation w[...]
Lukas Höllein, Aljaž Božič, Norman Müller, David [...] Christian Richardt, Michael Zollhöfer, Matthia[...]
Technical University of Munich, Meta (work done during Lukas' internship)

Generated 3D-Consistent Images From Text
[...] creates high-quality, multi-view consistent images of a real-world [...] in authentic surroundings.
Example prompts: "a teddy bear sitting on the ground in the dark"; "a red and white fire hydrant on a brick floor"; "a glazed donut sitting on a marble counter"; "a red apple on a blue and white patterned [...]"

Finetune With Multi-View Supervision
[...] pretrained text-to-image models into 3D-consistent image generators [...] finetuning them with multi-view supervision (CO3D dataset). [...] augment the U-Net architecture with new layers in every U-Net block.
First, we replace self-attention with cross-frame attention (yellow), conditioned on pose (RT), intrinsics (K), and intensity.
Second, we add a projection layer (green) into the inner blocks of the U-Net. It creates a 3D representation from multi-view features and renders them into 3D-consistent features.

Optimized 3D Object (N[...]
NeRF optimization from 100 genera[...]

[...] Image Reconstruction (columns: Real Image, Ours)
Given a single p[...] image [...] output poses [...] we create multi[...] consistent images [...] object in a [...] forward p[...]
Apple: PSNR, SSIM↑ columns; legible values: 19.54 / 0.64, 5.79 / 0.91, 5.94 / 0.91

Name badge: Lukas Hoellein, Technical University of Munich, CVPR [...]
Other visible text: VOXEL51