In this image, a presenter stands in front of a scientific poster at a conference, talking with another attendee. The poster presents a research project titled "ViewDiff: 3D-Consistent Image Generation." The presenter, wearing a light-colored shirt, has a lanyard with an orange tag around his neck and is holding a water bottle. His name badge identifies him as Lukas Hoellein from the Technical University of Munich (TUM). The name "Voxel51" is also visible nearby. The poster combines technical text with figures, including 3D-consistent images generated from text prompts. A banner on the poster identifies the conference as CVPR 2024 (June 17-21, 2024) in Seattle, WA. The image captures a moment of knowledge-sharing and discussion at a busy academic event.

Text transcribed from the image:

CVPR, June 17-21, 2024, Seattle, WA

ViewDiff: 3D-Consistent Image Generation w[...]
Lukas Höllein¹², Aljaž Božič², Norman Müller, David [...], Christian Richardt², Michael Zollhöfer, Matthia[...]
*Work done during Lukas' internship [...]. ¹Technical University of Munich, ²Meta

Introduction: [...]

Generated 3D-Consistent Images From Text: [...] generates high-quality, multi-view consistent images of a real-world [...] in authentic surroundings. Example prompts: "a teddy bear sitting on the ground in the dark", "a red and white fire hydrant on a brick floor", "a glazed donut sitting on a marble counter", "a red apple on a blue and white patterned [...]".

Finetune With Multi-View Supervision: We turn pretrained text-to-image models into 3D-consistent image generators by finetuning them with multi-view supervision (CO3D dataset). We augment the U-Net architecture with new layers in every U-Net block. First, we replace self-attention with cross-frame attention (yellow) conditioned on pose (RT), intrinsics (K), and intensity. Second, we add a projection layer (green) into the inner blocks of the U-Net. It creates a 3D representation from multi-view features and renders them into 3D-consistent features.

Optimized 3D Object: NeRF optimization from 100 generated [...]

Single Image Reconstruction (Real Image vs. Ours): Given a single [...] image and output poses, we create multi-view consistent images of the same object in a [...]denoising forward pass.
Apple, PSNR / SSIM↑: 19.54 / 0.64, [...]5.79 / 0.91, [...]5.94 / 0.91

Name badge: Lukas Hoellein, Technical University of Munich
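The poster's method text says self-attention is replaced by cross-frame attention conditioned on camera parameters (RT, K) and intensity. To illustrate what "cross-frame" means, the NumPy sketch below lets each view's queries attend to keys and values pooled from all views. It is a hypothetical toy, not ViewDiff's implementation: the function names, weight shapes, the 17-value conditioning layout, and the choice to inject conditioning by adding a linearly projected per-view vector to every token are all assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_frame_attention(
    feats: np.ndarray,   # (V, N, C): N feature tokens for each of V views
    cond: np.ndarray,    # (V, D): per-view conditioning vector (assumed layout)
    w_q: np.ndarray,     # (C, C) query projection
    w_k: np.ndarray,     # (C, C) key projection
    w_v: np.ndarray,     # (C, C) value projection
    w_cond: np.ndarray,  # (D, C) hypothetical projection of the conditioning
) -> np.ndarray:
    """Toy cross-frame attention: each view's queries attend to the tokens of all views."""
    num_views, num_tokens, channels = feats.shape
    # Assumption: inject conditioning by adding a projected vector to every token of its view.
    x = feats + (cond @ w_cond)[:, None, :]
    q = x @ w_q  # (V, N, C): queries stay per view
    # Keys and values are pooled over all V * N tokens, so views exchange information.
    k = (x @ w_k).reshape(num_views * num_tokens, channels)
    v = (x @ w_v).reshape(num_views * num_tokens, channels)
    scores = q @ k.T / np.sqrt(channels)  # (V, N, V * N)
    return softmax(scores, axis=-1) @ v   # (V, N, C)


# Usage with random data: 4 views, 16 tokens each, 8 channels.
# D = 17 assumes 12 values for a 3x4 [R|t] pose, 4 intrinsics (fx, fy, cx, cy), and 1 intensity.
rng = np.random.default_rng(0)
V, N, C, D = 4, 16, 8, 17
feats = rng.standard_normal((V, N, C))
cond = rng.standard_normal((V, D))
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
w_cond = 0.1 * rng.standard_normal((D, C))
out = cross_frame_attention(feats, cond, w_q, w_k, w_v, w_cond)
print(out.shape)  # (4, 16, 8)
```

Because keys and values are pooled across views, changing one view's features changes every view's output; with a single view the function reduces to ordinary self-attention, which is the layer the poster says is being replaced.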