In the image, a man attends a conference presentation from the back of a conference room. A large screen is visible behind him, displaying a series of charts and graphs related to the subject under discussion. He is seated by a window with a laptop open in front of him, possibly taking notes or working on his own project. A cup sits on a table to his right, and a television on the other side of the room displays a PowerPoint presentation that appears to be the main focus of the conference. The setting appears professional, with a boardroom table and chairs arranged in a semicircle, suggesting a collaborative and interactive atmosphere. Text transcribed from the image: Lumiere: A space-time diffusion model for video generation. (a) Space-Time UNet (STUNet); T × H × W × D. Legend: Spatial Resizing, Temporal Resizing, Skip Connection, Conv-based Inflation, Attention-based Inflation. [...]l et al., "Lumiere: A space-time diffusion model for video generation," arXiv 2024. (b) Convolution-based Inflation Block: Pretrained Spatial Layer(s), 2D Convolution, Norm + activation, 1D Convolution, Norm + activation, Linear Projection. (c) Attention-based Inflation: Pretrained Spati[...] 1D A[...] Copyright Linca