A presenter is giving a talk in a conference room, showing a slide titled "Lumiere: A space-time diffusion model for video generation." The slide contains a detailed diagram of the model's components and workflow. Attendees are seated in front of a large projection screen, watching the presentation; one person in the foreground, wearing a black cap, is focused on the screen. The presenter stands at a podium to the right of the screen. The setting appears professional, with organized seating and modern lighting.

Text transcribed from the image:
Lumiere: A space-time diffusion model for video generation
(a) Space-Time UNet (STUNet). Legend: Spatial Resizing, Temporal Resizing, Skip Connection, Conv-based Inflation, Attention-based Inflation. T×H×W×D
(b) Convolution-based Inflation Block: Pretrained Spatial Layer(s), 2D Convolution, Norm + activation, 1D Convolution, Norm + activation, Linear Projection
(c) Attention-based Inflation Block: Pretrained Spatial Layer(s), 1D Attention, (x), Linear Projection
Bar-Tal et al., "Lumiere: A space-time diffusion model for video generation," arXiv 2024.
3108
Copyright Mike Shou, NUS