In this image, a person is viewing a PowerPoint slide titled "Lumiere: A space-time diffusion model for video generation." The slide uses block diagrams to explain the architecture of a machine learning model. It has three panels: (a) the Space-Time UNet (STUNet), (b) a convolution-based inflation block, and (c) attention-based inflation, which is cut off in the image. A legend identifies the symbols for spatial resizing, temporal resizing, skip connections, convolution-based inflation, and attention-based inflation. The setting appears to be a conference or academic seminar.

Text transcribed from the image:
Lumiere: A space-time diffusion model for video generation
(a) Space-Time UNet (STUNet): T × H × W × D
Legend: Spatial Resizing, Temporal Resizing, Skip Connection, Conv-based Inflation, Attention-based Inflation
l et al., "Lumiere: A space-time diffusion model for video generation," arXiv 2024.
(b) Convolution-based Inflation Block: Pretrained Spatial Layer(s), 2D Convolution, Norm + activation, 1D Convolution, Norm + activation, Linear Projection
(c) Attention-based Inflation: Pretrained Spati IDA (remaining text cut off)
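
The convolution-based inflation block in panel (b) lists a 2D convolution, a 1D convolution, and a linear projection, with a norm and activation after each convolution. The following is a minimal NumPy sketch of that sequence on a T × H × W × D tensor, not the authors' implementation. Everything beyond the slide's labels is an assumption made for illustration: the function names, the 3-wide kernels, zero padding, per-channel normalization, ReLU, the absence of residual connections, and treating the input as the output of the pretrained spatial layer(s).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def norm_act(x, eps=1e-5):
    # Normalize over the channel axis, then apply ReLU (both assumed choices).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return np.maximum((x - mean) / np.sqrt(var + eps), 0.0)


def conv2d_per_frame(x, kernel):
    # x: (T, H, W, D); kernel: (3, 3, D, E).
    # Every frame is filtered independently, so no information crosses time.
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)))
    windows = sliding_window_view(xp, (3, 3), axis=(1, 2))  # (T, H, W, D, 3, 3)
    return np.einsum("thwdij,ijde->thwe", windows, kernel)


def conv1d_over_time(x, kernel):
    # x: (T, H, W, D); kernel: (3, D, E).
    # Every pixel location is filtered along the time axis only.
    xp = np.pad(x, ((1, 1), (0, 0), (0, 0), (0, 0)))
    windows = sliding_window_view(xp, 3, axis=0)  # (T, H, W, D, 3)
    return np.einsum("thwdk,kde->thwe", windows, kernel)


def inflation_block(x, k2d, k1d, w_proj):
    # Order follows the labels transcribed from panel (b).
    h = norm_act(conv2d_per_frame(x, k2d))
    h = norm_act(conv1d_over_time(h, k1d))
    return h @ w_proj  # linear projection over the channel axis


rng = np.random.default_rng(0)
T, H, W, D = 4, 8, 8, 16
x = rng.standard_normal((T, H, W, D))
k2d = rng.standard_normal((3, 3, D, D)) * 0.1
k1d = rng.standard_normal((3, D, D)) * 0.1
w_proj = rng.standard_normal((D, D)) * 0.1

y = inflation_block(x, k2d, k1d, w_proj)
print(y.shape)  # (4, 8, 8, 16)
```

Factoring the operation into a per-frame 2D convolution followed by a per-pixel 1D convolution keeps the spatial weights separate from the temporal ones, which is what lets a pretrained spatial layer be reused while only the temporal part is new.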