In this image, a presenter is delivering a talk on "NUWA-XL: Recursive Interpolations for Generating Very Long Videos." The slide on the projection screen describes Mask Temporal Diffusion (MTD), a basic diffusion model that serves as the building block for both the global and the local diffusion models. A flowchart shows the MTD architecture, with arrows distinguishing the diffusion process, visual conditions, prompts, and timesteps. The diagram contains DownBlock, UpBlock, and MidBlock modules along with components labeled "T-KLVAE Enc" and "T-KLVAE Dec." A side note explains how visual conditions are masked: global diffusion masks all frames, while local diffusion masks only the middle frames. A few attendees watch attentively from their seats. The presenter stands to the right of the screen, explaining the model. The conference room is well lit, with a plain beige background that keeps the focus on the slide.

Text transcribed from the image:
NUWA-XL: Recursive interpolations for generating very long videos
Mask Temporal Diffusion (MTD)
• A basic diffusion model for global & local diffusion models
Diagram labels: ε ∈ N(0,1); Prompts; CLIP text Enc; Timestep ~ U(1,T); Time Enc; mask middle frames; T-KLVAE Enc; T-KLVAE Enc; MSE Loss; ε_θ(x_t); DownBlock; Conv Down; UpBlock; MidBlock
Masking visual conditions
Global diffusion: mask all
Local diffusion: mask middle frames
Legend: → Diffusion Process, → Visual Condition, → Prompts, → Timesteps
Yin et al., "NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation," arXiv 2023.
Copyright Mike Shou, NUS
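
The masking rule stated on the slide (global diffusion: mask all; local diffusion: mask middle frames) can be illustrated with a small Python sketch. This is not the authors' implementation. The function names `visual_condition_mask` and `mask_frames` are hypothetical, and the sketch assumes that "middle frames" means every frame except the first and the last, and that a masked frame can be represented by `None`.

```python
from typing import List, Optional, Sequence, TypeVar

F = TypeVar("F")


def visual_condition_mask(num_frames: int, mode: str) -> List[bool]:
    """Return one flag per frame: True means the frame is kept as a visual condition.

    Assumption: in "local" mode only the first and last frames are kept,
    so every frame in between counts as a masked "middle frame".
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    if mode == "global":
        # Global diffusion: mask all frames, so no visual condition is kept.
        return [False] * num_frames
    if mode == "local":
        # Local diffusion: mask the middle frames, keep the two endpoints.
        keep = [False] * num_frames
        keep[0] = True
        keep[-1] = True
        return keep
    raise ValueError(f"unknown mode: {mode!r}")


def mask_frames(frames: Sequence[F], keep: Sequence[bool]) -> List[Optional[F]]:
    """Replace every masked frame with None (a stand-in for a blank frame)."""
    if len(frames) != len(keep):
        raise ValueError("frames and keep must have the same length")
    return [frame if flag else None for frame, flag in zip(frames, keep)]


frames = ["f0", "f1", "f2", "f3", "f4"]
print(mask_frames(frames, visual_condition_mask(5, "global")))
# [None, None, None, None, None]
print(mask_frames(frames, visual_condition_mask(5, "local")))
# ['f0', None, None, None, 'f4']
```

In a real model the frames would be tensors and a masked frame would more likely be zeroed out than set to `None`; the list-based version only shows which frames each mode keeps.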