A presenter at a conference is showing a slide on "NUWA-XL" that focuses on "Mask Temporal Diffusion (MTD)." The slide describes MTD as a basic diffusion model used for both the global and the local diffusion models, and illustrates it with an architecture diagram and text blocks. The audience, partially visible from the back, is watching the presentation. The presenter stands next to a laptop and a podium to the right of the screen. The setting appears to be a conference room with a large projection screen. The presenter appears to be explaining how diffusion models are used to generate long videos, as indicated by the slide's text and diagram.

Text transcribed from the image: NUWA-XL: Recursive interpolations for generating very long videos. Mask Temporal Diffusion (MTD) • A basic diffusion model for global & local diffusion models. Diagram labels: ε ∈ N(0, 1); Prompts; CLIP text Enc; Timestep ~ U(1, T); Time Enc; mask middle frames; T-KLVAE Enc; MSE Loss; ε_θ(x_t); DownBlock; Conv Down; SA; MidBlock; UpBlock. Masking visual conditions: Global diffusion: mask all; Local diffusion: mask middle frames. Legend: → Diffusion Process, → Visual Condition, → Prompts, → Timesteps. Yin et al., "NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation," arXiv 2023. Copyright Mike Shou, NUS.
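
The masking rule stated on the slide (global diffusion masks all frames, local diffusion masks the middle frames) can be illustrated with a minimal sketch. The function names `visual_condition_mask` and `apply_mask` are hypothetical. The sketch assumes that "middle frames" means every frame except the first and last, and that masked frames are replaced by zeros. The slide confirms neither detail, and this is not the authors' implementation.

```python
from typing import List, Sequence


def visual_condition_mask(num_frames: int, mode: str) -> List[bool]:
    """Return one flag per frame; True means the visual condition is masked.

    Assumption: "local" keeps only the first and last frames visible.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    if mode == "global":
        # Global diffusion: mask all frames.
        return [True] * num_frames
    if mode == "local":
        # Local diffusion: mask the middle frames only.
        return [False] + [True] * (num_frames - 2) + [False]
    raise ValueError(f"unknown mode: {mode!r}")


def apply_mask(frames: Sequence[Sequence[float]],
               mask: Sequence[bool]) -> List[List[float]]:
    """Zero out masked frames (an assumed fill value); keep the others."""
    return [[0.0] * len(f) if m else list(f) for f, m in zip(frames, mask)]


# Five toy "frames", each flattened to four values.
frames = [[1.0, 1.0, 1.0, 1.0] for _ in range(5)]
global_cond = apply_mask(frames, visual_condition_mask(5, "global"))
local_cond = apply_mask(frames, visual_condition_mask(5, "local"))
```

In this sketch `global_cond` carries no visual information at all, while `local_cond` retains only the first and last frames, so the same model can be conditioned in either mode by changing the mask.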