A lecture in a classroom or conference room. The audience is seated around the room, and a few attendees hold laptops or other devices. The speaker stands at a podium with their back to the camera. The projector screen shows a slide titled "NUWA-XL" with an architecture diagram of its Mask Temporal Diffusion (MTD) model, which is likely the topic of this part of the lecture.

Text transcribed from the image:

NUWA-XL
Recursive interpolations for generating very long videos
Mask Temporal Diffusion (MTD)
• A basic diffusion model for global & local diffusion models
Diagram labels: ε ∈ N(0, I); Prompts; CLIP text Enc; Timestep ~ U(1, T); Time Enc; mask middle frames; T-KLVAE Enc (twice); MSE Loss; ε_θ(x_t); DownBlock; Conv Down; MidBlock; UpBlock; SA
Masking visual conditions
Global diffusion: mask all
Local diffusion: mask middle frames
Legend: → Diffusion Process, → Visual Condition, → Prompts, → Timesteps
Yin et al., "NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation," arXiv 2023.
Copyright Mike Shou, NUS
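
The masking rule stated on the slide ("Global diffusion: mask all", "Local diffusion: mask middle frames") can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not the NUWA-XL implementation: the function name `mask_visual_condition`, the use of `None` as the placeholder for a masked frame, and the reading of "middle frames" as everything except the first and last frame are all hypothetical choices made for illustration.

```python
def mask_visual_condition(frames, mode):
    """Return (conditioned_frames, keep_flags) for a list of frames.

    Hypothetical helper, not part of any NUWA-XL code release.
    keep_flags[i] is True where frame i stays visible as a visual
    condition; masked frames are replaced by None.
    """
    n = len(frames)
    if mode == "global":
        # Global diffusion: every frame is masked (no visual condition).
        keep_flags = [False] * n
    elif mode == "local":
        # Local diffusion: keep the boundary frames, mask the middle ones.
        # Assumption: "boundary" means the first and last frame only.
        keep_flags = [i == 0 or i == n - 1 for i in range(n)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    conditioned = [f if keep else None for f, keep in zip(frames, keep_flags)]
    return conditioned, keep_flags


# Frames are stand-in labels here; in practice they would be image tensors.
clip = ["f0", "f1", "f2", "f3", "f4"]
global_frames, global_keep = mask_visual_condition(clip, "global")
local_frames, local_keep = mask_visual_condition(clip, "local")
print(global_frames)  # [None, None, None, None, None]
print(local_frames)   # ['f0', None, None, None, 'f4']
```

In this sketch a single function serves both modes, which mirrors the slide's description of MTD as one basic diffusion model shared by the global and local diffusion models; only the mask differs.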