The image captures a presentation slide shown on a large screen in a conference or seminar room. The slide is titled "Text2Video-Zero" and discusses using Stable Diffusion to generate videos without any finetuning. It describes an optional background-smoothing feature, which regenerates the background and averages it with the first frame. The slide contains diagrams and text illustrating the process, with references to components such as DDIM backward and DDPM forward steps. The footnote credits Khachatryan et al., 2023, for the presented information and states that the copyright belongs to Dr. Mike Shou of NUS. Several attendees sit attentively viewing the presentation, and microphones visible on the floor suggest a discussion or Q&A session may follow.

Text transcribed from the image (reconstructed from garbled OCR; some formula details are uncertain):

Text2Video-Zero
Use Stable Diffusion to generate videos without any finetuning
- Optional background smoothing: regenerate the background, average with the first frame
x_T ~ N(0, 1)
z = DDIM_Backward(x_T, Δt, SD)
DDPM_Forward(Δt) for k = 2, 3, ..., m
Text prompt: "A horse is galloping on the street"
Convolution / Cross-Frame Attention / Transformer Block ×2
Salient Object Detector / Background Smoothing
W_k(z) for k = 1, 2, ..., m, x_T
Khachatryan et al., "Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators," arXiv 2023.
Copyright Mike Shou, NUS
Slide 117
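
The slide's pipeline, as far as it can be read, warps the first frame's latent to produce the remaining frames and then averages the background of each frame with the first frame's background, keeping the salient foreground intact. The following is a minimal numpy sketch of that idea only, not the paper's implementation: the global-translation `warp`, the fixed `alpha`, and the toy foreground mask are all illustrative assumptions standing in for the slide's warping function W_k and salient-object detector.

```python
import numpy as np

def warp(latent, dx, dy):
    # Illustrative stand-in for the slide's warping W_k:
    # a global translation of the latent grid.
    return np.roll(latent, shift=(dy, dx), axis=(0, 1))

def smooth_background(frame, first_frame, fg_mask, alpha=0.5):
    # Background smoothing as described on the slide: average the
    # regenerated background with the first frame's background,
    # leaving the salient foreground (fg_mask == True) untouched.
    bg = alpha * frame + (1 - alpha) * first_frame
    return np.where(fg_mask, frame, bg)

rng = np.random.default_rng(0)

# x_T ~ N(0, 1): initial latent for the first frame (toy 8x8 "latent")
x1 = rng.standard_normal((8, 8))

# Warp the first-frame latent for k = 2..m to induce global motion
m = 4
latents = [x1] + [warp(x1, dx=k, dy=0) for k in range(1, m)]

# Toy salient-object mask: the center block plays the foreground
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True

smoothed = [smooth_background(f, latents[0], mask) for f in latents]
```

In the actual method these operations happen in the diffusion latent space, interleaved with DDIM backward and DDPM forward steps; the sketch only shows the warp-then-average structure.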