A presentation slide titled "Text2Video-Zero" describes using Stable Diffusion to generate videos without any additional fine-tuning. The slide explains starting from noise with a similar pattern across frames: given the first frame's noise, a global scene motion is defined and used to translate that noise into similar initial noise for the other frames. A mathematical process and a diagram illustrate the concept, using the example text prompt "A horse is galloping on the street." The slide cites Khachatryan et al., "Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators," and is attributed to Mike Shou of the National University of Singapore (NUS). The backs of a few attendees' heads are visible in the foreground.

Text transcribed from the image:

Text2Video-Zero

Use Stable Diffusion to generate videos without any fine-tuning

• Start from noises of a similar pattern: given the first frame's noise, define a global scene motion, used to translate the first frame's noise to generate similar initial noise for other frames

$x_T^1 \sim \mathcal{N}(0, I)$
$x_{T'}^1 = \mathrm{DDIM\_Backward}(x_T^1, \Delta t, \mathrm{SD})$
$\tilde{x}_{T'}^k = W_k(x_{T'}^1)$, $x_T^k = \mathrm{DDPM\_Forward}(\tilde{x}_{T'}^k, \Delta t)$ for $k = 2, 3, \ldots, m$

Text prompt: "A horse is galloping on the street"

[Diagram: a Stable Diffusion transformer block (×2) — Convolution, Linear Projections for queries/keys/values, Cross-Frame Attention with Softmax, Cross-Attention, FFN — alongside a Salient Object Detector.]

Khachatryan et al., "Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators," arXiv 2023.

Copyright Mike Shou, NUS
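The latent-initialization steps transcribed above can be made concrete with a short PyTorch sketch. This is a minimal illustration, not the authors' code: the helper names `warp_latent` and `init_video_latents` are mine, and the DDIM-backward and DDPM-forward steps, which require a full Stable Diffusion scheduler, are stubbed as identity functions so the sketch stays self-contained.

```python
import torch
import torch.nn.functional as F

def warp_latent(x, dx, dy):
    """Translate a latent tensor by (dx, dy) latent-space pixels.

    Stand-in for the warping operator W_k on the slide; assumes a pure
    global translation with zero padding outside the original frame.
    x: (C, H, W) float tensor.
    """
    c, h, w = x.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h),
        torch.linspace(-1.0, 1.0, w),
        indexing="ij",
    )
    # Shift the sampling grid opposite to the motion so content moves by (dx, dy).
    grid = torch.stack(
        (xs - 2.0 * dx / (w - 1), ys - 2.0 * dy / (h - 1)), dim=-1
    )
    return F.grid_sample(
        x.unsqueeze(0), grid.unsqueeze(0),
        mode="bilinear", padding_mode="zeros", align_corners=True,
    ).squeeze(0)

def init_video_latents(num_frames, shape, delta=(4.0, 0.0), lam=1.0):
    """Build correlated initial noise x_T^1 .. x_T^m for all frames.

    DDIM_Backward and DDPM_Forward are identity stubs here; in the real
    method they partially denoise the first frame's noise for Δt steps
    before warping, then re-noise the warped latent afterwards.
    """
    ddim_backward = lambda x: x  # stub for DDIM_Backward(x_T^1, Δt, SD)
    ddpm_forward = lambda x: x   # stub for DDPM_Forward(x̃_{T'}^k, Δt)

    x1 = torch.randn(shape)       # x_T^1 ~ N(0, I)
    base = ddim_backward(x1)      # x_{T'}^1
    latents = [x1]
    for k in range(2, num_frames + 1):
        # Global scene motion: frame k's noise is translated by λ·(k−1)·δ.
        dx, dy = lam * (k - 1) * delta[0], lam * (k - 1) * delta[1]
        warped = warp_latent(base, dx, dy)    # x̃_{T'}^k = W_k(x_{T'}^1)
        latents.append(ddpm_forward(warped))  # x_T^k
    return torch.stack(latents)

latents = init_video_latents(num_frames=8, shape=(4, 64, 64))
print(latents.shape)  # torch.Size([8, 4, 64, 64])
```

Because every frame's initial noise is a translated copy of the (partially denoised) first-frame noise, the frames start from correlated latents and the generated video inherits the defined global motion.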
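The cross-frame attention box in the diagram can be sketched the same way: the block keeps the first frame's keys and values and lets every frame's queries attend to them, which ties the frames' appearance together without any fine-tuning. A minimal single-head sketch (the function name is illustrative):

```python
import torch

def cross_frame_attention(q, k, v):
    """Single-head cross-frame attention sketch.

    q, k, v: (frames, tokens, dim). Unlike per-frame self-attention,
    all frames attend to the FIRST frame's keys/values, so each output
    frame is rendered in terms of the first frame's content.
    """
    m, n, d = q.shape
    k1 = k[0].expand(m, n, d)  # broadcast first-frame keys to every frame
    v1 = v[0].expand(m, n, d)  # broadcast first-frame values
    attn = torch.softmax(q @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v1

# Example: 8 frames, 64 latent tokens, 40-dim attention head.
q, k, v = (torch.randn(8, 64, 40) for _ in range(3))
print(cross_frame_attention(q, k, v).shape)  # torch.Size([8, 64, 40])
```

In the architecture shown on the slide, this substitution replaces the self-attention inside each transformer block, while the cross-attention (text conditioning) and FFN sublayers are left unchanged.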