In the image, a man seated at the back of a conference hall holds up his cell phone to photograph or record a presentation shown on a large screen in front of him. According to the text transcribed below, the slide covers Text2Video-Zero, a method that uses Stable Diffusion to generate videos without any finetuning, and it includes a diagram of the pipeline. The man leans forward toward the screen, which suggests that he is focused on the presentation, and he is probably capturing details that he wants to review later. The conference hall setting suggests a professional or academic event, since presentations and lectures are commonly delivered at such venues. Overall, the image conveys the audience's engagement with the talk and the role of personal devices in capturing and sharing ideas.

Text transcribed from the image:

Text2Video-Zero
Use Stable Diffusion to generate videos without any finetuning
Optional background smoothing: regenerate the background, average with the first frame
Diagram labels: ~N(0,1); DDIM Backward(z, Δt, SD); W(·); DDPM Forward, for k = 2, 3, ..., m; Text prompt: "A horse is galloping on the street"; Cross-Frame Attention; Transformer Block ×2; Salient Object Detector; Background Smoothing; DDIM Step, for k = 1, 2, ..., m; ×T
Khachatryan et al., "Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators," arXiv 2023.
Copyright Mike Shou, NUS
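
The slide's note on optional background smoothing (regenerate the background, average with the first frame) can be illustrated with a minimal NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `smooth_background`, the blending weight `alpha`, and the array shapes are hypothetical, and the mask is assumed to come from something like the salient object detector named on the slide. Any alignment of the first frame to the current one (the diagram shows a W(·) block) is omitted here.

```python
import numpy as np


def smooth_background(frame_latent, first_frame_latent, foreground_mask, alpha=0.6):
    """Blend the background of one frame with the first frame.

    foreground_mask is 1 where the salient object is and 0 on the background.
    alpha is a hypothetical blending weight; the slide does not state a value.
    """
    frame_latent = np.asarray(frame_latent, dtype=float)
    first_frame_latent = np.asarray(first_frame_latent, dtype=float)
    mask = np.asarray(foreground_mask, dtype=float)

    # Weighted average of the first frame and the current frame.
    blended = alpha * first_frame_latent + (1.0 - alpha) * frame_latent

    # Keep the current frame on the foreground, use the average on the background.
    return mask * frame_latent + (1.0 - mask) * blended
```

With a mask of all ones the frame is returned unchanged, and with a mask of all zeros the whole frame becomes the weighted average, so the foreground object keeps its per-frame appearance while the background is pulled toward the first frame.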