A presenter is discussing advancements in multimodal-guided video generation at a conference, using a slide to illustrate their points. The slide showcases several projects, including "MovieFactory," "CoDi," "MM-Diffusion," and "NExT-GPT," each with an annotated diagram and a brief citation. The audience attentively follows along, reflecting the engaged and collaborative atmosphere typical of specialized tech conferences. The slide is titled "Multimodal-Guided Video Generation: More Works," underscoring the focus on recent developments and ongoing research in this cutting-edge field.

Text transcribed from the image:

Multimodal-Guided Video Generation: More Works

[Diagram: two-stage pipeline, Step 1: Spatial Training, Step 2: Temporal Training; legend distinguishes pretrained and fixed modules from added and trainable modules]
MovieFactory (Zhu et al.)
"MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images," arXiv 2023.

[Diagram: composable conditioning and generation across modalities]
CoDi (Tang et al.)
"Any-to-Any Generation via Composable Diffusion," NeurIPS 2023.

[Diagram: audio and video branches of a joint diffusion model]
MM-Diffusion (Ruan et al.)
"MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation," CVPR 2023.

[Diagram: LLM-centered pipeline with Encoding, Alignment, and Generation stages producing audio, video, and more modalities]
NExT-GPT (Wu et al.)
"NExT-GPT: Any-to-Any Multimodal LLM," arXiv 2023.

Xing et al., "A Survey on Video Diffusion Models," arXiv 2023.

Copyright Mike Shou, NUS.
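A pattern the slide's projects share is conditioning a generative model on embeddings from several modalities at once; CoDi in particular composes per-modality conditions so that any subset of modalities can drive generation. The sketch below is a minimal, hypothetical illustration of that composable-conditioning idea, not the actual CoDi implementation: every module name, dimension, and weight here is an assumption made for the example. Each modality embedding is projected into a shared conditioning space, and the projections are averaged so the same downstream denoiser can accept any combination of inputs.

import torch
import torch.nn as nn
from typing import Dict, Optional

class ComposableConditioner(nn.Module):
    # Hypothetical module: projects each modality's embedding into a
    # shared conditioning space and averages them, so any subset of
    # modalities can be supplied at inference time ("composable").
    def __init__(self, dims: Dict[str, int], cond_dim: int = 256):
        super().__init__()
        # One projection head per modality (names are illustrative).
        self.proj = nn.ModuleDict(
            {name: nn.Linear(d, cond_dim) for name, d in dims.items()}
        )

    def forward(
        self,
        embeddings: Dict[str, torch.Tensor],
        weights: Optional[Dict[str, float]] = None,
    ) -> torch.Tensor:
        parts = []
        for name, emb in embeddings.items():
            w = 1.0 if weights is None else weights.get(name, 1.0)
            parts.append(w * self.proj[name](emb))
        # Mean over whichever modalities were actually provided.
        return torch.stack(parts).mean(dim=0)

# Usage: condition on text + audio together, or on text alone.
cond = ComposableConditioner({"text": 512, "audio": 128})
both = cond({"text": torch.randn(1, 512), "audio": torch.randn(1, 128)})
text_only = cond({"text": torch.randn(1, 512)})
print(both.shape, text_only.shape)  # torch.Size([1, 256]) for each

Combining conditions in one shared space is what makes the "any-to-any" behavior highlighted on the slide possible: the generator never needs to know which modalities were present, only the fused conditioning vector.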