The image shows a technical presentation on "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets." The slide on the projector screen, titled "Data Processing and Annotation," reproduces Table 1 of the paper, which compares the authors' dataset before and after filtering (LVD, LVD-F, LVD-10M, LVD-10M-F) with two publicly available research datasets (WebVid and InternVid). Audience members are seated on the floor, which suggests a packed session and an informal, workshop-style setting. The slide cites the source paper (Blattmann et al., 2023) and carries a copyright credit to Mike Shou, NUS, placing the talk in an academic research context.

Text transcribed from the image:

Stable Video Diffusion
Scaling latent video diffusion models to large datasets

Data Processing and Annotation

Table 1. Comparison of our dataset before and after filtering with publicly available research datasets.

|                     | LVD    | LVD-F | LVD-10M | LVD-10M-F | WebVid | InternVid |
|---------------------|--------|-------|---------|-----------|--------|-----------|
| #Clips              | 577M   | 152M  | 9.8M    | 2.3M      | 10.7M  | 234M      |
| Clip Duration (s)   | 11.58  | 10.53 | 12.11   | 10.99     | 18.0   | 11.7      |
| Total Duration (y)  | 212.09 | 50.64 | 3.76    | 0.78      | 5.94   | 86.80     |
| Mean #Frames        | 325    | 301   | 335     | 320       | -      | -         |
| Mean Clips/Video    | 11.09  | 4.76  | 1.2     | 1.1       | 1.0    | 32.96     |
| Motion Annotations? | ✓      | ✓     | ✓       | ✓         | ✗      | ✗         |

Blattmann et al., "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets," 2023.

MAXIMUM OCCUPANCY 430-912

Copyright Mike Shou, NUS
85
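
As a sanity check on the transcribed figures, the "Total Duration (y)" row can be compared against the product of "#Clips" and "Clip Duration (s)". The sketch below is only an illustration: it assumes that the total is reported in 365-day years and that it is roughly the clip count times the mean clip duration, neither of which the slide states. The names `TABLE`, `estimated_years`, and `relative_error` are hypothetical helpers, not part of any published code.

```python
# Assumption: "y" in the table means years of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 3600

# Values copied from the transcribed Table 1:
# dataset -> (number of clips, mean clip duration in seconds, reported total in years)
TABLE = {
    "LVD": (577e6, 11.58, 212.09),
    "LVD-F": (152e6, 10.53, 50.64),
    "LVD-10M": (9.8e6, 12.11, 3.76),
    "LVD-10M-F": (2.3e6, 10.99, 0.78),
    "WebVid": (10.7e6, 18.0, 5.94),
    "InternVid": (234e6, 11.7, 86.80),
}


def estimated_years(num_clips: float, mean_clip_seconds: float) -> float:
    """Estimate total duration in years as clips x mean clip duration."""
    return num_clips * mean_clip_seconds / SECONDS_PER_YEAR


def relative_error(estimate: float, reported: float) -> float:
    """Absolute difference between estimate and reported value, relative to the reported value."""
    return abs(estimate - reported) / reported


if __name__ == "__main__":
    for name, (clips, duration, reported) in TABLE.items():
        est = estimated_years(clips, duration)
        err = relative_error(est, reported)
        print(f"{name:10s} estimated {est:7.2f} y, reported {reported:7.2f} y, error {err:.1%}")
```

Under these assumptions every estimate lands within a few percent of the reported total, which is consistent with the clip counts and durations on the slide being rounded. It also supports reading the stray period in "3.76." as an OCR artifact rather than a missing digit.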