The image shows a slide from a technical presentation on "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets" (Blattmann et al., 2023). The slide, titled "Data Processing and Annotation," is projected on a screen and reproduces Table 1 of the paper, which compares the authors' dataset (LVD and its variants) before and after filtering with the publicly available research datasets WebVid and InternVid. Some audience members are seated on the floor, which suggests a full room and an informal session. The slide also carries a copyright credit to Mike Shou, NUS.
Text transcribed from the image:
Stable Video Diffusion
Scaling latent video diffusion models to large datasets
Data Processing and Annotation
Table 1. Comparison of our dataset before and after filtering with publicly available research datasets.
                      LVD      LVD-F    LVD-10M   LVD-10M-F   WebVid   InternVid
#Clips                577M     152M     9.8M      2.3M        10.7M    234M
Clip Duration (s)     11.58    10.53    12.11     10.99       18.0     11.7
Total Duration (y)    212.09   50.64    3.76      0.78        5.94     86.80
Mean #Frames          325      301      335       320         -        -
Mean Clips/Video      11.09    4.76     1.2       1.1         1.0      32.96
Motion Annotations?   ✓        ✓        ✓         ✓           ✗        ✗
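The rows of Table 1 can be cross-checked against each other, since total duration should be roughly the number of clips times the mean clip duration. The sketch below is only an illustrative sanity check and is not from the slide or the paper: it assumes a 365-day year, uses the rounded values transcribed above, and the names `TABLE_1`, `implied_years`, and `relative_gap` are hypothetical.

```python
# Assumption: the "Total Duration (y)" row uses a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600

# (number of clips, mean clip duration in seconds, reported total duration in years),
# copied from the rounded values in Table 1.
TABLE_1 = {
    "LVD":       (577e6, 11.58, 212.09),
    "LVD-F":     (152e6, 10.53, 50.64),
    "LVD-10M":   (9.8e6, 12.11, 3.76),
    "LVD-10M-F": (2.3e6, 10.99, 0.78),
    "WebVid":    (10.7e6, 18.0, 5.94),
    "InternVid": (234e6, 11.7, 86.80),
}


def implied_years(clips, mean_seconds):
    """Total duration in years implied by clip count and mean clip length."""
    return clips * mean_seconds / SECONDS_PER_YEAR


def relative_gap(name):
    """Relative difference between the implied and the reported total duration."""
    clips, mean_seconds, reported = TABLE_1[name]
    return abs(implied_years(clips, mean_seconds) - reported) / reported


if __name__ == "__main__":
    for name, (clips, mean_seconds, reported) in TABLE_1.items():
        implied = implied_years(clips, mean_seconds)
        print(f"{name:10s} implied={implied:7.2f} y  reported={reported:7.2f} y")
```

With the rounded slide values, the implied totals should land within a few percent of the reported ones; small gaps are expected because the clip counts and durations are themselves rounded.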
Blattmann et al., "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets," 2023.
MAXIMUM
OCCUPANCY
430-912
Copyright Mike Shou, NUS
85