In the image, a group of people are gathered in a room, seated on the floor in front of a large, dark screen mounted on the wall. The screen shows a slide with a purple background, presumably part of a presentation, containing a data table. The people in the room are focused on the screen, likely listening to the presenter and reading the information shown. The scene suggests a lecture or workshop, and the slide content points to a topic in data processing and annotation for video diffusion models.
Text transcribed from the image:
Stable Video Diffusion
Scaling latent video diffusion models to large datasets
Data Processing and Annotation
Table 1. Comparison of our dataset before and after filtering with publicly available research datasets.
                      LVD     LVD-F   LVD-10M  LVD-10M-F  WebVid  InternVid
#Clips                577M    152M    9.8M     2.3M       10.7M   234M
Clip Duration (s)     11.58   10.53   12.11    10.99      18.0    11.7
Total Duration (y)    212.09  50.64   3.76     0.78       5.94    86.80
Mean #Frames          325     301     335      320        -       -
Mean Clips/Video      11.09   4.76    1.2      1.1        1.0     32.96
Motion Annotations?   ✓       ✓       ✓        ✓          ✗       ✗
Blattmann et al., "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets," 2023.
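The columns of Table 1 are linked by simple arithmetic: Total Duration should be roughly #Clips times the mean Clip Duration, converted to years. The Python sketch below recomputes each Total Duration entry from the transcribed values and reports how much of LVD and LVD-10M survives filtering. It assumes a 365.25-day year, and the names `TABLE_1`, `total_years`, and `retention` are illustrative only. Small mismatches are expected because the table's clip counts and durations are rounded.

```python
# Sketch: sanity-check Table 1 by recomputing Total Duration from
# #Clips x mean Clip Duration, and compute the fraction kept by filtering.
# Assumption: a year is 365.25 days. All names here are hypothetical.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# dataset -> (number of clips, mean clip duration in s, reported total duration in years)
TABLE_1 = {
    "LVD": (577e6, 11.58, 212.09),
    "LVD-F": (152e6, 10.53, 50.64),
    "LVD-10M": (9.8e6, 12.11, 3.76),
    "LVD-10M-F": (2.3e6, 10.99, 0.78),
    "WebVid": (10.7e6, 18.0, 5.94),
    "InternVid": (234e6, 11.7, 86.80),
}


def total_years(num_clips, mean_clip_seconds):
    """Total footage in years implied by a clip count and mean clip length."""
    return num_clips * mean_clip_seconds / SECONDS_PER_YEAR


def retention(before, after):
    """Fractions of clips and of total duration kept when going from `before` to `after`."""
    clips_b, _, years_b = TABLE_1[before]
    clips_a, _, years_a = TABLE_1[after]
    return clips_a / clips_b, years_a / years_b


if __name__ == "__main__":
    for name, (clips, secs, reported) in TABLE_1.items():
        computed = total_years(clips, secs)
        print(f"{name:10s} computed {computed:7.2f} y, reported {reported:7.2f} y")
    for before, after in [("LVD", "LVD-F"), ("LVD-10M", "LVD-10M-F")]:
        clip_frac, dur_frac = retention(before, after)
        print(f"{before} -> {after}: {clip_frac:.1%} of clips, {dur_frac:.1%} of duration kept")
```

With these inputs, the recomputed totals agree with the reported ones to within about 3%, and filtering keeps roughly a quarter of the clips (about 26% for LVD and about 23% for LVD-10M).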
MAXIMUM OCCUPANCY 430-912
Copyright Mike Shou, NUS