The image depicts a presentation slide in a conference room discussing the metrics used by existing video generation models. The title reads "What metrics are used in existing models?", and each listed model is accompanied by the metrics it uses:

1. **Gen-1, Runway** (Esser et al., 2023):
   - CLIP (consecutive frames)
   - CLIP (text, frame)
2. **EMU-Video, Meta** (Giridhar et al., 2024):
   - FVD
   - Inception Scores
3. **Tune-a-video** (Wu et al., 2023):
   - CLIP (consecutive frames)
   - CLIP (text, frame)
4. **Lumiere, Google** (Tal et al., 2024):
   - FVD
   - Inception Scores
5. **Video Diffusion Models, Google** (Ho et al., 2022):
   - Inception Score
   - FVD
6. **Imagen Video, Google** (Ho et al., 2023):
   - CLIP
7. **Stable diffusion video** (Blattmann et al., 2024):
   - Peak Signal-to-Noise Ratio (PSNR)
   - Perceptual Image Patch Similarity (Zhang et al., CVPR'18)
   - CLIP

The slide is projected onto a screen, and a few attendees are visible in the foreground, attentively observing the presentation.
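Two of the metrics on the slide are simple enough to sketch directly. The snippet below is a minimal illustration, not code from any of the cited papers: PSNR computed from mean squared error, and a CLIP-style score taken as cosine similarity between embedding vectors (with a real CLIP model, the embeddings would come from its text and image encoders; here they are plain arrays by assumption).

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    # Peak Signal-to-Noise Ratio: 10 * log10(MAX^2 / MSE)
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def clip_style_score(emb_a, emb_b):
    # Cosine similarity between two embedding vectors; in the metrics above,
    # these would be CLIP features of (text, frame) or consecutive frames
    # (the embeddings themselves are assumed precomputed).
    a = np.asarray(emb_a, dtype=np.float64)
    b = np.asarray(emb_b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ref = np.zeros((4, 4), dtype=np.uint8)
noisy = np.full((4, 4), 10, dtype=np.uint8)
print(round(psnr(ref, noisy), 2))                 # 28.13
print(clip_style_score([1.0, 0.0], [1.0, 0.0]))   # 1.0
```

FVD and Inception Score, by contrast, require a pretrained video/image classifier network and are usually computed with an existing library rather than from scratch.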