The image depicts a presentation slide in a conference room discussing the metrics used by existing video generation models. The title reads "What metrics are used in existing models?", and each listed model is accompanied by the metrics it uses:

1. **Gen-1, Runway** (Esser et al., 2023):
   - CLIP (consecutive frames)
   - CLIP (text, frame)
2. **EMU-Video, Meta** (Giridhar et al., 2024):
   - FVD
   - Inception Scores
3. **Tune-a-video** (Wu et al., 2023):
   - CLIP (consecutive frames)
   - CLIP (text, frame)
4. **Lumiere, Google** (Tal et al., 2024):
   - FVD
   - Inception Scores
5. **Video Diffusion Models, Google** (Ho et al., 2022):
   - Inception Score
   - FVD
6. **Imagen Video, Google** (Ho et al., 2023):
   - CLIP
7. **Stable diffusion video** (Blattmann et al., 2024):
   - Peak Signal-to-Noise Ratio (PSNR)
   - Perceptual Image Patch Similarity (Zhang et al., CVPR'18)
   - CLIP

The slide is projected onto a screen, and a few attendees are visible in the foreground, attentively observing the presentation.
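Two of the metrics on the slide are simple enough to sketch directly. The snippet below is a minimal illustration, not code from any of the cited papers: PSNR computed from mean squared error, and a CLIP-style score taken as cosine similarity between embedding vectors (with a real CLIP model, the embeddings would come from its text and image encoders; here they are plain arrays by assumption).

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    # Peak Signal-to-Noise Ratio: 10 * log10(MAX^2 / MSE)
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def clip_style_score(emb_a, emb_b):
    # Cosine similarity between two embedding vectors; in the metrics above,
    # these would be CLIP features of (text, frame) or consecutive frames
    # (the embeddings themselves are assumed precomputed).
    a = np.asarray(emb_a, dtype=np.float64)
    b = np.asarray(emb_b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ref = np.zeros((4, 4), dtype=np.uint8)
noisy = np.full((4, 4), 10, dtype=np.uint8)
print(round(psnr(ref, noisy), 2))                 # 28.13
print(clip_style_score([1.0, 0.0], [1.0, 0.0]))   # 1.0
```

FVD and Inception Score, by contrast, require a pretrained video/image classifier network and are usually computed with an existing library rather than from scratch.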