The image shows a presentation slide at a conference or lecture discussing the metrics used in existing video generation models. The slide is titled "What metrics are used in existing models?" and lists several models along with the metrics they employ:

- **Gen-1, Runway**: CLIP (consecutive frames) and CLIP (text, frame).
- **EMU-Video, Meta**: FVD.
- **Tune-a-video**: CLIP (consecutive frames) and CLIP (text, frame).
- **Lumiere, Google**: FVD and Inception Scores.
- **Video Diffusion Models, Google**: Inception Score and FVD.
- **Imagen Video, Google**: CLIP and Inception Scores.
- **Stable Diffusion Video**: Peak Signal-to-Noise Ratio (PSNR), Learned Perceptual Image Patch Similarity (LPIPS), and CLIP.

Two attendees are partially visible in the foreground, attentively watching the presentation. The room's walls and overhead lighting suggest a formal conference or educational setting.

Text transcribed from the image:

> What metrics are used in existing models?
>
> Gen-1, Runway [Esser et al., 2023]: CLIP (consecutive frames), CLIP (text, frame)
> EMU-Video, Meta [Giridhar et al., 2024]: FVD
> Video Diffusion Models, Google [Ho et al., 2022]: Inception score, FVD
> Imagen Video, Google [Ho et al., 2023]: CLIP, Inception Scores
> Tune-a-video [Wu et al., 2023]: CLIP (consecutive frames), CLIP (text, frame)
> Lumiere, Google [Tal et al., 2024]: FVD, Inception Scores
> Stable diffusion video [Blattmann et al., 2024]: Peak Signal-to-Noise Ratio (PSNR), Perceptual Image Patch Similarity [Zhang et al., CVPR'18], CLIP
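For concreteness, two of the listed metrics can be sketched in a few lines of NumPy. This is an illustration, not the evaluation code used by any of these models: `psnr` follows the standard PSNR definition for 8-bit frames, and `frame_consistency` assumes per-frame CLIP image embeddings have already been extracted by some encoder (the `embeddings` array is a placeholder for that output).

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between two frames (higher is better)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def frame_consistency(embeddings: np.ndarray) -> float:
    """Mean cosine similarity between embeddings of consecutive frames.

    `embeddings` has shape (num_frames, dim); each row is the CLIP image
    embedding of one frame (assumed precomputed). Higher values indicate
    more temporally consistent video.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = np.sum(unit[:-1] * unit[1:], axis=1)  # dot products of neighbor rows
    return float(np.mean(sims))
```

The CLIP (text, frame) variant on the slide is analogous: cosine similarity between each frame's image embedding and the text prompt's embedding, averaged over frames.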