This image captures a research poster presented at a scientific conference. The poster, titled "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning," addresses the pruning of Vision-Language Models (VLMs).

**Key Highlights:**

1. **Problem Overview**: The poster opens with two problems in current VLM pruning. Speed: prior works prune gradually during training. Practicality: the model must be re-pruned whenever the downstream task changes.
2. **Research Question**: It asks whether a universal pruned VLM can be found, moving beyond task-specific approaches.
3. **Components of the Research**:
   - **Task-Specific VLM Pruning**: The current approach, illustrated with captioning, retrieval, and VQA examples, showing a dense VLM and a separate pruned counterpart for each task.
   - **Task-Agnostic Pruning**: The approach the authors propose, in which a single pruned VLM serves all downstream tasks.
4. **Methodology**:
   - **Multimodal Flow Pruning**: Diagrams contrast dense and pruned VLMs and introduce two ideas. "Information Flow" scores a parameter by the parameter itself and the neurons it connects. "Multimodality-aware Compression" allocates equal global sparsities to the different modality groups.
   - **Experimental Results**: Tables report the performance of the proposed method at moderate sparsities on Image-Text Retrieval and on Image Captioning & VQA.
   - **Additional Metrics**: Graphs and tables give further analysis of sparsity across modalities, pruning runtime, and extreme sparsity cases.

The affiliations listed are the University of Trento, Cisco Research, and Fondazione Bruno Kessler. The authors are Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, and Elisa Ricci. The poster was presented at CVPR 2024 in Seattle, as indicated by the conference logo, and carries the number 151.
Text transcribed from the image:

Logos: Cisco, Università di Trento, Fondazione Bruno Kessler, CVPR Seattle, WA. Poster number: 151.

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Matteo Farina (1), Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci (1, 3)
(1) University of Trento, (2) CISCO Research, (3) Fondazione Bruno Kessler

Some problems in VLM Pruning
Speed: Prior works focus on gradually pruning during training.
Practicality: One must re-prune whenever the downstream task changes.
Can we find a universal pruned VLM?

Task-Specific VLM Pruning (current)
[Figure labels: Dense VLM, Pruned VLM, t₁ CAPTIONING, t₂ RETRIEVAL, t|T| VQA. Example texts: "a brown tower with a clock on top."; "A plate of food and a glass of liquid."; "a cat lying down on a bicycle seat."; "How many street lights do you see?" "One."]

Multimodal Flow Pruning
Information Flow: Parameter importance includes the parameter itself and the neurons it connects.
Multimodality-aware Compression: Equal global sparsities allocated for different modality groups.
[Figure: dense VLM to pruned VLM diagram. The formulas in this panel are not recoverable from the transcription.]

Sparsity across modalities
Global Saliency Score
Beyond layer collapse: modality collapse
Activations and weight magnitudes vary significantly across layers and modalities. Global pruning may wipe out a whole modality.

Pruning runtime
[Table: values not recoverable from the transcription.]
Experimental Results at moderate sparsities
[Tables: the numeric values are not recoverable from the transcription. Legible panel titles: Image-Text Retrieval; Image Captioning & VQA. Legible models: BLIP, XVLM. Legible methods: DENSE, RANDOM, SNIP, ITERSNIP, OMP, LAMP, CHITA++, MULTIFLOW. Legible sparsity levels: 0% (dense), 63%, 75%. Legible column headers: Method, Sparsity, Image-to-Text, Text-to-Image (each apparently with R@1 and R@5), VQA, Captioning.]

Task-Agnostic Vision-Language Pruning (ours)
[Figure labels: Dense VLM, Pruned VLM, t₁ CAPTIONING, t₂ RETRIEVAL, t|T| VQA. Example text: "a group of buildings under nice blue sky."]

Extreme sparsity
[Plot: values not recoverable from the transcription.]

Other legible text: CINECA.
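
The two ideas named in the "Multimodal Flow Pruning" panel can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the poster's formulas did not survive transcription, so the score used here (weight magnitude multiplied by an importance value for the input neuron and one for the output neuron) is an assumption chosen only to match the sentence "parameter importance includes the parameter itself and the neurons it connects". The neuron importance values are taken as given inputs, and the function names `flow_scores` and `prune_per_modality` are hypothetical.

```python
import numpy as np


def flow_scores(weight, in_importance, out_importance):
    """Score each weight by its magnitude and by the neurons it connects.

    weight has shape (out_features, in_features). The product form is an
    assumption made for illustration, not the formula from the poster.
    """
    return out_importance[:, None] * np.abs(weight) * in_importance[None, :]


def prune_per_modality(scores_by_modality, sparsity):
    """Build keep-masks so that every modality group reaches the same sparsity.

    scores_by_modality maps modality name -> {layer name -> score array}.
    Scores are ranked within a modality only, so a modality whose scores are
    small overall cannot be wiped out by another modality's larger scores.
    """
    masks = {}
    for modality, layers in scores_by_modality.items():
        pooled = np.concatenate([s.ravel() for s in layers.values()])
        n_keep = int(round((1.0 - sparsity) * pooled.size))
        # Smallest score that is still kept (ties may keep a few extra weights).
        threshold = np.sort(pooled)[-n_keep] if n_keep > 0 else np.inf
        masks[modality] = {name: s >= threshold for name, s in layers.items()}
    return masks


rng = np.random.default_rng(0)


def random_layer(out_features, in_features):
    # Random weights and random neuron importances, purely for demonstration.
    w = rng.normal(size=(out_features, in_features))
    return flow_scores(w, rng.random(in_features), rng.random(out_features))


scores = {
    "vision": {"v0": random_layer(4, 6), "v1": random_layer(4, 6)},
    "text": {"t0": random_layer(4, 6)},
}
masks = prune_per_modality(scores, sparsity=0.75)
```

Ranking within each modality group, rather than across the whole model, is what keeps the sparsity equal across modalities in this sketch. Layers inside a group can still end up with different sparsities, because the threshold is shared by the group.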