This image shows a research poster presented at the Computer Vision and Pattern Recognition (CVPR) conference in Seattle. The poster is authored by Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, and Elisa Ricci, affiliated with the University of Trento, CISCO Research, and Fondazione Bruno Kessler (FBK). The research is titled "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning." Key elements of the poster include:

1. **Problems in VLM Pruning**: Traditional task-specific Vision-Language Model (VLM) pruning is slow, since prior works prune gradually during training, and impractical, since the model must be re-pruned whenever the downstream task changes.
2. **Universal VLM Pruning Approach**: A task-agnostic pruning methodology that produces a single pruned VLM transferable across tasks, instead of pruning separately for each one.
3. **Multimodal Flow Pruning**: Two components, Information Flow (a parameter's importance combines its own magnitude with the saliency of the neurons it connects) and Multimodality-aware Compression (equal sparsities are allocated to the different modality groups); see the code sketch after the transcription below.
4. **Experimental Results**: Detailed experimental data on image-text retrieval, image captioning, and visual question answering (VQA) performance at various sparsity levels.
5. **Collaborative and Funding Acknowledgements**: Logos of supporting institutions and collaborators, including Cisco, Università di Trento, and Fondazione Bruno Kessler.

The poster also visually contrasts the workflow of traditional task-specific pruning with the proposed task-agnostic approach, and presents substantial quantitative comparisons to underline the benefits of the new method.

Text transcribed from the image:

CISCO · UNIVERSITÀ DI TRENTO · FONDAZIONE BRUNO KESSLER · CVPR SEATTLE, WA

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina (1), Massimiliano Mancini (1), Elia Cunegatti (1), Gaowen Liu (2), Giovanni Iacca (1), Elisa Ricci (1,3)
(1) University of Trento, (2) CISCO Research, (3) Fondazione Bruno Kessler

**Some problems in VLM Pruning.** Speed: prior works focus on gradually pruning during training. Practicality: one must re-prune whenever the downstream task changes. Can we find a universal pruned VLM?

**Task-Specific VLM Pruning (current).** The dense VLM is pruned once per task, each time yielding a different pruned VLM: t₁ CAPTIONING ("a brown tower with a clock on top."; "A plate of food and a glass of liquid."), t₂ RETRIEVAL ("a cat lying down on a bicycle seat."), …, t|T| VQA ("How many street lights do you see?" — "One.").

**Multimodal Flow Pruning.**
Information Flow: S(θ) ≈ S(a) · |θ| · S(r) (formula only partially legible in the image): parameter importance includes the parameter itself and the neurons it connects.
Multimodality-aware Compression: keep the top (1 − s) fraction of parameters within each modality group, i.e., equal global sparsities are allocated to the different modality groups.
Beyond layer collapse: modality collapse! Activations and weight magnitudes vary significantly across layers and modalities, so global pruning may wipe out a whole modality.
[Plots: sparsity across modalities; global saliency score; pruning runtime.]
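The two method components on the poster lend themselves to a compact illustration. Below is a minimal sketch, not the authors' released code: it assumes PyTorch, it approximates neuron saliency by the mean absolute magnitude of a neuron's incident weights (standing in for the poster's S(a) and S(r); the actual MULTIFLOW aggregation may differ), and the names `information_flow_scores` and `prune_per_modality` are hypothetical.

```python
import torch

def information_flow_scores(weight: torch.Tensor) -> torch.Tensor:
    """Information Flow (sketch): score each weight by its own magnitude
    times the saliency of the output/input neurons it connects.
    Neuron saliency here is the mean |weight| incident on that neuron
    (an assumption made for this illustration)."""
    mag = weight.abs()                             # |theta|, shape (out, in)
    out_saliency = mag.mean(dim=1, keepdim=True)   # one score per output neuron
    in_saliency = mag.mean(dim=0, keepdim=True)    # one score per input neuron
    return mag * out_saliency * in_saliency

def prune_per_modality(weights_by_modality: dict, sparsity: float) -> dict:
    """Multimodality-aware Compression (sketch): keep the top (1 - sparsity)
    fraction of weights *within each modality group*, so one modality's
    larger magnitudes cannot erase the other (the modality collapse the
    poster warns about)."""
    masks = {}
    for modality, weights in weights_by_modality.items():
        scores = torch.cat([information_flow_scores(w).flatten() for w in weights])
        k = max(1, int((1.0 - sparsity) * scores.numel()))  # per-modality budget
        threshold = scores.topk(k).values.min()             # k-th largest score
        masks[modality] = [information_flow_scores(w) >= threshold for w in weights]
    return masks

# Toy usage: the text tower's weights are 10x smaller, so a single global
# threshold would prune it almost entirely; per-modality budgets instead keep
# roughly 25% of the weights in *each* tower at 75% sparsity.
towers = {
    "vision": [torch.randn(64, 64), torch.randn(64, 64)],
    "text":   [torch.randn(64, 64) * 0.1],
}
masks = prune_per_modality(towers, sparsity=0.75)
for modality, mask_list in masks.items():
    kept = sum(m.sum().item() for m in mask_list) / sum(m.numel() for m in mask_list)
    print(f"{modality}: kept {kept:.1%} of weights")
```

The per-modality budget is the point of the toy example: ranking both towers against one global threshold would concentrate the surviving weights in the vision tower, which is exactly the modality-collapse failure mode the poster highlights.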
**Task-Agnostic Vision-Language Pruning (ours).** The dense VLM is pruned once; the same pruned VLM then serves t₁ CAPTIONING (e.g., "a group of buildings under a nice blue sky."), t₂ RETRIEVAL, …, t|T| VQA.

**Experimental Results at moderate sparsities.** [Tables: Image-Text Retrieval with BLIP and XVLM backbones, reporting Image-to-Text and Text-to-Image R@1/R@5; Image Captioning & VQA. Methods compared include DENSE, RANDOM, SNIP, ITERSNIP, OMP, LAMP, CHITA++, and MULTIFLOW at sparsities such as 63% and 75%; the individual scores are not reliably legible in the image.]

**Extreme sparsity.** [Plot comparing RANDOM, SNIP, OMP, LAMP, CHITA++, and MULTIFLOW at high sparsity levels; values not legible.]

[Bottom of poster: additional logos, including CINECA.]