This image features a detailed research poster presented at what appears to be an academic or professional conference. The title of the work is "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning," and it is attributed to several researchers affiliated with Cisco, the University of Trento, and Fondazione Bruno Kessler. Key elements of the poster include:

1. **Problems in VLM Pruning**:
   - The poster outlines current issues with Vision-Language Model (VLM) pruning, focusing on "Speed" and "Practicality." It notes that prior works emphasize gradual pruning during training and require re-pruning whenever the downstream task changes.

2. **Question of Universal Pruning**:
   - It raises the question, "Can we find a universal pruned VLM?" and contrasts task-specific VLM pruning (the current practice) with the authors' task-agnostic vision-language pruning.

3. **Processes Explained**:
   - Several diagrams and flowcharts depict the processes involved, including step-by-step transformations from a dense VLM to a pruned VLM for both the task-specific and the universal setting.
   - The task-specific VLM pruning section illustrates pruning repeated for individual tasks such as captioning, VQA, and retrieval.

4. **Experimental Results**:
   - The right side of the poster presents "Experimental Results at moderate sparsities," with quantitative tables supporting the findings. The tables report metrics for image-text retrieval and for image captioning and VQA.

5. **Information Flow and Multimodality-aware Compression**:
   - The poster also covers the method's core concepts: information flow, parameter importance, and multimodality-aware compression, which allocates equal global sparsities to the different modality groups (illustrative code sketches of both ideas appear below).

A hand points toward one of the diagrams, suggesting a presenter walking an attendee through the content and giving the scene an interactive, engaging feel. Overall, the poster presents a thorough overview of current challenges and a novel approach in the field of vision-language model pruning.
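To make the "information flow" idea in point 5 concrete, here is a minimal sketch of how a parameter score could combine a weight's own magnitude with the saliency of the two neurons it connects. The function names (`neuron_saliency`, `parameter_importance`) and the choice of mean absolute incident weight as the neuron statistic are illustrative assumptions; the poster only states the principle, not the exact formula.

```python
import torch

def neuron_saliency(weight: torch.Tensor):
    """Per-neuron saliency for a linear layer of shape (out, in).

    ASSUMPTION: mean absolute incident weight is used as each neuron's
    saliency; the poster only says importance should account for
    "the neurons it connects", not which statistic to use.
    """
    out_sal = weight.abs().mean(dim=1)  # saliency of each output neuron
    in_sal = weight.abs().mean(dim=0)   # saliency of each input neuron
    return out_sal, in_sal

def parameter_importance(weight: torch.Tensor) -> torch.Tensor:
    """Score each weight by its own magnitude times the saliencies of
    the input and output neurons it connects."""
    out_sal, in_sal = neuron_saliency(weight)
    # The outer product broadcasts neuron saliencies back to the weight shape.
    return weight.abs() * torch.outer(out_sal, in_sal)
```

Under this reading, the lowest-scoring weights would be the first candidates for removal in a single task-agnostic pruning pass, with no task-specific gradients involved.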
Text transcribed from the image:

[Logos: Università di Trento, CISCO, Fondazione Bruno Kessler]

**MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning**
Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca. (1) University of Trento, (2) CISCO Research, Fondazione Bruno Kessler

**Some problems in VLM Pruning**
- Speed: prior works focus on gradually pruning during training.
- Practicality: one must re-prune whenever the downstream task changes.
- Can we find a universal pruned VLM?

**Task-Specific VLM Pruning (current)**: Dense VLM → Pruned VLM, once per task (t1 captioning, t2 retrieval, ..., tT VQA). Sample outputs: "a brown tower with a clock on top." / "A plate of food and a glass of liquid." / "a cat lying down on a bicycle seat." / Q: "How many street lights do you see?" A: "One."

**Task-Agnostic Vision-Language Pruning (ours)**: Dense VLM → a single Pruned VLM reused across captioning, retrieval, ..., VQA. Sample output: "a group of buildings under nice blue sky."

**Multimodal Flow Pruning**
- Information Flow: parameter importance includes the parameter itself and the neurons it connects.
- Multimodality-aware Compression: equal global sparsities allocated for different modality groups.
- Global saliency with top-k selection.

**Experimental Results at moderate sparsities**
Tables for Image-Text Retrieval (Image-to-Text and Text-to-Image, each with R@1 and R@5 columns) and for Image Captioning & VQA, on BLIP and XVLM, compare DENSE (0% sparsity), RANDOM, SNIP, OMP, LAMP, CHITA++, and MULTIFLOW at 63% and 75% sparsity. [Most individual table values are too distorted in the photo to transcribe reliably; the legible dense-model retrieval row reads 80.72 / 95.08 / 62.99 / 85.27.]
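The transcribed line "equal global sparsities allocated for different modality groups" suggests ranking weights globally within each modality rather than across the whole network, so that neither the vision nor the text branch absorbs a disproportionate share of the pruning. Below is a minimal sketch under that reading; the grouping scheme, the function name `masks_per_modality`, and the thresholding details are assumptions, not the authors' exact procedure.

```python
import torch

def masks_per_modality(scores_by_group: dict, sparsity: float) -> dict:
    """Binary pruning masks enforcing the same sparsity in every group.

    scores_by_group: dict mapping a modality name (e.g. "vision",
        "text") to a list of importance-score tensors for that branch.
    sparsity: fraction of weights to remove in each group, in [0, 1).
    """
    masks = {}
    for group, tensors in scores_by_group.items():
        flat = torch.cat([t.flatten() for t in tensors])
        k = max(1, int((1.0 - sparsity) * flat.numel()))  # weights kept
        # Global threshold computed within this modality group only.
        threshold = torch.topk(flat, k).values.min()
        masks[group] = [(t >= threshold).float() for t in tensors]
    return masks
```

Because the threshold is computed per group, a 75% budget removes 75% of the vision weights and 75% of the text weights separately, instead of letting one modality dominate the cuts when scores are ranked network-wide.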