This image shows a research poster from an academic conference. The poster, titled "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning," addresses the pruning of Vision-Language Models (VLMs).

**Key Highlights:**
1. **Problem Overview**: The poster first identifies two problems with current VLM pruning methods. The first is speed: prior works prune gradually during training. The second is practicality: the model must be re-pruned whenever the downstream task changes.
2. **Research Question**: It asks whether a single, universal pruned VLM can be found, instead of pruning separately for each task.
3. **Components of the Research**:
   - **Task-Specific VLM Pruning**: This is the current approach. A dense VLM is pruned separately for each downstream task (e.g., image captioning, retrieval, VQA), which yields a different pruned model per task.
   - **Task-Agnostic Pruning**: This is the authors' proposal. The VLM is pruned once, independently of the downstream task, so that the same pruned model can be reused across tasks.
4. **Methodology and Results**:
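   - **Multimodal Flow Pruning**: The method has two components. *Information Flow* scores each parameter using both the parameter itself and the importance of the neurons it connects. *Multimodality-aware Compression* allocates the same sparsity to each modality group, so that pruning cannot wipe out an entire modality.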
   - **Experimental Results**: Tables report results at moderate sparsities on Image-Text Retrieval (BLIP and XVLM backbones) and on Image Captioning & VQA.
   - **Additional Analyses**: Further panels examine how sparsity is distributed across modalities, compare pruning runtime, and report results at extreme sparsities.
   
The listed affiliations are Cisco Research, the University of Trento (Università di Trento), and Fondazione Bruno Kessler. The authors are Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, and Elisa Ricci. The CVPR logo (Seattle, WA) indicates that the poster was presented at CVPR 2024. The number 151 is likely the poster board number.
Text transcribed from the image:
CISCO
UNIVERSITÀ DI TRENTO
FONDAZIONE BRUNO KESSLER
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina¹, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci¹,³
(1) University of Trento, (2) CISCO Research, (3) Fondazione Bruno Kessler
Multimodal Flow Pruning
Some problems in VLM Pruning
Speed: Prior works focus on gradually pruning during training.
Practicality: one must re-prune whenever the downstream task changes.
Can we find a universal pruned VLM?
Task-Specific VLM Pruning (current)
[Figure: the Dense VLM is pruned separately for each task, t₁ (Captioning), t₂ (Retrieval) through t_|T| (VQA), producing one Pruned VLM per task. Example texts: "a brown tower with a clock on top.", "A plate of food and a glass of liquid.", "a cat lying down on a bicycle seat.", and the VQA pair "How many street lights do you see?" / "One."]
Information Flow
$S(\theta_{ij}) = S(i) \cdot \lvert \theta_{ij} \rvert \cdot S(j)$
Multimodality-aware Compression
$\mathbb{1}\big(\theta \in \mathrm{top}_{(1-s)}(S)\big)$
CVPR
SEATTLE, WA
151
Sparsity across modalities
Global Saliency Score
Beyond layer collapse: modality collapse
Activations and weight magnitudes vary significantly across layers and modalities. Global pruning may wipe out a whole modality.
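The collapse effect described above can be reproduced with a toy example. The sketch below is only an illustration, not the authors' implementation. It assumes plain magnitude scores and two hypothetical modality groups (`vision`, `text`) whose scores differ in scale by about 10x. The helper names (`global_prune`, `modality_aware_prune`) are invented for this example. Ranking all scores together removes the low-scale modality entirely, while applying the same sparsity inside each group keeps both.

```python
import random


def top_mask(scores, keep_fraction):
    """Boolean mask that keeps the round(len(scores) * keep_fraction) highest scores."""
    n_keep = round(len(scores) * keep_fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:n_keep])
    return [i in keep for i in range(len(scores))]


def global_prune(groups, sparsity):
    """Modality-agnostic pruning: rank the scores of all groups together."""
    names = list(groups)
    flat = [s for name in names for s in groups[name]]
    flat_mask = top_mask(flat, 1 - sparsity)
    masks, start = {}, 0
    for name in names:  # split the flat mask back into per-group masks
        n = len(groups[name])
        masks[name] = flat_mask[start:start + n]
        start += n
    return masks


def modality_aware_prune(groups, sparsity):
    """Apply the same sparsity inside every modality group (hypothetical helper)."""
    return {name: top_mask(scores, 1 - sparsity) for name, scores in groups.items()}


def kept_fraction(masks):
    """Fraction of parameters that survive in each group."""
    return {name: sum(mask) / len(mask) for name, mask in masks.items()}


rng = random.Random(0)
# Hypothetical magnitude scores: text weights are ~10x larger than vision weights.
groups = {
    "vision": [abs(rng.gauss(0.0, 0.1)) for _ in range(1000)],
    "text": [abs(rng.gauss(0.0, 1.0)) for _ in range(1000)],
}

print(kept_fraction(global_prune(groups, 0.8)))          # vision: 0.0, text: 0.4
print(kept_fraction(modality_aware_prune(groups, 0.8)))  # vision: 0.2, text: 0.2
```

At 80% sparsity, global ranking keeps 40% of the text scores and none of the vision scores, while the per-group variant keeps 20% of each. Equal per-group sparsity is a simple allocation that avoids this collapse. It does not by itself decide which parameters within a group matter.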
Pruning runtime
Information Flow: Parameter importance accounts for both the parameter itself and the neurons it connects.
Multimodality-aware Compression: The same sparsity is allocated to each modality group.
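The Information Flow idea can be sketched for a single bias-free linear layer. This is a hedged illustration, not the poster's exact formula. It assumes that a neuron's importance is its mean absolute activation over a small calibration batch, and the function names are hypothetical. Each weight `W[j][i]` connects input neuron `i` to output neuron `j` and is scored as `s_in[i] * |W[j][i]| * s_out[j]`.

```python
def linear_forward(weight, x):
    """Bias-free linear layer: y[j] = sum_i weight[j][i] * x[i] (weight is out x in)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def mean_abs(vectors):
    """Per-coordinate mean absolute value over a batch of equal-length vectors."""
    n = len(vectors)
    return [sum(abs(v[k]) for v in vectors) / n for k in range(len(vectors[0]))]


def information_flow_scores(weight, inputs):
    """Score weight[j][i] as s_in[i] * |weight[j][i]| * s_out[j].

    Assumption: a neuron's importance is its mean absolute activation on the
    calibration batch `inputs`; the neuron score used by MULTIFLOW may differ.
    """
    outputs = [linear_forward(weight, x) for x in inputs]
    s_in = mean_abs(inputs)    # importance of each input neuron i
    s_out = mean_abs(outputs)  # importance of each output neuron j
    return [
        [s_in[i] * abs(weight[j][i]) * s_out[j] for i in range(len(weight[j]))]
        for j in range(len(weight))
    ]


W = [[1.0, -2.0],
     [0.5, 0.0]]
X = [[1.0, 0.0], [0.0, 1.0]]  # tiny hypothetical calibration batch
print(information_flow_scores(W, X))  # [[0.75, 1.5], [0.0625, 0.0]]
```

Here `W[1][0] = 0.5` has half the magnitude of `W[0][0]` but a 12x lower score, because its output neuron is only weakly active. Scores like these could then be ranked within each modality group, as in the compression step.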
Experimental Results at moderate sparsities
[Table: Image-Text Retrieval at moderate sparsities on BLIP and XVLM. Columns: Method, Sparsity, Image-to-Text (%) and Text-to-Image (%) at R@1 and R@5. Methods compared include SNIP, ITERSNIP, CHITA++ and MULTIFLOW.]
Task-Agnostic Vision-Language Pruning (ours)
Image Captioning & VQA
Extreme sparsity
[Table: results at extreme sparsities. Methods compared include Random, SNIP, OMP, LAMP, CHITA++ and MULTIFLOW, with the dense model (0% sparsity) as reference.]
[Figure, task-agnostic panel: a single Pruned VLM obtained from the Dense VLM serves tasks t₁ (Captioning), t₂ (Retrieval) through t_|T| (VQA). Example caption: "a group of buildings under nice blue sky."]