The image depicts a lively scene at an academic conference where two researchers are presenting their work on "Text-guided Explorable Image Super-resolution." The presentation, associated with Universität Siegen, is prominently displayed on a poster board marked with the number 170. The poster comprehensively details their motivation, approach, qualitative results, and quantitative results in enhancing image resolution through text prompts. On the left side of the image, a researcher wearing a denim jacket stands confidently in front of the poster, engaging with the audience. She has an orange lanyard around her neck, displaying her conference badge. On the right side, another researcher appears to be in discussion, showing an interest in the various graphs and images presented on the poster. The setting seems to be a high-profile conference given the scale and detailed presentations. The background offers a glimpse of a modern, well-lit conference hall with sleek designs and an engaging atmosphere conducive to intellectual exchange.
Text transcribed from the image:
UNIVERSITÄT SIEGEN
Motivation
Image Super-resolution (SR) is ill-posed: many HR images can map to the same LR image.
Most supervised SR methods provide a single solution.
Prior explorable SR works: trained for specific SR.
oling algorithms produce
ersity.
Existing
soluti
16x SR S
Use of
solu
DDNM [1] (top row) and DPS [2] (bottom row).
on of semantically meaningful
ns
orable SR:
1], DPS [2],
Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota, Paramanand Chandramouli
Zero-shot Text-guided SR
Goal: Explore the space of consistent solutions through text prompts.
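[Figure: solutions obtained with different text prompts. Legible labels: "Ches" (truncated), Lemons, Leaves, Capsicums, Dandelions.]
$y = Ax$
Data consistency: $A\hat{x} \approx y$
Semantic consistency: $\hat{x} \sim q(x \mid c)$, where $q(x \mid c)$ is the distribution of images having the semantic meaning provided by the text input $c$.
SR using T2I+DDNM
Impose null-space consistency in two stages:
In the down-sampled pixel space, $x^{LR}_{0|t}$ is rectified at each step as: $\hat{x}^{LR}_{0|t} = A_{LR}^{\dagger} y + (I - A_{LR}^{\dagger} A_{LR})\, x^{LR}_{0|t}$
In the SR stage, $x_{0|t}$ is rectified at each step to impose consistency: $\hat{x}_{0|t} = A^{\dagger} y + (I - A^{\dagger} A)\, x_{0|t}$
SR using T2I+DPS
Use reconstruction guidance in two stages:
In the down-sampled pixel space: $x^{LR}_{t-1} = x'^{LR}_{t-1} - \rho_t \nabla_{x^{LR}_t} \lVert y - A_{LR}(\hat{x}^{LR}_{0|t}) \rVert$
In the SR stage: $x_{t-1} = x'_{t-1} - \rho_t \nabla_{x_t} \lVert y - A(\hat{x}_{0|t}) \rVert$
SR using T2I+ΠGDM
Incorporate pseudoinverse guidance in two stages:
In the down-sampled pixel space: $x^{LR}_{t-1} = x'^{LR}_{t-1} - \rho_t \nabla_{x^{LR}_t} \lVert A_{LR}^{\dagger} y - A_{LR}^{\dagger} A_{LR}(\hat{x}^{LR}_{0|t}) \rVert$
In the SR stage: $x_{t-1} = x'_{t-1} - \rho_t \nabla_{x_t} \lVert A^{\dagger} y - A^{\dagger} A(\hat{x}_{0|t}) \rVert$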
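The difference between the hard null-space rectification (T2I+DDNM) and a soft gradient-based guidance step (as in T2I+DPS) can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a 1-D signal, takes $A$ to be block averaging with factor `s` (the poster does not state the degradation operator), and uses a squared-norm objective for the guidance step. All function names are hypothetical.

```python
import math

def downsample(x, s):
    """Apply A: average each block of s consecutive samples (len(x) must be a multiple of s)."""
    return [sum(x[i:i + s]) / s for i in range(0, len(x), s)]

def pinv_upsample(y, s):
    """Apply A^+ (pseudoinverse of block averaging): repeat each LR sample s times."""
    return [v for v in y for _ in range(s)]

def rectify(x0, y, s):
    """Null-space rectification: A^+ y + (I - A^+ A) x0."""
    range_part = pinv_upsample(y, s)                 # A^+ y
    projected = pinv_upsample(downsample(x0, s), s)  # A^+ A x0
    return [r + (x - p) for r, x, p in zip(range_part, x0, projected)]

def guidance_step(x0, y, s, rho):
    """One gradient step on ||y - A x||^2, i.e. x + 2 * rho * A^T (y - A x)."""
    residual = [a - b for a, b in zip(y, downsample(x0, s))]
    at_residual = [r / s for r in residual for _ in range(s)]  # A^T r
    return [x + 2.0 * rho * g for x, g in zip(x0, at_residual)]

def residual_norm(x, y, s):
    """Euclidean norm of A x - y, the data-consistency error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(downsample(x, s), y)))

# Toy data (assumed for illustration): an 8-sample estimate and a 2-sample LR observation.
x0 = [0.2, 0.8, 0.5, 0.1, 0.9, 0.3, 0.4, 0.6]
y = [0.5, 0.25]

print("initial residual:   ", residual_norm(x0, y, 4))
print("after rectification:", residual_norm(rectify(x0, y, 4), y, 4))
print("after one grad step:", residual_norm(guidance_step(x0, y, 4, rho=1.0), y, 4))
```

Under these assumptions the rectified estimate reproduces `y` up to floating-point rounding while keeping the null-space component of `x0` unchanged. The gradient step only shrinks the residual, here by the factor `1 - 2 * rho / s`, which is one way to see why guidance-based variants can trade consistency against other objectives.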
SR using CLIP Guidance
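$x_{0|t}$ is rectified at each step following DDNM: $\hat{x}_{0|t} = A^{\dagger} y + (I - A^{\dagger} A)\, x_{0|t}$
Compute an intermediate estimate of the previous step: $x'_{t-1} \sim p(x_{t-1} \mid x_t, \hat{x}_{0|t})$
Incorporate CLIP guidance: $x_{t-1} = x'_{t-1} - \nabla_{x_t} E(c, x)$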
[Figure labels: LR, Image (T2I)]
Quantitative Results
$E(c, x)$: CLIP similarity score given text $c$ and image $x$
[5].
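Table header as transcribed: Dataset, SR, Metric; a header fragment reads "shot SR using"; methods: DPS, DDNM, [5]+DDNM, [4]+DDNM, CLIP guided, [5]+ΠGDM. Values are listed per row in transcription order; rows with fewer than six values are incomplete in the transcription.
Row group 1 (dataset and SR factor not legible):
LR PSNR (dB) (↑): 50.42, 75.40, 51.68, 67.02, 50.16, 51.08
NIQE (↓): 8.41, 5.59, 6.17, 5.54, 6.12, 6.86
Faces, 16x:
LR PSNR (dB) (↑): 51.98, 80.91, 51.86, 66.30, 51.79, 52.02
NIQE (↓): 5.54, 9.77, 5.43, 6.38, 6.98
Row group 3 (dataset and SR factor not legible):
LR PSNR (dB) (↑): 47.01, 72.94, 66.33, 47.36, 48.75
NIQE (↓): 2.66, 10.27, 4.62, 4.88, 5.37, 5.10
CLIP (↑): 0.2592, 0.2326, 0.3102, 0.3344, 0.2564, 0.2811
Nocaps, 16x:
LR PSNR (dB) (↑): 48.07, 78.42, 53.05, 70.01, 50.97, 49.67
NIQE (↓): 4.81, 13.24, 4.72
CLIP (↑): 0.2418, 0.2162, 0.3037, 0.3381, 0.2517, 0.2788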
CVPR
SEATTLE, WA JUNE 17-21, 2024
Qualitative Results
[Figure column labels: Imagen DDNM, DPS, DDNM]
Use of text improves diversity in solutions: 16x SR (top) and 8x SR (bottom).
[Figure column labels: DPS, [4]+DDNM, [5]+DDNM, LR. Partly legible prompts: "Curly haired woman", "Elderly woman", "Woman with ...", "Two bicyclists are racing through a blurry background", "A smiling girl with ...", "The Santa ornament hangs from a Christmas branch amongst the colorful bright lights". The remaining prompts are illegible.]
Textual descriptions aid in reconstruction of fine details in challenging scenes: 16x SR.
[Figure column labels: CLIP Guidance, [4]+DDNM, [5]+DDNM]
T2I+ΠGDM and T2I+DPS have a severe trade-off between consistency (LR PSNR) and text adherence (CLIP score).
Classifier-free guidance improves text adherence, but reduces consistency.
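The consistency metric named above, LR PSNR, compares the down-sampled SR output with the LR input. A minimal sketch follows, assuming images are nested lists with values in [0, 1] and that down-sampling is s x s block averaging. The poster does not specify the down-sampling kernel, so that choice and the function names are illustrative only.

```python
import math

def downsample_2d(img, s):
    """Average non-overlapping s x s blocks (height and width must be multiples of s)."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[r + i][c + j] for i in range(s) for j in range(s)) / (s * s)
         for c in range(0, w, s)]
        for r in range(0, h, s)
    ]

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def lr_psnr(sr, lr, s, peak=1.0):
    """PSNR between the down-sampled SR output and the LR input (higher means more consistent)."""
    return psnr(downsample_2d(sr, s), lr, peak)

# Toy 4x4 "SR output" and a 2x2 "LR input" that is slightly inconsistent with it.
sr = [[0.1, 0.3, 0.5, 0.7],
      [0.2, 0.4, 0.6, 0.8],
      [0.9, 0.7, 0.5, 0.3],
      [0.8, 0.6, 0.4, 0.2]]
lr = [[v + 0.1 for v in row] for row in downsample_2d(sr, 2)]
print(lr_psnr(sr, lr, 2))
```

A perfectly consistent solution gives an infinite LR PSNR in this sketch, so finite values measure how far a method departs from the observed LR image.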
[Figure caption, partly legible] Comparison [...] Row [...]: smiling child, smiling [...], smiling woman wearing glasses. Row [...]: [...], elderly m[...]. Row [...]: smiling man with curly hair, elderly man.
[1] [...] on Null-Space Model, in ICLR 2023.
[2] Chung et al. Diffusion Posterior Sampling for General Noisy Inverse Problems, in ICLR 2023.
[3] Song et al. Pseudoinverse-guided diffusion models for inverse problems, in ICLR 2023.
[4] Ramesh et al. Hierarchical [...] 06125, 2022.
[5] Saharia et al. Photorealistic text-to-image diffusion models with deep language understanding, in NeurIPS 2022.
[6] Radford et al. Learning transferable visual models from natural language supervision, in ICML 2021.
170
Paramanand