The image depicts a scene from a scientific conference where two researchers are presenting their academic poster titled "Text-guided Explorable Image Super-resolution." The poster, affiliated with Universität Siegen, outlines the motivation, methodology, and results of their research focused on enhancing image resolution through text-based exploration. One of the presenters, a young researcher with glasses, stands smiling to the left of the poster, while another attendee seems engaged in conversation on the right. Both are wearing orange lanyards, likely indicating their conference badges. The poster is neatly divided into sections, featuring visual results demonstrating the effectiveness of their techniques. The setting appears to be a well-lit conference hall with several other posters and attendees visible in the background.
Text transcribed from the image:
UNIVERSITÄT
SIEGEN
Motivation
Image Super-resolution (SR) is ill-posed: many HR images can map to the same LR image.
Most supervised SR methods provide a single solution.
Prior explorable SR works: trained for specific SR.
oling algorithms produce
ersity.
Existing
soluti
16x SR S
Use of
solu
DDNM [1] (top row) and DPS [2] (bottom row).
on of semantically meaningful
ns
orable SR:
1], DPS [2],
Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota, Paramanand Chandramouli
Zero-shot Text-guided SR
Goal: Explore the space of consistent solutions through text prompts.
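$y = Ax$
Data Consistency: $A\hat{x} = y$
Semantic Consistency: $\hat{x} \sim q(x \mid c)$, where $q(x \mid c)$ is the distribution of images having semantic meaning provided by the text input c.
[Figure labels: Ches, Lemons, Leaves, Capsicums, Dandelions]
SR using T2I+DDNM
Impose null-space consistency in two stages:
In the down-sampled pixel space, $x_{LR}$ is rectified at each step as: $\hat{x}_{LR} = A_{LR}^{\dagger} y + (I - A_{LR}^{\dagger} A_{LR})\, x_{LR}$
In the SR stage, $x_{0|t}$ is rectified at each step to impose consistency: $\hat{x}_{0|t} = A^{\dagger} y + (I - A^{\dagger} A)\, x_{0|t}$
SR using T2I+DPS
Use reconstruction guidance in two stages:
In the down-sampled pixel space: $x_{LR,t-1} = x'_{LR,t-1} - \rho_t \nabla_{x_{LR,t}} \lVert y - A_{LR}(\hat{x}_{LR}) \rVert$
In the SR stage: $x_{t-1} = x'_{t-1} - \rho_t \nabla_{x_t} \lVert y - A(\hat{x}_{0|t}) \rVert$
SR using T2I+ΠGDM
Incorporate pseudoinverse guidance in two stages:
In the down-sampled pixel space: $x_{LR,t-1} = x'_{LR,t-1} - \rho_t \nabla_{x_{LR,t}} \lVert A_{LR}^{\dagger} y - A_{LR}^{\dagger} A_{LR}(\hat{x}_{LR}) \rVert$
In the SR stage: $x_{t-1} = x'_{t-1} - \rho_t \nabla_{x_t} \lVert A^{\dagger} y - A^{\dagger} A(\hat{x}_{0|t}) \rVert$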
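The null-space rectification $\hat{x} = A^{\dagger} y + (I - A^{\dagger} A)\, x$ can be illustrated with a toy 1-D sketch. Everything concrete here is an assumption made for illustration and is not taken from the poster: A is taken to be average pooling by an integer factor, for which the pseudoinverse is plain sample replication, and the names `downsample`, `pinv_upsample` and `rectify` are hypothetical.

```python
# Toy 1-D illustration of null-space rectification (assumed setup, not the poster's code).
# A      : average pooling by an integer factor `scale` (assumption)
# A_pinv : pseudoinverse of average pooling, i.e. repeating each LR sample `scale` times


def downsample(x, scale):
    """A x: average each block of `scale` consecutive samples."""
    return [sum(x[i:i + scale]) / scale for i in range(0, len(x), scale)]


def pinv_upsample(y, scale):
    """A^+ y: repeat every LR sample `scale` times."""
    return [v for v in y for _ in range(scale)]


def rectify(x, y, scale):
    """x_hat = A^+ y + (I - A^+ A) x."""
    range_part = pinv_upsample(y, scale)               # A^+ y
    projected = pinv_upsample(downsample(x, scale), scale)  # A^+ A x
    return [r + (xi - pi) for r, xi, pi in zip(range_part, x, projected)]


scale = 4
y = [0.25, 0.75]                                  # LR observation
x_a = [0.1, 0.4, 0.2, 0.9, 0.6, 0.6, 0.3, 0.8]    # two arbitrary HR estimates
x_b = [0.9, 0.0, 0.5, 0.5, 0.2, 0.7, 0.1, 0.4]

x_hat_a = rectify(x_a, y, scale)
x_hat_b = rectify(x_b, y, scale)

# Both rectified signals downsample back to y (up to rounding), yet they differ.
print(downsample(x_hat_a, scale), downsample(x_hat_b, scale))
print(x_hat_a != x_hat_b)
```

The two rectified signals differ only in their null-space component, which is the freedom that an explorable SR method can use while staying consistent with the LR input.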
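SR using CLIP Guidance
$x_{0|t}$ is rectified at each step following DDNM: $\hat{x}_{0|t} = A^{\dagger} y + (I - A^{\dagger} A)\, x_{0|t}$
Compute an intermediate estimate of the previous step: $x'_{t-1} \sim p(x_{t-1} \mid x_t, \hat{x}_{0|t})$
Incorporate CLIP guidance: $x_{t-1} = x'_{t-1} - \nabla_{x_t} E(c, x)$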
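A gradient-guidance update of the form $x_{t-1} = x'_{t-1} - \rho \nabla E$ can be sketched as follows. This is only a toy under loudly stated assumptions: a fixed linear map `W` stands in for the CLIP image encoder, the energy is taken as one minus the cosine similarity so that subtracting its gradient raises similarity, the gradient is taken directly with respect to the sample rather than through a denoiser, and the step size `rho` is a hypothetical parameter.

```python
import math

# Toy gradient guidance with a stand-in encoder (assumption: W replaces the CLIP image encoder).


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def mat_t_vec(W, g):
    """W^T g for a row-major list-of-lists matrix W."""
    return [sum(W[i][j] * g[i] for i in range(len(W))) for j in range(len(W[0]))]


def energy(c, x, W):
    """Assumed energy: E(c, x) = 1 - cos(W x, c)."""
    f = matvec(W, x)
    return 1.0 - dot(f, c) / (math.sqrt(dot(f, f)) * math.sqrt(dot(c, c)))


def energy_grad(c, x, W):
    """Analytic gradient of the assumed energy with respect to x."""
    f = matvec(W, x)
    nf, nc = math.sqrt(dot(f, f)), math.sqrt(dot(c, c))
    s = dot(f, c) / (nf * nc)
    ds_df = [ci / (nf * nc) - s * fi / (nf * nf) for fi, ci in zip(f, c)]
    return [-g for g in mat_t_vec(W, ds_df)]


def guidance_step(x_prime, c, W, rho):
    """x_{t-1} = x'_{t-1} - rho * grad E(c, x'_{t-1})."""
    g = energy_grad(c, x_prime, W)
    return [xi - rho * gi for xi, gi in zip(x_prime, g)]


W = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]   # hypothetical encoder weights
c = [1.0, 0.0]                            # hypothetical text embedding
x_prime = [0.2, 1.0, 0.1]                 # intermediate sample before guidance

x_next = guidance_step(x_prime, c, W, rho=0.1)
print(energy(c, x_prime, W), energy(c, x_next, W))
```

With a small step size the energy drops after the update, which is the text-adherence effect the guidance term is meant to provide. A guided sample is not automatically consistent with the LR input.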
E(c, x): CLIP similarity score given text c and image x [5].
[Figure labels: LR, Image (T2I)]
Quantitative Results
Columns: Dataset | SR | Metric | Zero-shot SR using: DPS | DDNM | [5]+DDNM | [4]+DDNM | CLIP guided | [5]+ΠGDM
LR PSNR (dB) (↑) | 50.42 | 75.40 | 51.68 | 67.02 | 50.16 | 51.08
NIQE (↓) | 8.41 | 5.59 | 6.17 | 5.54 | 6.12 | 6.86
Faces | 16x
LR PSNR (dB) (↑) | 51.98 | 80.91 | 51.86 | 66.30 | 51.79 | 52.02
NIQE (↓) | 5.54 | 9.77 | 5.43 | 6.38 | 6.98
LR PSNR (dB) (↑) | 47.01 | 72.94 | 66.33 | 47.36 | 48.75
NIQE (↓) | 2.66 | 10.27 | 4.62 | 4.88 | 5.37 | 5.10
CLIP (↑) | 0.2592 | 0.2326 | 0.3102 | 0.3344 | 0.2564 | 0.2811
Nocaps | 16x
LR PSNR (dB) (↑) | 48.07 | 78.42 | 53.05 | 70.01 | 50.97 | 49.67
NIQE (↓) | 4.81 | 13.24 | 4.72
CLIP (↑) | 0.2418 | 0.2162 | 0.3037 | 0.3381 | 0.2517 | 0.2788
CVPR
SEATTLE, WA JUNE 17-21, 2024
Qualitative Results
[Figure column labels: Imagen DDNM, DPS, DDNM]
Use of text improves diversity in solutions: 16x SR (top) and 8x SR (bottom).
[Figure column labels: DPS, [4]+DDNM, [5]+DDNM]
[Prompts: Curly haired woman; Elderly woman; Woman with plan...]
[Figure column labels: [4]+DDNM, [5]+DDNM, LR]
Two bicyclists are racing through a blurry background
A smiling girl with ...
The Santa ornament hangs from a Christmas branch amongst the colorful bright lights
A woman wearing ... hat with ... purple balloon flower ...
... in front of a whiteboard ...
Textual descriptions aid in reconstruction of fine details in challenging scenes: 16x SR
[Figure column labels: CLIP Guidance, [4]+DDNM, [5]+DDNM]
T2I+ΠGDM and T2I+DPS have a severe trade-off between consistency (LR PSNR) and text adherence (CLIP score).
Classifier-free guidance improves text adherence, but reduces consistency
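The consistency metric can be sketched in a few lines, assuming that LR PSNR denotes the PSNR between the downsampled SR output and the LR input. The average-pooling operator, the 1-D signals, the peak value of 1.0 and the function names are assumptions made for illustration, not details from the poster.

```python
import math

# Toy LR PSNR: PSNR between A(x_sr) and the LR input y (assumed definition).


def downsample(x, scale):
    """Assumed degradation A: average each block of `scale` samples."""
    return [sum(x[i:i + scale]) / scale for i in range(0, len(x), scale)]


def lr_psnr(x_sr, y, scale, peak=1.0):
    """PSNR in dB between the downsampled SR signal and the LR observation."""
    y_hat = downsample(x_sr, scale)
    mse = sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y)
    if mse == 0:
        return float("inf")  # perfectly consistent reconstruction
    return 10.0 * math.log10(peak ** 2 / mse)


y = [0.25, 0.75]
x_consistent = [0.25] * 4 + [0.75] * 4   # downsamples exactly to y
x_off = [0.35] * 4 + [0.65] * 4          # each LR sample is off by 0.1

print(lr_psnr(x_consistent, y, scale=4))  # infinite: no LR residual
print(lr_psnr(x_off, y, scale=4))         # about 20 dB (MSE of 0.01)
```

Under this reading, a higher LR PSNR means the SR output reproduces the LR input more closely after downsampling, independently of how well it matches the text prompt.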
Comparison ... SR. Row ...: smiling child, smiling ..., smiling woman wearing glasses. Row ...: smiling ... man, elderly m...
Row ...: smiling man with curly hair, elderly man
[1] ...on Null-Space Model, in ICLR 2023.
[2] Chung et al. Diffusion Posterior Sampling for General Noisy Inverse Problems, in ICLR 2023.
[3] Song et al. Pseudoinverse-guided diffusion models for inverse problems, in ICLR 2023.
[4] Ramesh et al. Hierarchical ... 06125, 2022.
[5] Saharia et al. Photorealistic text-to-image diffusion models with deep language understanding, in NeurIPS 2022.
[6] Radford et al. Learning transferable visual models from natural language supervision, in ICML 2021.