Two researchers stand in front of their poster at a conference, presenting their work on "Text-guided Explorable Image Super-resolution." The poster, from Universität Siegen, covers the motivation, methodology, and qualitative and quantitative results. The left side explains why image super-resolution (SR) is ill-posed and outlines the challenge of generating multiple high-resolution images from a single low-resolution input. The middle section describes the "Zero-shot Text-guided SR" methods, which explore the space of super-resolution solutions through text prompts. The right section shows qualitative results, with images demonstrating the effectiveness of the techniques. One researcher, wearing glasses and a denim jumpsuit, stands confidently to the left of the poster. An orange conference lanyard hangs around her neck, and she holds a green object in her hand. To the right of the poster, the other researcher, dressed in a light blue shirt, looks on pensively. Photographs, tables, and graphs fill the poster, pointing to a comprehensive, data-driven study. In the background, the well-lit ceiling of the conference venue and posters in adjacent areas are visible, suggesting an active academic environment.

Text transcribed from the image:

69 UNIVERSITÄT SIEGEN

Motivation
- Image super-resolution (SR) is ill-posed: many HR images can map to the same LR image.
- Most supervised SR methods provide a single solution.
- Prior explorable SR works are trained for specific SR […].
- […]ling algorithms produce […]ersity. Existing soluti[…] Use of solu[…]
[Figure: 16x SR results from DDNM [1] (top row) and DPS [2] (bottom row).]
[…]on of semantically meaningful […]. […]orable SR: […] [1], DPS [2], […]

Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota, Paramanand Chandramouli
CVPR, Seattle, WA, June 17-21, 2024

Zero-shot Text-guided SR
Goal: Explore the space of consistent solutions through text prompts.
- Degradation model: $y = Ax$.
- Data consistency: $Ax \approx y$.
- Semantic consistency: $x \sim q(x \mid c)$, the distribution of images having the semantic meaning provided by the text input $c$.
[Figure: example solutions for text prompts including "Lemons", "Leaves", "Capsicums", and "Dandelions".]

SR using T2I+DDNM
Impose null-space consistency in two stages:
- In the down-sampled pixel space, $x^{LR}_{0|t}$ is rectified at each step as $\hat{x}^{LR}_{0|t} = A_{LR}^{\dagger} y + (I - A_{LR}^{\dagger} A_{LR})\, x^{LR}_{0|t}$.
- In the SR stage, $x_{0|t}$ is rectified at each step to impose consistency: $\hat{x}_{0|t} = A^{\dagger} y + (I - A^{\dagger} A)\, x_{0|t}$.

SR using T2I+DPS
Use reconstruction guidance in two stages, with step size $\rho$:
- In the down-sampled pixel space: $x^{LR}_{t-1} = x^{LR}_{t-1} - \rho \nabla_{x^{LR}_t} \lVert y - A_{LR} x^{LR}_{0|t} \rVert$.
- In the SR stage: $x_{t-1} = x_{t-1} - \rho \nabla_{x_t} \lVert y - A(x_{0|t}) \rVert$.

SR using T2I+ΠGDM
Incorporate pseudoinverse guidance in two stages:
- In the down-sampled pixel space: $x^{LR}_{t-1} = x^{LR}_{t-1} - \rho \nabla_{x^{LR}_t} \lVert A_{LR}^{\dagger} y - A_{LR}^{\dagger} A_{LR} x^{LR}_{0|t} \rVert$.
- In the SR stage: $x_{t-1} = x_{t-1} - \rho \nabla_{x_t} \lVert A^{\dagger} y - A^{\dagger} A(x_{0|t}) \rVert$.

SR using CLIP Guidance
- $x_{0|t}$ is rectified at each step following DDNM: $\hat{x}_{0|t} = A^{\dagger} y + (I - A^{\dagger} A)\, x_{0|t}$.
- Compute an intermediate estimate of the previous step: $x'_{t-1} \sim p(x_{t-1} \mid x_t, \hat{x}_{0|t})$.
- Incorporate CLIP guidance: $x_{t-1} = x'_{t-1} - \nabla_{x_t} E(c, x_{0|t})$, where $E(c, x)$ is the CLIP similarity score given text $c$ and image $x$ [6].

Quantitative Results
Columns: DPS [2], DDNM [1], [5]+DDNM, [4]+DDNM, CLIP guided, [5]+ΠGDM. Metrics: LR PSNR (dB, ↑), NIQE (↓), CLIP score (↑).
- [Label illegible]: LR PSNR 50.42, 75.40, 51.68, 67.02, 50.16, 51.08; NIQE 8.41, 5.59, 6.17, 5.54, 6.12, 6.86.
- Faces, 16x: LR PSNR 51.98, 80.91, 51.86, 66.30, 51.79, 52.02; NIQE 5.54, 9.77, 5.43, 6.38, 6.98 (one value illegible).
- [Label illegible]: LR PSNR 47.01, 72.94, 66.33, 47.36, 48.75 (one value illegible); NIQE 2.66, 10.27, 4.62, 4.88, 5.37, 5.10; CLIP 0.2592, 0.2326, 0.3102, 0.3344, 0.2564, 0.2811.
- Nocaps, 16x: LR PSNR 48.07, 78.42, 53.05, 70.01, 50.97, 49.67; NIQE 4.81, 13.24, 4.72 (remaining values illegible); CLIP 0.2418, 0.2162, 0.3037, 0.3381, 0.2517, 0.2788.

Qualitative Results
[Figure: columns labelled Imagen, DDNM, DPS, DDNM.]
Use of text improves diversity in solutions: 16x SR (top) and 8x SR (bottom).
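The null-space rectification in the T2I+DDNM panel, which is also the first step of the CLIP-guided variant, can be illustrated with a small NumPy sketch. This is not the authors' implementation. For illustration only, it assumes the degradation $A$ is $s \times s$ average pooling, whose pseudoinverse $A^{\dagger}$ is nearest-neighbour replication. The helper names are hypothetical. The two-stage usage at the end also assumes a 64×64 base stage and a 256×256 SR stage for a 16×16 LR input.

```python
import numpy as np


def avg_pool(x: np.ndarray, s: int) -> np.ndarray:
    """Assumed degradation A: s x s average pooling of an (H, W) image."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def avg_pool_pinv(y: np.ndarray, s: int) -> np.ndarray:
    """Pseudoinverse A^+ of average pooling: replicate each LR pixel into an s x s block."""
    return np.kron(y, np.ones((s, s)))


def null_space_rectify(x0: np.ndarray, y: np.ndarray, s: int) -> np.ndarray:
    """Return A^+ y + (I - A^+ A) x0: keep x0's null-space part, take the range part from y."""
    return avg_pool_pinv(y, s) + x0 - avg_pool_pinv(avg_pool(x0, s), s)


rng = np.random.default_rng(0)
y = rng.random((16, 16))  # observed LR image (random stand-in data)

# Stage 1 (down-sampled pixel space): A_LR maps a 64x64 base-stage estimate to y.
x_lr = null_space_rectify(rng.random((64, 64)), y, s=4)
# Stage 2 (SR stage): A maps a 256x256 estimate to y.
x_sr = null_space_rectify(rng.random((256, 256)), y, s=16)

# Both rectified estimates reproduce y up to floating-point round-off.
print(np.abs(avg_pool(x_lr, 4) - y).max(), np.abs(avg_pool(x_sr, 16) - y).max())
```

In a real sampler, `x0` would be the text-to-image model's denoised estimate at step $t$; here random arrays stand in for it. Replication is the right pseudoinverse for average pooling because it makes $A A^{\dagger} = I$, so $A \hat{x}_{0|t} = y$ holds exactly.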
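The DPS-style reconstruction guidance step of the T2I+DPS panel can be sketched in the same way. To keep the gradient analytic, this sketch makes two assumptions. First, it uses a hypothetical toy denoiser whose estimate is $x_{0|t} = x_t / \sqrt{\bar\alpha_t}$, so its Jacobian is $I / \sqrt{\bar\alpha_t}$. Second, it takes $A$ to be average pooling. A real implementation would instead backpropagate through the text-to-image network, for example with automatic differentiation. Function names and parameter values are illustrative.

```python
import numpy as np


def avg_pool(x: np.ndarray, s: int) -> np.ndarray:
    """Assumed degradation A: s x s average pooling."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def avg_pool_adjoint(r: np.ndarray, s: int) -> np.ndarray:
    """Adjoint A^T of average pooling: spread each value over its s x s block, divided by s*s."""
    return np.kron(r, np.ones((s, s))) / (s * s)


def dps_step(x_prev, x_t, y, s, alpha_bar, rho):
    """x_{t-1} <- x_{t-1} - rho * grad_{x_t} ||y - A(x_{0|t})||, using a toy linear denoiser."""
    x0 = x_t / np.sqrt(alpha_bar)  # toy estimate x_{0|t} (hypothetical)
    r = y - avg_pool(x0, s)        # data-consistency residual in LR space
    # grad_{x0} ||r|| = -A^T r / ||r||; chain rule through d x0 / d x_t = I / sqrt(alpha_bar).
    grad = -avg_pool_adjoint(r, s) / (np.linalg.norm(r) * np.sqrt(alpha_bar))
    return x_prev - rho * grad


rng = np.random.default_rng(0)
y = rng.random((8, 8))
x_t = rng.random((32, 32))
alpha_bar = 0.5

# The sampler's own update is skipped (x_prev = x_t) to isolate the guidance direction.
x_new = dps_step(x_t.copy(), x_t, y, s=4, alpha_bar=alpha_bar, rho=1.0)
before = np.linalg.norm(y - avg_pool(x_t / np.sqrt(alpha_bar), 4))
after = np.linalg.norm(y - avg_pool(x_new / np.sqrt(alpha_bar), 4))
print(before, after)  # the LR residual shrinks after the guidance step
```

Note that the gradient is taken with respect to $x_t$ but applied to $x_{t-1}$, as in the panel's formula.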
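The pseudoinverse-guidance step of the T2I+ΠGDM panel can be sketched under the same toy assumptions: average-pooling $A$ and a hypothetical linear denoiser $x_{0|t} = x_t / \sqrt{\bar\alpha_t}$. The step uses the residual $A^{\dagger} y - A^{\dagger} A x_{0|t}$ in place of the measurement-space residual. This is a simplified illustration of the panel's formula only. The original ΠGDM formulation [3] uses a vector-Jacobian product with its own step weighting, which this sketch does not reproduce.

```python
import numpy as np


def avg_pool(x: np.ndarray, s: int) -> np.ndarray:
    """Assumed degradation A: s x s average pooling."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def avg_pool_pinv(y: np.ndarray, s: int) -> np.ndarray:
    """Pseudoinverse A^+ of average pooling: nearest-neighbour replication."""
    return np.kron(y, np.ones((s, s)))


def pigdm_step(x_prev, x_t, y, s, alpha_bar, rho):
    """x_{t-1} <- x_{t-1} - rho * grad_{x_t} ||A^+ y - A^+ A x_{0|t}||, using a toy linear denoiser."""
    x0 = x_t / np.sqrt(alpha_bar)                                # toy estimate x_{0|t} (hypothetical)
    v = avg_pool_pinv(y, s) - avg_pool_pinv(avg_pool(x0, s), s)  # A^+ y - A^+ A x0
    # A^+ A is an orthogonal projector and v lies in its range, so grad_{x0} ||v|| = -v / ||v||.
    grad = -v / (np.linalg.norm(v) * np.sqrt(alpha_bar))
    return x_prev - rho * grad


rng = np.random.default_rng(0)
y = rng.random((8, 8))
x_t = rng.random((32, 32))
alpha_bar = 0.5

# As before, x_prev = x_t isolates the guidance direction from the sampler update.
x_new = pigdm_step(x_t.copy(), x_t, y, s=4, alpha_bar=alpha_bar, rho=1.0)
before = np.linalg.norm(y - avg_pool(x_t / np.sqrt(alpha_bar), 4))
after = np.linalg.norm(y - avg_pool(x_new / np.sqrt(alpha_bar), 4))
print(before, after)  # the LR residual shrinks after the guidance step
```

For average pooling, $A^{\dagger} = s^2 A^{\top}$, so this direction is a rescaled copy of the DPS direction. The two directions can differ for degradations where $A A^{\top}$ is not a multiple of the identity.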
[Figure: panels labelled DPS, [4]+DDNM, and [5]+DDNM for prompts including "Curly haired woman" and "Woman with […]".]
[Figure: LR inputs with [4]+DDNM and [5]+DDNM results for prompts including "Two bicyclists are racing through a blurry background" and "The Santa ornament hangs from a Christmas branch amongst the colorful bright lights". Other prompts ("A smiling girl with […]", "A woman wearing […] hat with […] purple balloon flower […]", "[…] in front of a whiteboard […]") are only partially legible.]
Textual descriptions aid in reconstruction of fine details in challenging scenes: 16x SR.
[Figure: panels labelled CLIP Guidance, [4]+DDNM, and [5]+DDNM.]
- T2I+ΠGDM and T2I+DPS have a severe trade-off between consistency (LR PSNR) and text adherence (CLIP score).
- Classifier-free guidance improves text adherence, but reduces consistency.
Comparison […]: rows show "smiling child", "smiling […]", "smiling woman wearing glasses"; "[…] elderly m[…]"; "smiling man with curly hair", "elderly man".

References
[1] […] Null-Space Model, in ICLR 2023.
[2] Chung et al. Diffusion Posterior Sampling for General Noisy Inverse Problems, in ICLR 2023.
[3] Song et al. Pseudoinverse-guided diffusion models for inverse problems, in ICLR 2023.
[4] Ramesh et al. Hierarchical […] 06125, 2022.
[5] Saharia et al. Photorealistic text-to-image diffusion models with deep language understanding, in NeurIPS 2022.
[6] Radford et al. Learning transferable visual models from natural language supervision, in ICML 2021.
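The consistency metric "LR PSNR" can be sketched as the PSNR between the down-sampled SR output $A(x)$ and the LR input $y$. This reading of the metric is an assumption for illustration, not a detail confirmed by the poster. The sketch also assumes average-pooling $A$ and intensities in $[0, 1]$.

```python
import numpy as np


def avg_pool(x: np.ndarray, s: int) -> np.ndarray:
    """Assumed degradation A: s x s average pooling."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def lr_psnr(x_sr: np.ndarray, y: np.ndarray, s: int, max_val: float = 1.0) -> float:
    """PSNR in dB between A(x_sr) and y; higher means the SR output is more consistent with y."""
    mse = float(np.mean((avg_pool(x_sr, s) - y) ** 2))
    if mse == 0.0:
        return float("inf")  # exactly consistent output
    return float(10.0 * np.log10(max_val ** 2 / mse))


y = np.zeros((16, 16))
x_sr = np.full((128, 128), 0.1)  # down-samples to 0.1 everywhere, so the MSE is 0.01
print(lr_psnr(x_sr, y, s=8))     # about 20 dB
```

Because PSNR is logarithmic, each 10 dB increase corresponds to a tenfold reduction in the mean squared error.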
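The poster notes that classifier-free guidance improves text adherence but reduces consistency. Classifier-free guidance combines a text-conditional and an unconditional noise prediction, and a minimal sketch of the standard combination looks like this. The arrays stand in for the outputs of a denoiser $\epsilon_\theta$, and the guidance weight `w` is illustrative.

```python
import numpy as np


def cfg_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates further towards the text condition.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)


eps_u = np.array([0.0, 1.0, -1.0])  # stand-in for eps_theta(x_t, empty prompt)
eps_c = np.array([1.0, 1.0, 0.0])   # stand-in for eps_theta(x_t, text c)
print(cfg_noise(eps_u, eps_c, w=3.0))  # elementwise values 3.0, 1.0, 2.0
```

A larger `w` pushes samples further towards the prompt. According to the poster, this stronger text adherence comes at the cost of LR consistency.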