The image shows a large academic research poster with figures, tables, and diagrams. The poster is displayed on a whiteboard, with the words "Making Thinking Visible" and "Strategies for Powering Your Next Great Idea" visible in the top left and bottom right corners, respectively. The poster is detailed, with images and text throughout. The colors are predominantly blue, with some green and orange accents, and the overall design is professional and eye-catching.

Text transcribed from the image:

South China University of Technology; Singapore Management University
CVPR, Seattle, WA, June 17-21, 2024

Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples

Yuyang Yu, Bangzhen Liu, Chenxi Zheng, Xuemiao Xu (1,2,3,4,†), Huaidong Zhang (1,†), Shengfeng He

1 South China University of Technology
2 State Key Laboratory of Subtropical Building Science
3 Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information
4 Ministry of Education Key Laboratory of Big Data and Intelligent Robot
Singapore Management University

Comparison

[Table: quantitative comparison by method and condition. COCO: Sketch, Segmentation, Depth, Canny edge (FID↓, CSSIM↑). HumanArt: Pose (FID↓, AP↑, CSSIM↑). Methods listed: ControlNet, T2I-Adapter, PromptDiff, HumanSD, InstructPix2Pix, ControlNet-100, T2I-Adapter-100, PromptDiff-100, HumanSD-100, Ours.]

Motivation

[Figure panels: (a) sketch, (b) FreezeCA, (c) Wrong Pr.-100, (d) ControlNet-100, (e) Wrong Pr., (f) ControlNet.]

Visualization results of two key experiments.
"FreezeCA" means training ControlNet-100 with frozen cross-attention layers. "Wrong Pr" means inputting the correct prompt to Stable Diffusion and inputting an incorrect prompt to ControlNet during inference. The above figure shows that textual priors constrain learning for new conditions with limited training examples. Overview Prompt-free Encoder 21.049 0.692 32.339 15.20 0.851 20.726 0.835 19.137 0803 16.710 0.475 32.968 2.10 0.55 Quantitative comparison with other methods for different conditions. Under the 100-sample setting, our method achieves a distinct advantage in both FID and CSSIM metrics, illustrating our superiority in terms of image quality and conditional control. We also evaluate the full version of comparative methods, our approach attains a significant level of performance that is on par with the full ControlNet, T21-Adapter, and PromptDiff. Aliving room with a conch tod chair Segmentation The red, double decker ben is driving past other benes Sketch As op dit doos ou a kden fee "A zebra grazing on Fet Fet Prompt-free Encoder. lush green grass in a Beld text Photogaph ise at beer lake Desch P Z₁ Stable Diffusion Depth: ய Conv CRB ST norm Prompt-free Encoder The Big Bes clock tower towering over be city of Lond PCL CNR FID CSSIMT FID↓ APT CSSIM FID CSSIM 21.245 0.684 36.334 23.10 0.853 19.299 0.803 21.049 0.692 32.968 23.10 0.855 19.137 0.803 Statistical results of ablation study for our proposed Prompt-free Conditional Learning (PCL) and Condition-specific Negative Rectification (CNR). Based on PCL, the use of the CNR could enhance the performance on both FID and CSSIM, indicating better image quality and conditional controllability. 
[Figure: framework diagram. (a) Prompt-free Conditional Learning; (b) Condition-specific Negative Rectification. Legend: Add, Convolution, Frozen, Trainable, Residual Feed. Legible labels: Stable Diffusion, E_text, P_neg, Rectify, 1-w. Example negative prompt: "ugly, tiling beginner, distorted face".]

Overview of our two-stage optimization framework: Prompt-free Conditional Learning (PCL) and Condition-specific Negative Rectification (CNR). First, for PCL, we design a prompt-free encoder to encode the condition, and we fine-tune the encoder by combining null text conditions with the encoded conditional features inside the frozen SD model. Then, for CNR, we rectify the negative prompt with the condition features during the CFG process to achieve more precise diffusion guidance.

[Figure: qualitative comparison, continued. Legible example prompts: Canny edge, "Pink Skunk Anemone Fish"; Pose, "Garage kits, a figurine with purple hair and blue eyes".]

Qualitative comparison with other methods. Under the 100-sample setting, our method demonstrates robust visual harmony and adherence to the input conditions. ControlNet-100 and T2I-Adapter-100 achieve text-image correlation, yet struggle to maintain structural integrity under new conditions. We also provide extensive comparisons with the fully trained methods, indicating our ability to produce high-quality and authentic images.

Applications

[Figure: application results. Tasks: Thermal to RGB, HED to old photo, RGB to Thermal. Legible column labels: Input, ControlNet-100.]

Results for applications. We further evaluate our method on generating new data from limited condition samples through tasks such as face-to-thermal, thermal-to-face, and HED-to-old-photo generation. As shown in the above figure, compared with ControlNet, our method effectively learns novel conditions and achieves better generative quality, exemplifying its efficiency with limited training data.
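The CFG step described in the framework overview can be illustrated with a small numerical sketch. This is not the authors' implementation: the poster does not give the rectification formula, so `rectify_negative` assumes a simple convex blend with a hypothetical weight `w`, `toy_denoiser` is a stand-in for the frozen Stable Diffusion UNet, and adding the condition features to the null text embedding is only a placeholder for however the condition is actually injected. Only `cfg_combine` is the standard classifier-free guidance formula.

```python
import numpy as np


def rectify_negative(neg_text_emb, cond_feat, w=0.5):
    # ASSUMPTION: rectification modeled as a convex blend of the negative-prompt
    # embedding and the encoded condition features; `w` is a hypothetical weight.
    return w * neg_text_emb + (1.0 - w) * cond_feat


def cfg_combine(eps_pos, eps_neg, guidance_scale=7.5):
    # Standard classifier-free guidance: start from the negative-branch noise
    # prediction and move toward the positive branch by `guidance_scale`.
    return eps_neg + guidance_scale * (eps_pos - eps_neg)


def toy_denoiser(z_t, context):
    # Stand-in for the frozen Stable Diffusion UNet (NOT the real model):
    # a fixed linear map so the example runs without any weights.
    return 0.9 * z_t + 0.1 * context.mean()


rng = np.random.default_rng(0)
z_t = rng.normal(size=(4, 8, 8))       # toy noisy latent
null_text = np.zeros((77, 16))         # null text condition (prompt-free branch)
neg_text = rng.normal(size=(77, 16))   # toy embedding of a negative prompt
cond_feat = rng.normal(size=(77, 16))  # toy output of a prompt-free condition encoder

# Positive branch: null text plus condition features (placeholder injection).
eps_pos = toy_denoiser(z_t, null_text + cond_feat)
# Negative branch: negative prompt rectified with the condition features.
eps_neg = toy_denoiser(z_t, rectify_negative(neg_text, cond_feat, w=0.5))

eps = cfg_combine(eps_pos, eps_neg, guidance_scale=7.5)
print(eps.shape)
```

With `guidance_scale=1.0` the combined prediction reduces to the positive branch, and with `w=1.0` the negative branch falls back to the plain negative prompt, which makes the two knobs easy to check in isolation.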