The image shows a research poster from the National University of Singapore (NUS), specifically from a team affiliated with the Learning and Vision group. The poster is titled "Relation Rectification in Diffusion Models" and is authored by Yinwei Wu, Xingyi Yang, Xinchao Wang, and others.

### Main Sections of the Poster:

1. **Motivation**:
   - The poster highlights a failure mode of diffusion models: misinterpreting the direction of relationships in generated images.
   - Example images are given: a tiger placed incorrectly in relation to a desk for the prompt "A tiger below the desk."
   - The goal is to rectify these errors so that generated images correctly reflect the intended relationships.

2. **Problem Definition**:
   - The research aims to maximize the likelihood of generating images with the correct relationship while minimizing the likelihood of generating the incorrect one.
   - Includes a mathematical expression for this optimization objective.
   - Example image: a tiger correctly positioned below the desk.

3. **Key Findings**:
   - Identifies the importance of the [EOT] (end-of-text) token for relationship understanding.
   - Provides examples with a cat to illustrate the significance of the embedding positions.

4. **Our Method**:
   - Introduces a method to separate the embeddings of object-swapped prompts (OSPs).
   - Uses a model called RRNet, a Graph Neural Network (GNN) that generates correction vectors.
   - Diagrams explain the process involving a text encoder and a pre-trained diffusion model.

5. **Statistical Results**:
   - Contains comparative performance metrics of the method against existing baselines (partially transcribed below).

### Images and Diagrams:
- Several visual examples of the method in action, showing how the correct relationship (such as the proper spatial configuration) is preserved in generated images.
- Diagrams clarifying the method's workflow and the roles of the text embeddings and RRNet.

### Overall Context:
The poster showcases a method for refining image generation by diffusion models, focusing on correct spatial and relational arrangements in the visual output. The research has significant implications for improving the accuracy and usefulness of AI-generated visual content.

Text transcribed from the image:

NUS, National University of Singapore (Learning and Vision group). "Relation Rectification in Diffusion Models." Yinwei Wu¹, Xingyi Yang¹, Xinchao Wang¹, National University of Singapore.

**Motivation**
- Even the best diffusion models sometimes make mistakes about the direction of relationships.
- [Figure: images generated by SDXL based on the prompt "A tiger below the desk."]
- So we want to rectify the diffusion model so that the generated images have the desired direction of the relationship.

**Problem definition**
- We want to maximize the likelihood of generating images of correct relationships while minimizing the likelihood of generating incorrect ones:

  $\arg\max_{\theta}\ \big[\, P_{\theta}(c(y)) - P_{\theta}(c(y')) \,\big]$

  where $c(y)$ is the conditioning embedding of the prompt $y$ with the correct relation direction and $c(y')$ that of its reversed counterpart.

**Key findings**
- [EOT] (the end-of-text token) is essential for relation understanding.
- Relation information is contained in the embeddings; the embeddings of OSPs have a high cosine similarity.
- [Figure: three panels, "Original," "Mask out embeddings corresponding to words," and "Mask out embedding corresponding to [EOT]," over the prompts "A cat inside the box" and "A cat on the table."]

**Our method**
- Key: to separate the embeddings of OSPs.
- [Diagram: object nodes A and B, a relation node R, and an [EOT] node joined by edges (A R B); RRNet processes this graph and outputs corrections such as h_A^[EOT].]
- [Pipeline: the prompt "A bowl is placed on the book." passes through the text encoder; the corrected text embeddings condition the pre-trained diffusion model.]
- RRNet:
  - is a GNN that generates correction vectors to adjust the embeddings of OSPs (a minimal sketch follows below);
  - keeps the diffusion model and the CLIP encoder frozen.
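To make the workflow above concrete, here is a minimal, hypothetical PyTorch sketch of the idea as the poster presents it: a small message-passing GNN over object, relation, and [EOT] nodes produces correction vectors that are added to the matching token embeddings before they condition the diffusion model. The class `RRNetSketch`, the token indices, the graph wiring, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RRNetSketch(nn.Module):
    """Hypothetical one-round message-passing GNN, loosely mirroring the
    poster's RRNet: it reads a 4-node graph (object A, relation R,
    object B, <EOT>) and emits a correction vector per node."""

    def __init__(self, dim: int = 768, hidden: int = 256):
        super().__init__()
        self.msg = nn.Linear(dim, hidden)        # node feature -> outgoing message
        self.upd = nn.Linear(dim + hidden, dim)  # node + aggregated messages -> update
        self.out = nn.Linear(dim, dim)           # updated node -> correction vector

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (4, dim) embeddings for [A, R, B, <EOT>]
        # adj:   (4, 4); adj[i, j] = 1 means node i aggregates a message from node j
        agg = adj @ self.msg(nodes)
        updated = torch.tanh(self.upd(torch.cat([nodes, agg], dim=-1)))
        return self.out(updated)

# Usage sketch -- every index and tensor below is a placeholder.
dim = 768
rrnet = RRNetSketch(dim)

prompt_emb = torch.randn(77, dim)      # stand-in for frozen CLIP output of
                                       # "A bowl is placed on the book."
osp_idx = torch.tensor([2, 4, 7, 9])   # assumed token positions of A ("bowl"),
                                       # R ("placed on"), B ("book"), <EOT>

adj = torch.tensor([[0., 0., 0., 0.],  # A receives nothing
                    [1., 0., 0., 0.],  # R receives from A
                    [0., 1., 0., 0.],  # B receives from R
                    [1., 1., 1., 0.]]) # <EOT> receives from A, R, B

corrections = rrnet(prompt_emb[osp_idx], adj)  # (4, dim) correction vectors
rectified = prompt_emb.clone()
rectified[osp_idx] += corrections              # only these embeddings change
# `rectified` would then condition the frozen pre-trained diffusion model.
```

Keeping the correction additive means the frozen encoder's embeddings are only nudged, which matches the poster's choice of leaving both CLIP and the diffusion backbone untouched.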
**Statistical results** (RR dataset)

| Method | Position (Qwen) ↑ | Position (LLaVA) ↑ | Action (Qwen) ↑ |
| --- | --- | --- | --- |
| Stable Diffusion | 0.763 | 0.616 | 0.849 |
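The problem definition and the frozen-backbone bullet together suggest how training could be wired. The sketch below uses the common proxy that a lower denoising loss corresponds to a higher generation likelihood, and trains only the correction network while the encoder and diffusion model stay frozen. Every module here (`encoder`, `rrnet`, `unet`, `denoise_loss`) is a toy stand-in with assumed shapes, not the paper's components.

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end; in the real system these would
# be the frozen CLIP text encoder, RRNet, and the frozen diffusion U-Net.
dim = 16
encoder = nn.Embedding(100, dim).requires_grad_(False)  # frozen "text encoder"
rrnet = nn.Linear(dim, dim)                             # trainable correction net
unet = nn.Linear(dim, dim).requires_grad_(False)        # frozen "diffusion model"

def rectify(tokens: torch.Tensor) -> torch.Tensor:
    emb = encoder(tokens)
    return emb + rrnet(emb)          # add correction vectors to the embeddings

def denoise_loss(x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    # Proxy for likelihood: lower denoising error under a condition roughly
    # means a higher likelihood of generating x from that condition.
    pred = unet(cond.mean(dim=0, keepdim=True))
    return ((pred - x) ** 2).mean()

opt = torch.optim.Adam(rrnet.parameters(), lr=1e-3)
x = torch.randn(1, dim)              # stand-in latent of a training image
y_pos = torch.tensor([3, 7, 9])      # tokens of the correct-relation prompt
y_neg = torch.tensor([9, 7, 3])      # same objects, relation direction swapped

for _ in range(10):
    # Maximize likelihood under the correct relation while minimizing it under
    # the swapped one (a real recipe would bound the negative term).
    loss = denoise_loss(x, rectify(y_pos)) - 0.5 * denoise_loss(x, rectify(y_neg))
    opt.zero_grad()
    loss.backward()                  # gradients reach only rrnet's parameters
    opt.step()
```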