The image shows a research poster from the National University of Singapore (NUS), specifically from a team affiliated with the Learning and Vision group. The poster is titled "Relation Rectification in Diffusion Models" and is authored by Yinwei Wu, Xingyi Yang, Xinchao Wang, and others.

### Main Sections of the Poster:

1. **Motivation**:
   - The poster highlights a failure mode of diffusion models: misinterpreting the direction of relationships in generated images.
   - Example images are given: a tiger placed incorrectly in relation to a desk for the prompt "A tiger below the desk."
   - The goal is to rectify these errors so that generated images correctly reflect the intended relationships.

2. **Problem Definition**:
   - The research aims to maximize the likelihood of generating images with the correct relationship while minimizing the likelihood of generating the incorrect one.
   - Includes a mathematical expression for this optimization objective.
   - Example image: a tiger correctly positioned below the desk.

3. **Key Findings**:
   - Identifies the importance of the [EOT] (end-of-text) token for relationship understanding.
   - Provides examples with a cat to illustrate the significance of the embedding positions.

4. **Our Method**:
   - Introduces a method to separate the embeddings of object-swapped prompts (OSPs).
   - Uses a model called RRNet, a Graph Neural Network (GNN) that generates correction vectors.
   - Diagrams explain the process involving a text encoder and a pre-trained diffusion model.

5. **Statistical Results**:
   - Contains comparative performance metrics of the method against existing baselines (partially transcribed below).

### Images and Diagrams:
- Several visual examples of the method in action, showing how the correct relationship (such as the proper spatial configuration) is preserved in generated images.
- Diagrams clarifying the method's workflow and the roles of the text embeddings and RRNet.

### Overall Context:
The poster showcases a method for refining image generation by diffusion models, focusing on correct spatial and relational arrangements in the visual output. The research has significant implications for improving the accuracy and usefulness of AI-generated visual content.

Text transcribed from the image:

NUS, National University of Singapore (Learning and Vision group). "Relation Rectification in Diffusion Models." Yinwei Wu¹, Xingyi Yang¹, Xinchao Wang¹, National University of Singapore.

**Motivation**
- Even the best diffusion models sometimes make mistakes about the direction of relationships.
- [Figure: images generated by SDXL based on the prompt "A tiger below the desk."]
- So we want to rectify the diffusion model so that the generated images have the desired direction of the relationship.

**Problem definition**
- We want to maximize the likelihood of generating images of correct relationships while minimizing the likelihood of generating incorrect ones:

  $\arg\max_{\theta}\ \big[\, P_{\theta}(c(y)) - P_{\theta}(c(y')) \,\big]$

  where $c(y)$ is the conditioning embedding of the prompt $y$ with the correct relation direction and $c(y')$ that of its reversed counterpart.

**Key findings**
- [EOT] (the end-of-text token) is essential for relation understanding.
- Relation information is contained in the embeddings; the embeddings of OSPs have a high cosine similarity.
- [Figure: three panels, "Original," "Mask out embeddings corresponding to words," and "Mask out embedding corresponding to [EOT]," over the prompts "A cat inside the box" and "A cat on the table."]

**Our method**
- Key: to separate the embeddings of OSPs.
- [Diagram: object nodes A and B, a relation node R, and an [EOT] node joined by edges (A R B); RRNet processes this graph and outputs corrections such as h_A^[EOT].]
- [Pipeline: the prompt "A bowl is placed on the book." passes through the text encoder; the corrected text embeddings condition the pre-trained diffusion model.]
- RRNet:
  - is a GNN that generates correction vectors to adjust the embeddings of OSPs (a minimal sketch follows below);
  - keeps the diffusion model and the CLIP encoder frozen.
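To make the workflow above concrete, here is a minimal, hypothetical PyTorch sketch of the idea as the poster presents it: a small message-passing GNN over object, relation, and [EOT] nodes produces correction vectors that are added to the matching token embeddings before they condition the diffusion model. The class `RRNetSketch`, the token indices, the graph wiring, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RRNetSketch(nn.Module):
    """Hypothetical one-round message-passing GNN, loosely mirroring the
    poster's RRNet: it reads a 4-node graph (object A, relation R,
    object B, <EOT>) and emits a correction vector per node."""

    def __init__(self, dim: int = 768, hidden: int = 256):
        super().__init__()
        self.msg = nn.Linear(dim, hidden)        # node feature -> outgoing message
        self.upd = nn.Linear(dim + hidden, dim)  # node + aggregated messages -> update
        self.out = nn.Linear(dim, dim)           # updated node -> correction vector

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (4, dim) embeddings for [A, R, B, <EOT>]
        # adj:   (4, 4); adj[i, j] = 1 means node i aggregates a message from node j
        agg = adj @ self.msg(nodes)
        updated = torch.tanh(self.upd(torch.cat([nodes, agg], dim=-1)))
        return self.out(updated)

# Usage sketch -- every index and tensor below is a placeholder.
dim = 768
rrnet = RRNetSketch(dim)

prompt_emb = torch.randn(77, dim)      # stand-in for frozen CLIP output of
                                       # "A bowl is placed on the book."
osp_idx = torch.tensor([2, 4, 7, 9])   # assumed token positions of A ("bowl"),
                                       # R ("placed on"), B ("book"), <EOT>

adj = torch.tensor([[0., 0., 0., 0.],  # A receives nothing
                    [1., 0., 0., 0.],  # R receives from A
                    [0., 1., 0., 0.],  # B receives from R
                    [1., 1., 1., 0.]]) # <EOT> receives from A, R, B

corrections = rrnet(prompt_emb[osp_idx], adj)  # (4, dim) correction vectors
rectified = prompt_emb.clone()
rectified[osp_idx] += corrections              # only these embeddings change
# `rectified` would then condition the frozen pre-trained diffusion model.
```

Keeping the correction additive means the frozen encoder's embeddings are only nudged, which matches the poster's choice of leaving both CLIP and the diffusion backbone untouched.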
**Statistical results** (RR dataset)

| Method | Position (Qwen) ↑ | Position (LLaVA) ↑ | Action (Qwen) ↑ |
| --- | --- | --- | --- |
| Stable Diffusion | 0.763 | 0.616 | 0.849 |
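The problem definition and the frozen-backbone bullet together suggest how training could be wired. The sketch below uses the common proxy that a lower denoising loss corresponds to a higher generation likelihood, and trains only the correction network while the encoder and diffusion model stay frozen. Every module here (`encoder`, `rrnet`, `unet`, `denoise_loss`) is a toy stand-in with assumed shapes, not the paper's components.

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end; in the real system these would
# be the frozen CLIP text encoder, RRNet, and the frozen diffusion U-Net.
dim = 16
encoder = nn.Embedding(100, dim).requires_grad_(False)  # frozen "text encoder"
rrnet = nn.Linear(dim, dim)                             # trainable correction net
unet = nn.Linear(dim, dim).requires_grad_(False)        # frozen "diffusion model"

def rectify(tokens: torch.Tensor) -> torch.Tensor:
    emb = encoder(tokens)
    return emb + rrnet(emb)          # add correction vectors to the embeddings

def denoise_loss(x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    # Proxy for likelihood: lower denoising error under a condition roughly
    # means a higher likelihood of generating x from that condition.
    pred = unet(cond.mean(dim=0, keepdim=True))
    return ((pred - x) ** 2).mean()

opt = torch.optim.Adam(rrnet.parameters(), lr=1e-3)
x = torch.randn(1, dim)              # stand-in latent of a training image
y_pos = torch.tensor([3, 7, 9])      # tokens of the correct-relation prompt
y_neg = torch.tensor([9, 7, 3])      # same objects, relation direction swapped

for _ in range(10):
    # Maximize likelihood under the correct relation while minimizing it under
    # the swapped one (a real recipe would bound the negative term).
    loss = denoise_loss(x, rectify(y_pos)) - 0.5 * denoise_loss(x, rectify(y_neg))
    opt.zero_grad()
    loss.backward()                  # gradients reach only rrnet's parameters
    opt.step()
```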