The image shows an academic conference poster presentation on "CosmicMan: A Text-to-Image Foundation Model for Humans," displayed at a booth labeled "Highlight" with the booth number 209. The poster outlines the project motivation, methodology, and experimental results. Key sections include "Motivation," "Human-AI Data Flywheel," "Decomposed Attention Training," and "Experimental Results." The left side of the poster details the data flywheel process and decomposed attention training methods. The right side features a table of experimental results and visual samples generated by the CosmicMan model, showcasing realistic human images. The booth setup includes surrounding tables, chairs, personal items such as a water bottle, and a red tote bag. The surrounding environment suggests a busy conference hall with a modern design, overhead lighting, and other participants engaging in presentations. Additionally, there are QR codes on the top right of the poster for a project page and GitHub repository, indicating avenues for further information and involvement.
Text transcribed from the image:
Highlight
CVPR
229
208
Shanghai Artificial Intelligence Laboratory (上海人工智能实验室)
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li, Jianglin Fu, Kaiyuan Liu, Wentao Wang, Kwan-Yee Lin, Wayne Wu
Motivation:
Current foundation models struggle with inferior quality and
fine-grained text-image misalignment for humans.
Human-AI Data Flywheel:
Flowing data and human-in-the-loop annotation.
Produce CosmicMan-HQ with 6M images and 115M labels.
Decomposed-Attention Training:
Data discretization for decomposing text-human image data.
Decompose and refocus cross-attention features in model.
Shanghai AI Laboratory
*Equal Contributions Equal Advising
Data Flywheel -- Annotate Anyone
Paradigm-3: Data Production by Human-AI Cooperation (Annotate Anyone)
Figure labels: Internet, Fetching, Data Pool, Sampling, Labeling, AI Model, Human, Collecting, Datasets, Image Annotation Pairs, Finetuning
Project Page
GitHub
209
Experimental Results
Methods            FID     HPSv2    CLIP    Acc_obj↑  Acc_tex  Acc_shape  Acc_all↑
SD 1.5 [43]        48.09   0.2659   30.43   87.3      77.4     59.3       74.6
SD 2.0 [43]        51.61   0.2588   26.27   82.8      74.7     58.7       72.0
SDXL [37]          48.61   0.2647   30.78   88.5      82.5     63.2       78.1
DeepFloyd-IF [9]   44.62   0.2603   29.33   87.9      84.4     62.0       78.1
DALLE2 [41]        49.60   0.2630   29.86   83.3      79.3     55.3       72.6
DALLE3 [1]         66.36   0.2673   28.86   86.2      87.1     60.1       77.8
MidJourney [31]    53.89   0.2688   28.89   85.2      79.5     59.4       74.7
CosmicMan-SD       36.78   0.2690   28.47   91.7      85.7     66.1       81.2
CosmicMan-SDXL     35.42   0.2698   27.31   92.7      88.3     69.7       83.6
Global Attributes
Dataset Comparison
Sample from CosmicMan-HQ
Training diagram labels: U-Net, L_noise, L_HOLA
Text description fragments: "A full-body sh...", "adult woman with brown hair", "white jacket", "brown wavy hair", "cotton shirt"
Annotation labels:
Tops: Wear tops: Yes; Type: Shirt; Pattern: Solid color; Material: Cotton; Sleeve length: Long sleeve; Top length: Normal; Collar shape
Shoes: Wear shoes: Yes; Color: Black; Material: Leather; Length: Ankle
SD 1.5
DALLE2
DALLE3
CosmicMan-SD
Midjourney
IF
SDXL
CosmicMan-SDXL
Text Descriptions: A close up portrait shot, a Caucasian teenager male, fit, a street with a
building in the distance, cotton knitted hat, brown wavy above eyes hair, gray cotton long
sleeve normal solid color hoodie
[...] fit, small road with trees, straight red above-chest hair, normal-length, white
short plaid skirt in pleated shape, cotton [...]
backpack, socks, black leather oxford shoes
Text Descriptions
A close up portrait shot, an adult Caucasian female, fit, runway, cotton long sleeve solid color white
normal t-shirt, solid color fur scarf, wavy above chest brown hair,
Text Descriptions: A full-body shot, an adult Latino male, hoodie, cargo
pants, Chelsea boots, backpack