The image shows a large white board carrying a long panel of information, with a group of people looking at it. The board appears to be a visual display of information, possibly a model or a floor plan. Everyone in the image is looking at the board; some are holding a backpack, and at least one person is wearing glasses. The setting is likely a study or a meeting room, with the board as the central point of focus.

Text transcribed from the image: Unseen: Visual Common Sense for Semantic Placement Samrakhya¹, Aniruddha Kembhavi², Dhruv Batra¹, Zsolt Kira¹, Kuo-Hao Zeng²*, Luca Weihs²* LAION-400M How to learn Semantic Placement? Key Idea: Use synthetically generated real world and simulation data Inpainting real images Synthetic Data Habitat Simulator State before object placement Remove object N using inpainting Use original detections as labels Automatic data generation pipeline Prompt: 4k, HD Sample a pair of objects Inpainted Image Pass Stable Diffusion LAMA Detic Detic & SAM Filter Stable Diffusion SDEdit 1-p Fail Discard 1M images distractor objects 5% noise Images (B) Find Objects of Interest (C) Inpaint Objects of Interest (D) Filter Model architecture CLIP-ResNet50 7x7x2048 1024 CLIP-TextEncoder Cushion 256 1024 Target: Cushion Sensor Pose Re Embodie Given an with placing meaningful l Embodied Sema 2x Augmented Images (E) Enhance Image Quality Observations (0) RGBD Target Category (ex: "cushion") Observ Mask Prediction M Preference LLM+Detect LLaVA GPT4V CLIP-UNet (