The image shows a technical poster on display at CVPR (Computer Vision and Pattern Recognition) 2024, held June 17-21, 2024. The poster is titled "GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation" and is authored by Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, and Roozbeh Mottaghi.

The poster is divided into several sections:

1. **Task**:
   - The Go to Anything (GOAT) task: an embodied agent navigates to a sequence of open-vocabulary goals, each specified as a category name, a language description, or an image.
2. **Key Features**:
   - Open-Vocabulary Multi-Modal Goals: Tests generalization to seen and unseen objects in unseen scenes.
   - Lifelong Navigation: In each episode the agent navigates to 5-10 goals in the same scene.
   - Reproducibility: Comprehensive benchmarking of existing methods through simulation.
3. **Dataset**:
   - Language goals are generated from simulator metadata using large language models (LLMs).
   - The environments are simulated with the Habitat Simulator.
4. **Baselines**:
   - SenseAct-NN Skill Chain: a separate end-to-end trained CNN+RNN policy for each goal modality.
   - SenseAct-NN Monolithic: a single end-to-end trained CNN+RNN policy.
   - Modular GOAT: builds an explicit map of the environment and combines it with path planning.
5. **Results**:
   - Skill chaining achieves state-of-the-art success rate; Modular GOAT achieves state-of-the-art SPL.
   - Navigation efficiency improves over time for both modular and end-to-end trained methods.
   - End-to-end methods show no drop in performance when long-term memory is disabled.
   - Modular methods are more sensitive to noise in goal observations, while end-to-end trained methods are robust to noise in the goal specification.

A person points at a detail on the poster. The setting is an indoor conference hall, with other participants and presentations visible in the background.
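The task and the skill-chain baseline described above can be illustrated with a minimal sketch. It is not the authors' code: the names `GoalType`, `Goal`, `Episode`, and `run_skill_chain` are hypothetical, and each policy is reduced to a plain callable that returns a string instead of a trained CNN+RNN network.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class GoalType(Enum):
    OBJECT = "object"      # category name, e.g. "bed"
    LANGUAGE = "language"  # free-form language description
    IMAGE = "image"        # goal image (represented here by a file path)


@dataclass
class Goal:
    goal_type: GoalType
    spec: str  # category name, description text, or image path


@dataclass
class Episode:
    scene_id: str
    goals: List[Goal]

    def __post_init__(self) -> None:
        # The poster states 5-10 goals per episode, all in the same scene.
        if not 5 <= len(self.goals) <= 10:
            raise ValueError("expected 5-10 goals per episode")


# Hypothetical stand-in for a per-modality navigation policy.
Policy = Callable[[Goal], str]


def run_skill_chain(episode: Episode, policies: Dict[GoalType, Policy]) -> List[str]:
    """Dispatch each goal, in order, to the policy registered for its modality."""
    return [policies[goal.goal_type](goal) for goal in episode.goals]
```

A monolithic baseline would, under the same assumptions, register one shared callable for all three `GoalType` values instead of three separate ones.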
Text transcribed from the image:

Carnegie Mellon University

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

CVPR, June 17-21, 2024

**Task**: Go to Anything (GOAT). Embodied agent is tasked with navigating to a sequence of open-vocabulary goals specified via category name, language description or an image.

Examples of GOAT-Bench goals (object goal, language goal, image goal; only partly legible):
- "Brick fireplace with [illegible] screen tv above. The fireplace is located below the speaker"
- "Double oven located near the kitchen counter, kitchen cabinet, sink, and blinds"
- "Grey couch located on the left side of the room next to the picture and the [illegible]"
- "bookshelf that is located above the box and next to the container, it is red and has a lot of books on it"
- "white towel located near the sink in the bathroom"

**Key Features**:
- Open-Vocabulary Multi-modal Goals: Specified as image, language or object category. Tests generalization to seen and unseen objects in unseen scenes.
- Lifelong Navigation: In each episode agent is tasked with navigating to 5-10 goals in the same scene.
- Reproducible: Comprehensive benchmarking of existing methods by leveraging simulation.

**Dataset**: Leverage simulator metadata and large language models (LLM) to generate language goals. Habitat Simulator.

Language goal generation pipeline: selecting best viewpoint; bbox info; BLIP V2 prompted with "Describe the bed" returns "a large bed with a floral comforter"; + prompt; resulting goal: "Find the bed with a floral comforter and a pillow in the middle."

**Baselines**:
- SenseAct-NN Skill Chain: Maps sensors to actions using a separate end-to-end trained CNN+RNN policy for each modality. [Diagram labels: RGB, CroCo, BERT, CLIP, OBJ goal, LANG goal]
- SenseAct-NN Monolithic: Maps sensors to actions using a single end-to-end trained CNN+RNN policy. [Diagram labels: RGB, CLIP, OBJ, LANG]
- Modular GOAT: Builds explicit map of the environment in combination with path planning for navigation. [Diagram labels: object detection, projection, topdown semantic map, CLIP feature, keypoint matching, instance map, dynamic instance mapping, goal localization, local policy, planner, action]

**Results**:
- Skill chaining achieves SOTA on success rate.
- Modular GOAT achieves SOTA on SPL.
- Efficiency of SenseAct-NN and Modular method improves over time: efficiency of navigation improves for both modular and end-to-end trained methods. [Plots: Modular GOAT and SenseAct-NN Monolithic against number of sub-tasks]
- End-to-end methods do not show drop in performance when long-term memory is disabled. [Plots: with memory vs. without memory]
- Modular methods are more sensitive to noise in goal observations; end-to-end trained methods are robust to noise in goal specification. [Plots: object goal, language goal, image goal; with noise vs. without noise]
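The poster reports results in terms of success rate and SPL but does not define either. As a hedged illustration, the sketch below uses the standard definition of SPL (Success weighted by Path Length): each successful sub-task contributes the ratio of the shortest-path length to the longer of the shortest-path length and the path actually taken, and the result is averaged over all sub-tasks. The function name `spl` and its argument layout are assumptions for this example, not part of GOAT-Bench.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over sub-tasks."""
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed sub-tasks contribute zero
        denom = max(taken, shortest)
        # Assumption: a goal reached with zero travel counts as fully efficient.
        total += shortest / denom if denom > 0 else 1.0
    return total / n
```

For example, one success that took twice the shortest path plus one failure gives `spl([True, False], [10.0, 5.0], [20.0, 5.0]) == 0.25`.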