The image shows a screenshot from a mobile device displaying a research paper titled "GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear." The paper is hosted on arXiv.org under the category Computer Science > Human-Computer Interaction. It was submitted on January 30, 2024, and last revised on January 31, 2024. The authors are Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, and Gordon Wetzstein. The abstract describes how multimodal large language models (LMMs) can be paired with smart eyewear to augment human capabilities: using a world-facing camera together with eye tracking, the system, named GazeGPT, identifies which object the user is paying attention to and supplies that context to the LMM. The abstract reports that this gaze-contingent mechanism is faster and more accurate than alternative pointing mechanisms, significantly improves users' accuracy in a dog-breed classification task, and is consistently rated as more natural than head- or body-driven selection. The interface includes a "View PDF" option for accessing the full text. The mobile view shows standard browser controls at the top and bottom of the screen, including back navigation, sharing, and bookmarking options. The device's clock indicates the screenshot was taken at 8:06 AM.

Text transcribed from the image:

8:06 arxiv.org
Computer Science > Human-Computer Interaction
arXiv:2401.17217 (cs)
[Submitted on 30 Jan 2024 (v1), last revised 31 Jan 2024 (this version, v2)]
GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear
Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, Gordon Wetzstein
View PDF
Multimodal large language models (LMMs) excel in world knowledge and problem-solving abilities. Through the use of a world-facing camera and contextual AI, emerging smart accessories aim to provide a seamless interface between humans and LMMs. Yet, these wearable computing systems lack an understanding of the user's attention. We introduce GazeGPT as a new user interaction paradigm for contextual AI. GazeGPT uses eye tracking to help the LMM understand which object in the world-facing camera view a user is paying attention to. Using extensive user evaluations, we show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives; that it augments human capabilities by significantly improving their accuracy in a dog-breed classification task; and that it is consistently ranked as more natural than head- or body-driven selection mechanisms for contextual AI. Moreover, we prototype a variety of application scenarios that suggest GazeGPT could be of significant value to users as part of future AI-driven personal assistants.
Comments: Project video: this https URL
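
To make the abstract's core idea concrete, the sketch below shows one plausible way a gaze-contingent query could work: crop the world-facing camera frame around the current gaze point and send only that region, plus the user's question, to a multimodal model. This is an illustrative assumption, not the paper's implementation; the crop-around-gaze strategy, the 512-pixel crop size, the OpenAI client, and the model name are all placeholder choices, and the gaze coordinates are assumed to be registered to the camera frame.

```python
# Minimal sketch of a gaze-contingent LMM query (illustrative only;
# not the GazeGPT authors' implementation).
import base64
import io

from openai import OpenAI
from PIL import Image

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def gaze_contingent_crop(frame: Image.Image, gaze_x: int, gaze_y: int,
                         box: int = 512) -> Image.Image:
    """Crop a square region of the camera frame centered on the gaze point."""
    half = box // 2
    left = max(0, min(gaze_x - half, frame.width - box))
    top = max(0, min(gaze_y - half, frame.height - box))
    return frame.crop((left, top, left + box, top + box))


def ask_about_attended_object(frame: Image.Image, gaze_x: int, gaze_y: int,
                              question: str) -> str:
    """Send only the gaze-selected region to a multimodal model, so the
    answer refers to the object the user is actually looking at."""
    crop = gaze_contingent_crop(frame, gaze_x, gaze_y)
    buf = io.BytesIO()
    crop.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Example mirroring the dog-breed task mentioned in the abstract:
# frame = Image.open("world_camera_frame.jpg")
# print(ask_about_attended_object(frame, 812, 440, "What breed is this dog?"))
```

The point of the sketch is the selection step: by narrowing the image context to the gaze target before querying the model, the response is tied to the object the user is attending to rather than to everything in the camera's field of view.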