This image showcases a screenshot of a research paper titled "GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear" from arXiv.org, categorized under Computer Science > Human-Computer Interaction. The paper, submitted on January 30, 2024 and last revised on January 31, 2024, is authored by Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, and Gordon Wetzstein. The abstract describes the integration of multimodal large language models (LMMs) with eye-tracking technology in smart eyewear to streamline user interactions by understanding where the user is looking. The highlighted "View PDF" button indicates the option to access the full document. The image also shows the arXiv logo, a navigation menu, and phone status details such as the time and network indicators at the top. Text transcribed from the image: 8:06 arxiv.org Q = Computer Science > Human-Computer Interaction arXiv:2401.17217 (cs) Submitted on 30 Jan 2024 (v1), last revised 31 Jan 2024 (this version, v2) GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, Gordon Wetzstein View PDF Multimodal large language models (LMMs) excel in world knowledge and problem-solving abilities. Through the use of a world-facing camera and contextual AI, emerging smart accessories aim to provide a seamless interface between humans and LMMs. Yet, these wearable computing systems lack an understanding of the user's attention. We introduce GazeGPT as a new user interaction paradigm for contextual AI. GazeGPT uses eye tracking to help the LMM understand which object in the world-facing camera view a user is paying attention to.
Using extensive user evaluations, we show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives; that it augments human capabilities by significantly improving their accuracy in a dog breed classification task; and that it is consistently ranked as more natural than head- or body-driven selection mechanisms for contextual AI. Moreover, we prototype a variety of application scenarios that suggest GazeGPT could be of significant value to users as part of future AI-driven personal assistants. Comments: Project video: this https URL