This image showcases a screenshot of a research paper titled "GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear" from arXiv.org, categorized under Computer Science > Human-Computer Interaction. The paper, submitted on January 30, 2024 and last revised on January 31, 2024, is authored by Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, and Gordon Wetzstein. The abstract describes the integration of multimodal large language models (LMMs) with eye-tracking technology in smart eyewear to streamline user interactions by understanding where the user is looking. The highlighted "View PDF" button indicates the option to access the full document. The image also shows the arXiv logo, a navigation menu, and phone status details such as the time and network indicators at the top. Text transcribed from the image: 8:06 arxiv.org Q = Computer Science > Human-Computer Interaction arXiv:2401.17217 (cs) Submitted on 30 Jan 2024 (v1), last revised 31 Jan 2024 (this version, v2) GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, Gordon Wetzstein View PDF Multimodal large language models (LMMs) excel in world knowledge and problem-solving abilities. Through the use of a world-facing camera and contextual AI, emerging smart accessories aim to provide a seamless interface between humans and LMMs. Yet, these wearable computing systems lack an understanding of the user's attention. We introduce GazeGPT as a new user interaction paradigm for contextual AI. GazeGPT uses eye tracking to help the LMM understand which object in the world-facing camera view a user is paying attention to.
Using extensive user evaluations, we show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives; that it augments human capabilities by significantly improving their accuracy in a dog breed classification task; and that it is consistently ranked as more natural than head- or body-driven selection mechanisms for contextual AI. Moreover, we prototype a variety of application scenarios that suggest GazeGPT could be of significant value to users as part of future AI-driven personal assistants. Comments: Project video: this https URL