Resolving Ambiguity in Pointing Gestures Using Contextual Reasoning from Large Language Models

Abstract: In everyday life, people effectively convey their intentions through pointing gestures without explicitly naming objects. In particular, pointing gestures used in conjunction with linguistic expressions such as “this” and “that” play a crucial role in intuitively indicating objects or locations in space. Although research on the recognition of such nonverbal gestures has been actively pursued within the field of human-computer interaction (HCI), accurately interpreting a user’s intent remains challenging in situations where the pointing gesture is ambiguous. This paper proposes an integrated system that combines a large language model (LLM), capable of understanding complex human language expressions, with pointing gestures that designate targets in space, thereby processing multimodal user commands. The system is designed to recognize user intentions accurately even in complex and uncertain environments (e.g., indoor spaces with multiple objects) by combining spatial information obtained from pointing gestures with contextual reasoning provided by the LLM. To validate the proposed approach, we constructed a dataset comprising complex real-world environments and diverse utterances, and conducted experiments to analyze the system’s performance and limitations. This study demonstrates the potential for natural expansion of language-based spatial understanding within HCI, and suggests avenues for future research in related fields.

Authors: Sumin Yeon, Minjae Lee, Jiho Bae, Suwon Lee

Affiliation: Department of Computer Science and Engineering

Source: Computer Modeling in Engineering & Sciences (English-language journal), 2026, Issue 4, pp. 1010-1024 (15 pages)

Funding: Supported by the Learning & Academic Research Institution for Master’s·Ph.D. Students, and Postdocs (LAMP) Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. RS-2023-00301974).

Keywords: multimodal interaction; pointing gesture; large language model; contextual reasoning; object referencing; human-computer interaction

Classification codes: TP391.41 [Automation and Computer Technology]; TP18 [Automation and Computer Technology: Computer Application Technology]
