Funding: Funded by the U.S. Department of Education under Grant Number ED#P116S210005 and the National Science Foundation under Grant Numbers 2226936 and 2420405.
Abstract: The integration of human factors into artificial intelligence (AI) systems has emerged as a critical research frontier, particularly in reinforcement learning (RL), where human-AI interaction (HAII) presents both opportunities and challenges. As RL continues to demonstrate remarkable success in model-free and partially observable environments, its real-world deployment increasingly requires effective collaboration with human operators and stakeholders. This article systematically examines HAII techniques in RL through both theoretical analysis and practical case studies. We establish a conceptual framework built upon three fundamental pillars of effective human-AI collaboration: computational trust modeling, system usability, and decision understandability. Our comprehensive review organizes HAII methods into five key categories: (1) learning from human feedback, including various shaping approaches; (2) learning from human demonstration through inverse RL and imitation learning; (3) shared autonomy architectures for dynamic control allocation; (4) human-in-the-loop querying strategies for active learning; and (5) explainable RL techniques for interpretable policy generation. Recent state-of-the-art works are critically reviewed, with particular emphasis on advances incorporating large language models in human-AI interaction research. To illustrate some of these concepts, we present three detailed case studies: an empirical trust model for farmers adopting AI-driven agricultural management systems, the implementation of ethical constraints in robotic motion planning through human-guided RL, and an experimental investigation of human trust dynamics using a multi-armed bandit paradigm. These applications demonstrate how HAII principles can enhance RL systems’ practical utility while bridging the gap between theoretical RL and real-world human-centered applications, ultimately contributing to more deployable and socially beneficial intelligent systems.
Abstract: In the era of intelligent media, the interaction between teachers and students in higher education is undergoing a profound transformation. The model has shifted from one-way transmission to multi-agent, two-way collaboration involving “teacher-student-AI (artificial intelligence)”. Interaction depth moves from surface Q&A to deep thought engagement, supported by instant, precise feedback and a blended virtual-physical space. New forms such as data-driven personalized interaction and immersive collaborative learning have emerged. However, this evolution brings significant challenges: over-reliance on technology may weaken cognitive autonomy; virtual interaction risks emotional detachment and trust erosion; ethical concerns like algorithmic bias and data privacy arise; teachers’ roles become blurred; and evaluation systems lag behind technological advances. Future pathways should position AI as a supportive tool while upholding human centrality. Strengthening emotional connection through online-offline blending, reforming assessment to value process and growth, and empowering teachers as digitally literate “learning guides” and “emotional connectors” are key to building a healthy, sustainable interactive ecosystem.
Funding: The National Key R&D Program of China (2018YFB1004901) and the Independent Innovation Team Project of Jinan City (2019GXRC013).
Abstract: Background: Augmented reality classrooms have become an interesting research topic in the field of education, but there are some limitations. First, most researchers use cards to operate experiments, and a large number of cards causes difficulty and inconvenience for users. Second, most users conduct experiments only in the visual modality, and such single-modal interaction greatly reduces the users' real sense of interaction. To solve these problems, we propose the Multimodal Interaction Algorithm based on Augmented Reality (ARGEV), which is based on visual and tactile feedback in augmented reality. In addition, we design a Virtual and Real Fusion Interactive Tool Suite (VRFITS) with gesture recognition and intelligent equipment. Methods: The ARGEV method fuses gesture, intelligent equipment, and virtual models. We use a gesture recognition model trained by a convolutional neural network to recognize gestures in AR and to trigger vibration feedback after recognizing a five-finger grasp gesture. We establish a coordinate mapping relationship between real hands and the virtual model to achieve the fusion of gestures and the virtual model. Results: The average accuracy rate of gesture recognition was 99.04%. We verify and apply VRFITS in the Augmented Reality Chemistry Lab (ARCL), and the overall operation load of ARCL is reduced by 29.42% in comparison to traditional simulation virtual experiments. Conclusions: We achieve real-time fusion of the gesture, virtual model, and intelligent equipment in ARCL. Compared with the NOBOOK virtual simulation experiment, ARCL improves the users' real sense of operation and interaction efficiency.
Funding: Key-Area Research and Development Program of Guangdong Province (2019B090915002).
Abstract: Background: The growing number of robots has put forward new requirements for human-robot interaction. One of the problems in human-swarm robot interaction is how to naturally achieve an efficient and accurate interaction between humans and swarm robot systems. To address this, this paper proposes a new type of human-swarm natural interaction system. Methods: Through the cooperation between a three-dimensional (3D) gesture interaction channel and a natural language instruction channel, a natural and efficient interaction between a human and swarm robots is achieved. Results: First, a 3D lasso technique realizes a batch-picking interaction of swarm robots through oriented bounding boxes. Second, control instruction labels for swarm-oriented robots are defined. The instruction label is integrated with the 3D gesture and natural language through instruction label filling. Finally, the understanding of natural language instructions is realized through a text classifier based on the maximum entropy model. A head-mounted augmented reality display device is used as a visual feedback channel. Conclusions: The experiments on selecting robots verify the feasibility and availability of the system.
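The batch-picking step mentioned in this abstract can be illustrated with a point-in-oriented-bounding-box test. The abstract does not describe the authors' implementation, so the following is only a minimal sketch under assumptions: the names `OrientedBox` and `pick_robots`, the representation of the box (center, three orthonormal axes, half-extents), and the robot positions are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class OrientedBox:
    """Hypothetical oriented bounding box: center, orthonormal axes, half-extents."""
    center: Vec3
    axes: Tuple[Vec3, Vec3, Vec3]  # assumed to be unit length and mutually orthogonal
    half_extents: Vec3

    def contains(self, point: Vec3) -> bool:
        # Project the offset from the center onto each box axis and
        # compare against the half-extent along that axis.
        offset = tuple(p - c for p, c in zip(point, self.center))
        return all(
            abs(dot(offset, axis)) <= half
            for axis, half in zip(self.axes, self.half_extents)
        )


def pick_robots(box: OrientedBox, positions: Dict[str, Vec3]) -> List[str]:
    """Return the names of all robots whose position lies inside the box."""
    return sorted(name for name, pos in positions.items() if box.contains(pos))


# Example: a box rotated 45 degrees about the z axis (illustrative values only).
c = s = math.sqrt(0.5)
box = OrientedBox(
    center=(0.0, 0.0, 0.0),
    axes=((c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0)),
    half_extents=(2.0, 0.5, 1.0),
)
robots = {
    "r1": (1.0, 1.0, 0.0),
    "r2": (1.0, -1.0, 0.0),
    "r3": (0.0, 0.0, 2.0),
    "r4": (-1.2, -1.2, 0.5),
}
print(pick_robots(box, robots))  # ['r1', 'r4']
```

In a real system the box would presumably be derived from the user's 3D lasso gesture; here it is fixed by hand to keep the sketch self-contained.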
Funding: Supported by the 'Automotive Glazing Application in Intelligent Cockpit Human-Machine Interface' project (SKHX2021049), a collaboration between Saint-Gobain Research and Beijing Normal University.
Abstract: Background: With an increasing number of vehicles becoming autonomous, intelligent, and connected, the future usage of car human-machine interfaces (HMI) with these vehicles deserves more attention. Several studies have addressed car HMI but were less attentive to designing and implementing interactive glazing for everyday (autonomous) driving contexts. Methods: Reflecting on the literature, we describe an engineering psychology practice and the design of six novel future user scenarios, which envision the application of a specific set of augmented reality (AR) supported user interactions. Additionally, we conduct evaluations on specific scenarios and experiential prototypes, which reveal that these AR scenarios aid the target user groups in experiencing a new type of interaction. The overall evaluation is positive, with valuable assessment results and suggestions. Conclusions: This study can interest applied psychology educators who aspire to teach how AR can be operationalized in a human-centered design process to students with minimal pre-existing expertise or minimal scientific knowledge in engineering psychology.
Funding: Natural Science Foundation of Tianjin, China (No. 20JCQNJC00720).
Abstract: Deep learning-based methods have achieved remarkable success in object detection, but this success requires the availability of a large number of training images. Collecting sufficient training images is difficult when detecting damage to airplane engines. Directly augmenting images by rotation, flipping, and random cropping cannot further improve the generalization ability of existing deep models. We propose an interactive augmentation method for airplane engine damage images using a prior-guided GAN to augment training images. Our method can generate many types of damage on arbitrary image regions according to the strokes of users. The proposed model consists of a prior network and a GAN. The prior network generates a shape prior vector, which is used to encode the information of user strokes. The GAN takes the shape prior vector and random noise vectors to generate candidate damages. Final damages are pasted on the given positions of background images with an improved Poisson fusion. We compare the proposed method with traditional data augmentation methods by training airplane engine damage detectors with state-of-the-art object detectors, namely, Mask R-CNN, SSD, and YOLO v5. Experimental results show that training with images generated by our proposed data augmentation method achieves better detection performance than training with traditional data augmentation methods.
Abstract: Background: Gesture is a basic interaction channel that is frequently used by humans to communicate in daily life. In this paper, we explore the use of gesture-based approaches for target acquisition in virtual and augmented reality. A typical process of gesture-based target acquisition is as follows: when a user intends to acquire a target, she performs a gesture with her hands, head, or other parts of the body; the computer senses and recognizes the gesture and infers the most likely target. Methods: We build a mental model and a behavior model of the user to study two key parts of the interaction process. The mental model describes how the user thinks up a gesture for acquiring a target, and can be the intuitive mapping between gestures and targets. The behavior model describes how the user moves the body parts to perform the gestures, and the relationship between the gesture that the user intends to perform and the signals that the computer senses. Results: In this paper, we present and discuss three pieces of research that focus on the mental model and behavior model of gesture-based target acquisition in VR and AR. Conclusions: We show that by leveraging these two models, interaction experience and performance can be improved in VR and AR environments.
Abstract: Because of the evolution of markets and technologies, prototyping concerns should be kept updated almost day by day. Moreover, user-centered design moves the focus towards interaction issues. Prototyping activities matching such characteristics are already available, but they are not widely adopted in the industrial domain. This is due to many reasons; an important one is that a rigorous classification of them is missing, as well as an effective tool to help select the best activities, given the design context. The research described in this paper aims at defining a new classification of prototyping activities, as well as at developing a selection algorithm to choose the best ones automatically. These goals are pursued by defining a set of characteristics that allow the prototyping activities to be described accurately. The resulting classification consists of five classes, based on eighteen characteristics. This classification is exploited by the first release of an algorithm for the selection of the best activities, chosen to satisfy design situations described by a different set of eleven indices. Five field experiences have been used so far as a starting point for validating the research outcomes.
Abstract: Six degrees of freedom (6DoF) input interfaces are essential for manipulating virtual objects through translation or rotation in three-dimensional (3D) space. A traditional outside-in tracking controller requires the installation of expensive hardware in advance. While inside-out tracking controllers have been proposed, they often suffer from limitations such as interaction limited to the tracking range of the sensor (e.g., a sensor on the head-mounted display (HMD)) or the need for pose value modification to function as an input interface (e.g., a sensor on the controller). This study investigates 6DoF pose estimation methods without restricting the tracking range, using a smartphone as a controller in augmented reality (AR) environments. Our approach involves proposing methods for estimating the initial pose of the controller and correcting the pose using an inside-out tracking approach. In addition, seven pose estimation algorithms were presented as candidates depending on the tracking range of the device sensor, the tracking method (e.g., marker recognition, visual-inertial odometry (VIO)), and whether modification of the initial pose is necessary. Through two experiments (discrete and continuous data), the performance of the algorithms was evaluated. The results demonstrate enhanced final pose accuracy achieved by correcting the initial pose. Furthermore, the importance of selecting the tracking algorithm based on the tracking range of the devices and the actual input value of the 3D interaction was emphasized.
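One common way to use an estimated initial pose to correct later inside-out tracking poses is to compute a fixed alignment transform between the controller's own tracking frame and the display's frame. The abstract does not specify the authors' seven algorithms, so the sketch below is an assumption-laden illustration of that general idea only: the function names (`alignment`, `corrected_pose`), the frame names, and the use of 4x4 homogeneous matrices are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous rigid transform, row-major


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def alignment(initial_in_hmd: Matrix, initial_in_phone: Matrix) -> Matrix:
    """Transform mapping the phone's tracking frame into the HMD frame.

    Assumes both arguments describe the same physical controller pose,
    captured at the same instant, expressed in the two different frames.
    """
    return matmul(initial_in_hmd, rigid_inverse(initial_in_phone))


def corrected_pose(align: Matrix, current_in_phone: Matrix) -> Matrix:
    """Express a later phone-tracked pose in the HMD frame."""
    return matmul(align, current_in_phone)


# Illustrative values: the initial pose seen from the HMD is a 90-degree
# rotation about z, offset by 1 unit along x; the phone starts at its origin.
initial_in_hmd = [[0.0, -1.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
moved = [row[:] for row in identity]
moved[0][3] = 1.0  # the phone reports a 1-unit move along its own x axis

pose = corrected_pose(alignment(initial_in_hmd, identity), moved)
print([row[3] for row in pose[:3]])  # [1.0, 1.0, 0.0]
```

Because the alignment is computed once from the initial pose, any error in that initial estimate propagates to every later pose, which is consistent with the abstract's finding that correcting the initial pose improves final pose accuracy.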
Abstract: This paper investigates the application of Natural Language Processing (NLP) in AI interaction design for virtual experiences. It analyzes the impact of various interaction methods on user experience, integrating Virtual Reality (VR) and Augmented Reality (AR) technologies to achieve more natural and intuitive interaction models through NLP techniques. Through experiments and data analysis across multiple technical models, this study proposes an innovative design solution based on natural language interaction and summarizes its advantages and limitations in immersive experiences.
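Abstract: Objective: To address the interaction-cognition gap and the physiological fatigue from prolonged operation that affect augmented reality head-mounted displays in industrial utility tunnel inspection, this study aims to construct and validate a natural gesture interaction scheme that balances intuitive cognition with ergonomic comfort. Methods: Integrating reality-based interaction with ergonomics theory, we constructed an interaction design model under the dual constraints of "intuition and ergonomics". Task analysis was used to identify the core requirements of utility tunnel inspection, and a Wizard of Oz experiment was used to elicit users' instinctive intentions. To counter the fatigue caused by large arm swings, a "micro-gesture" correction strategy was introduced, establishing a gesture set of 7 core actions. A "situational simulation" experimental platform including physical pipe props was built, and the scheme's intuitiveness, comfort, and task robustness were empirically evaluated in a blended virtual-physical environment. Results: Experimental data show that the scheme gained high user acceptance, with a mean overall usability score of 4.35 (SD = 0.65) and a "willingness to use" score of 4.49. In particular, the "fingertip micro-operation" strategy significantly improved perceived comfort, effectively reconciling the tension between operational intuitiveness and physiological load. Conclusions: The context-adaptive micro-gesture interaction scheme proposed in this study validates a complete design paradigm spanning theoretical construction, strategy correction, and contextualized evaluation, providing theoretical and empirical support for deploying AR technology in industrial inspection.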