Journal Articles
4 articles found
1. Human-AI coordination via policy generation from language-guided diffusion
Authors: Kunmin LIN, Lei YUAN, Ziqian ZHANG, Lihe LI, Feng CHEN, Yang YU. Science China (Technological Sciences), 2026, Issue 1, pp. 149-161 (13 pages)
Abstract: Developing intelligent agents that can effectively coordinate with diverse human partners is a fundamental goal of artificial general intelligence. Previous approaches typically generate a variety of partners to cover human policies, and then either train a single universal agent or maintain multiple best-response (BR) policies for different partners. However, the first direction struggles with the stochastic and multimodal nature of human behaviors, and the second relies on costly few-shot adaptation during policy deployment, which is unacceptable in real-world applications such as healthcare and autonomous driving. Recognizing that human partners can easily articulate their preferences or behavioral styles in natural language (NL) and make conventions beforehand, we propose a framework for Human-AI Coordination via Policy Generation from Language-guided Diffusion (Haland). Haland first trains BR policies for various partners using reinforcement learning, and then compresses the policy parameters into a single latent diffusion model conditioned on task-relevant language derived from partner behaviors. Finally, task-relevant language is aligned with NL instructions to facilitate efficient human-AI coordination. Empirical evaluations across diverse cooperative environments demonstrate that Haland generates agents with significantly enhanced zero-shot coordination performance using only NL instructions from various partners, outperforming existing methods by approximately 89.64%.
Keywords: reinforcement learning, human-AI coordination, diffusion, language-guided reinforcement learning
2. Open and real-world human-AI coordination by heterogeneous training with communication
Authors: Cong GUAN, Ke XUE, Chunpeng FAN, Feng CHEN, Lichao ZHANG, Lei YUAN, Chao QIAN, Yang YU. Frontiers of Computer Science, 2025, Issue 4, pp. 59-76 (18 pages)
Abstract: Human-AI coordination aims to develop AI agents capable of effectively coordinating with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory performance of AI agents poses a long-standing challenge. Recently, ad-hoc teamwork and zero-shot coordination have shown promising advancements in open-world settings, requiring agents to coordinate efficiently with a range of unseen human partners. However, these methods usually assume an overly idealistic scenario of homogeneity between the agent and the partner, which deviates from real-world conditions. To facilitate the practical deployment and application of human-AI coordination in open and real-world environments, we propose ORCBench, the first benchmark for open and real-world human-AI coordination (ORC). ORCBench includes widely used human-AI coordination environments. Notably, in real-world scenarios, ORCBench considers heterogeneity between AI agents and partners, encompassing variations in capabilities and observations, which aligns more closely with real-world applications. Furthermore, we introduce a framework known as Heterogeneous training with Communication (HeteC) for ORC. HeteC builds upon a heterogeneous training framework and enhances partner-population diversity through mixed partner training and frozen historical partners. Additionally, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partially observable environments. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
Keywords: human-AI coordination, multi-agent reinforcement learning, communication, open-environment coordination, real-world coordination
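The "mixed partner training and frozen historical partners" idea can be made concrete with a small sketch. All names here are illustrative assumptions, not HeteC's actual code: the live training partner is periodically frozen into an immutable snapshot, and each episode's partner is drawn from a mix of the live partner and the frozen history:

```python
import random

class PartnerPopulation:
    """Sketch of a partner population with frozen historical snapshots."""

    def __init__(self, live_partner):
        self.live = live_partner        # partner that keeps training
        self.frozen = []                # frozen historical partners

    def freeze_snapshot(self):
        # Store an immutable copy of the current live partner.
        self.frozen.append(dict(self.live))

    def sample_partner(self, p_live=0.5, rng=random):
        # Mixed partner training: use the live partner with probability
        # p_live, otherwise a uniformly sampled frozen snapshot
        # (falling back to live when no history exists yet).
        if not self.frozen or rng.random() < p_live:
            return self.live
        return rng.choice(self.frozen)

pop = PartnerPopulation({"step": 0})
pop.live["step"] = 1
pop.freeze_snapshot()       # history now holds the step-1 partner
pop.live["step"] = 2        # live partner keeps evolving
```

Freezing snapshots keeps early-training behaviors in the population, which is one simple way to maintain the partner diversity the abstract credits for better generalization to unseen humans.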
3. Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation (Cited: 1)
Authors: Lei YUAN, Feng CHEN, Zongzhang ZHANG, Yang YU. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, Issue 6, pp. 101-117 (17 pages)
Abstract: Communication can promote coordination in cooperative multi-agent reinforcement learning (MARL). Existing works mainly focus on improving the communication efficiency of agents, neglecting that real-world communication is much more challenging, as there may be noise or potential attackers. The robustness of communication-based policies thus becomes a pressing issue that needs more exploration. In this paper, we posit that an ego system trained with auxiliary adversaries can handle this limitation, and propose an adaptable method of Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy. Specifically, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under the shared goal of minimizing the coordination ability of the ego system, under which every information channel may suffer distinct message attacks. Furthermore, as naive adversarial training may impede the generalization ability of the ego system, we design an attacker-population generation approach based on evolutionary learning. Finally, the ego system is paired with an attacker population and alternately trained against the continuously evolving attackers to improve its robustness, meaning that both the ego system and the attackers are adaptable. Extensive experiments on multiple benchmarks indicate that MA3C provides comparable or better robustness and generalization ability than other baselines.
Keywords: multi-agent communication, adversarial training, robustness validation, reinforcement learning
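The evolutionary attacker-population loop can be sketched in a few lines. This is a deliberately simplified toy (attackers reduced to a scalar noise scale, a stand-in evaluation function, truncation selection plus Gaussian mutation); the paper's actual attacker models and selection scheme may differ:

```python
import random

def evaluate_ego_return(attacker_noise_scale):
    """Stand-in for rolling out the ego team against a message attacker:
    stronger attacks (larger noise) lower the cooperative return."""
    return 10.0 - attacker_noise_scale

def evolve_attackers(population, generations=5, rng=None):
    """Keep attackers that most reduce the ego system's return, then
    mutate the survivors to produce the next generation."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        # Lower ego return = stronger attacker = higher fitness.
        population.sort(key=evaluate_ego_return)
        survivors = population[: len(population) // 2]
        children = [max(0.0, a + rng.gauss(0, 0.1)) for a in survivors]
        population = survivors + children
    return population

attackers = evolve_attackers([0.1, 0.2, 0.3, 0.4])
```

Training the ego policy against the whole evolving population, rather than a single fixed adversary, is what the abstract argues preserves generalization under naive adversarial training.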
4. Model gradient: unified model and policy learning in model-based reinforcement learning
Authors: Chengxing JIA, Fuxiang ZHANG, Tian XU, Jing-Cheng PANG, Zongzhang ZHANG, Yang YU. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, Issue 4, pp. 117-128 (12 pages)
Abstract: Model-based reinforcement learning is a promising direction for improving the sample efficiency of reinforcement learning by learning a model of the environment. Previous model learning methods aim at fitting the transition data and commonly employ a supervised learning approach to minimize the distance between the predicted state and the real state. Such supervised model learning, however, diverges from the ultimate goal of model learning: optimizing the policy learned in the model. In this work, we investigate how model learning and policy learning can share the same objective of maximizing the expected return in the real environment. We find that model learning towards this objective reduces to a target of enhancing the similarity between the gradient on generated data and the gradient on the real data. We thus derive the gradient of the model from this target and propose the Model Gradient (MG) algorithm to integrate this novel model learning approach with policy-gradient-based policy optimization. We conduct experiments on multiple locomotion control tasks and find that MG not only achieves high sample efficiency but also leads to better convergence performance than traditional model-based reinforcement learning approaches.
Keywords: reinforcement learning, model-based reinforcement learning, Markov decision process
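The central idea, training the model so that the policy gradient computed on model-generated data matches the gradient computed on real data, can be illustrated with one similarity measure. Negative cosine similarity is an assumption chosen here for clarity; the paper derives its own objective, which may differ:

```python
import numpy as np

def gradient_alignment_loss(grad_real, grad_model):
    """Loss that is 0 when the policy gradient from model-generated data
    points the same way as the gradient from real data, and grows as the
    two gradients disagree (1 - cosine similarity)."""
    num = float(np.dot(grad_real, grad_model))
    den = float(np.linalg.norm(grad_real) * np.linalg.norm(grad_model)) + 1e-8
    return 1.0 - num / den

g_real = np.array([1.0, 2.0, 3.0])
print(round(gradient_alignment_loss(g_real, 2.0 * g_real), 6))  # 0.0
```

Minimizing such a loss with respect to the model's parameters (rather than a next-state prediction error) is what lets model learning and policy learning share the single objective of real-environment return.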