Journal Articles
3 articles found
1. Open and real-world human-AI coordination by heterogeneous training with communication
Authors: Cong GUAN, Ke XUE, Chunpeng FAN, Feng CHEN, Lichao ZHANG, Lei YUAN, Chao QIAN, Yang YU. Frontiers of Computer Science, 2025, No. 4, pp. 59-76 (18 pages).
Human-AI coordination aims to develop AI agents capable of effectively coordinating with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory performance of AI agents remains a long-standing challenge. Recently, ad hoc teamwork and zero-shot coordination have shown promising advances in open-world settings, requiring agents to coordinate efficiently with a range of unseen human partners. However, these methods usually rest on the overly idealistic assumption of homogeneity between the agent and the partner, which deviates from real-world conditions. To facilitate the practical deployment and application of human-AI coordination in open and real-world environments, we propose the first benchmark for open and real-world human-AI coordination (ORC), called ORCBench. ORCBench includes widely used human-AI coordination environments. Notably, within real-world scenarios, ORCBench considers heterogeneity between AI agents and partners, encompassing variations in capabilities and observations, which aligns more closely with real-world applications. Furthermore, we introduce a framework known as Heterogeneous training with Communication (HeteC) for ORC. HeteC builds upon a heterogeneous training framework and enhances partner population diversity through mixed partner training and frozen historical partners. Additionally, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partially observable environments. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
Keywords: human-AI coordination; multi-agent reinforcement learning; communication; open-environment coordination; real-world coordination
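To make the training recipe described in the abstract more concrete, the sketch below shows one plausible structure for a HeteC-style loop: mixed partner training over a heterogeneous population, periodic freezing of historical partners, and a partner-to-agent communication channel. All names, interfaces, and hyperparameters (CommunicationModule, freeze_every, the two-player env API) are illustrative assumptions rather than the authors' actual implementation.

```python
# Hypothetical sketch of a HeteC-style training loop, based only on the abstract above.
import copy
import random


class CommunicationModule:
    """Stub: encodes a human partner's observation into a message for the AI agent."""
    def encode(self, partner_observation):
        # In the paper this mitigates partial observability; here it is a placeholder.
        return partner_observation


def train_hetec(agent, base_partners, env, iterations=1000, freeze_every=100):
    comm = CommunicationModule()
    population = list(base_partners)   # heterogeneous partners (varying capabilities/observations)
    frozen_history = []                # frozen snapshots of past partners

    for step in range(iterations):
        # Mixed partner training: sample from current and frozen historical partners.
        partner = random.choice(population + frozen_history)

        obs_agent, obs_partner = env.reset()   # assumed two-player environment interface
        done = False
        while not done:
            message = comm.encode(obs_partner)          # partner -> agent communication
            a_agent = agent.act(obs_agent, message)
            a_partner = partner.act(obs_partner)
            (obs_agent, obs_partner), reward, done = env.step(a_agent, a_partner)
            agent.observe(reward)

        agent.update()
        # Periodically freeze a partner copy to keep the population diverse.
        if step % freeze_every == 0:
            frozen_history.append(copy.deepcopy(random.choice(population)))
    return agent
```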
2. Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation
Authors: Lei YUAN, Feng CHEN, Zongzhang ZHANG, Yang YU. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, No. 6, pp. 101-117 (17 pages).
Communication can promote coordination in cooperative multi-agent reinforcement learning (MARL). Existing works mainly focus on improving the communication efficiency of agents, neglecting that real-world communication is much more challenging, since noise or potential attackers may be present. Thus the robustness of communication-based policies becomes an urgent and severe issue that needs more exploration. In this paper, we posit that an ego system trained with auxiliary adversaries can handle this limitation, and we propose an adaptable method of Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy. Specifically, we introduce a novel message-attacking approach that models the learning of the auxiliary attackers as a cooperative problem under the shared goal of minimizing the coordination ability of the ego system, under which every information channel may suffer from distinct message attacks. Furthermore, as naive adversarial training may impede the generalization ability of the ego system, we design an attacker population generation approach based on evolutionary learning. Finally, the ego system is paired with an attacker population and then alternately trained against the continuously evolving attackers to improve its robustness, meaning that both the ego system and the attackers are adaptable. Extensive experiments on multiple benchmarks indicate that our proposed MA3C provides comparable or better robustness and generalization ability than other baselines.
Keywords: multi-agent communication; adversarial training; robustness validation; reinforcement learning
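The abstract describes an alternating scheme: evolve a population of message attackers to degrade the ego system, then train the ego system against that population. The sketch below illustrates one way such a loop could be organized; the helper functions (make_random_attacker, mutate, run_episode) and the ego interface are hypothetical placeholders, and the fitness definition is an assumption, not the paper's exact formulation.

```python
# Hypothetical sketch of MA3C-style alternating training, reconstructed from the abstract.
import copy
import random


def attacker_fitness(ego, attacker, env, episodes=5):
    """Fitness = negative ego return: a stronger attacker lowers coordination reward."""
    total = 0.0
    for _ in range(episodes):
        # run_episode is a placeholder; the attacker perturbs inter-agent messages.
        total += run_episode(ego, attacker, env)
    return -total / episodes


def train_ma3c(ego, env, pop_size=8, generations=50):
    # Attacker population: each attacker can attack every message channel distinctly,
    # trained cooperatively toward the shared goal of degrading the ego system.
    population = [make_random_attacker() for _ in range(pop_size)]  # placeholder factory

    for _ in range(generations):
        # 1) Evolutionary step: keep the fittest attackers, refill with mutated copies.
        ranked = sorted(population, key=lambda a: attacker_fitness(ego, a, env), reverse=True)
        elite = ranked[: pop_size // 2]
        population = elite + [mutate(copy.deepcopy(random.choice(elite)))  # placeholder mutation
                              for _ in range(pop_size - len(elite))]
        # 2) Alternating step: train the ego system against the evolving attackers.
        for attacker in population:
            ego.update_against(attacker, env)  # assumed ego-system training interface
    return ego
```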
3. Model gradient: unified model and policy learning in model-based reinforcement learning
Authors: Chengxing JIA, Fuxiang ZHANG, Tian XU, Jing-Cheng PANG, Zongzhang ZHANG, Yang YU. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, No. 4, pp. 117-128 (12 pages).
Model-based reinforcement learning is a promising direction for improving the sample efficiency of reinforcement learning by learning a model of the environment. Previous model learning methods aim at fitting the transition data and commonly employ a supervised learning approach to minimize the distance between the predicted state and the real state. Such supervised model learning, however, diverges from the ultimate goal of model learning, i.e., optimizing the policy learned in the model. In this work, we investigate how model learning and policy learning can share the same objective of maximizing the expected return in the real environment. We find that model learning towards this objective amounts to enhancing the similarity between the gradient computed on model-generated data and the gradient computed on real data. We thus derive the gradient of the model from this target and propose the Model Gradient algorithm (MG), which integrates this novel model learning approach with policy-gradient-based policy optimization. We conduct experiments on multiple locomotion control tasks and find that MG not only achieves high sample efficiency but also leads to better convergence performance than traditional model-based reinforcement learning approaches.
Keywords: reinforcement learning; model-based reinforcement learning; Markov decision process
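The key idea in the abstract, matching the gradient estimated on model-generated data with the gradient estimated on real data, can be illustrated with a small gradient-matching loss. The sketch below is only an interpretation of that idea under assumptions: policy_objective and rollout_in_model are hypothetical helpers, and the squared-difference loss is an assumed discrepancy measure, not necessarily the derivation used in the paper.

```python
# Hypothetical sketch of a gradient-matching model loss, based only on the abstract above.
import torch


def model_gradient_loss(policy, model, real_batch, init_states):
    # Gradient of the policy objective estimated from real environment transitions.
    real_grad = torch.autograd.grad(
        policy_objective(policy, real_batch),        # placeholder: policy-gradient surrogate
        policy.parameters(), retain_graph=True)

    # Gradient estimated from rollouts generated by the learned model; create_graph=True
    # keeps the computation graph so this loss can be differentiated w.r.t. the model.
    fake_batch = rollout_in_model(policy, model, init_states)  # placeholder rollout helper
    fake_grad = torch.autograd.grad(
        policy_objective(policy, fake_batch),
        policy.parameters(), create_graph=True)

    # Model loss: discrepancy between the two gradient estimates.
    return sum(((g_fake - g_real.detach()) ** 2).sum()
               for g_fake, g_real in zip(fake_grad, real_grad))
```

Minimizing such a loss would update the model so that policy updates computed inside the model point in (approximately) the same direction as updates computed from real data, which is one way to read the shared objective described in the abstract.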