Journal Articles: 264,134 articles found
Leader-following positive consensus of heterogeneous switched multi-agent systems with average dwell time switching
1
Authors: Kaiming Li, Wei Xing, Haoyue Yang, Junfeng Zhang. Control Theory and Technology, 2026, No. 1, pp. 66-81 (16 pages)
This paper focuses on the leader-following positive consensus problems of heterogeneous switched multi-agent systems. First, a state-feedback controller with dynamic compensation is introduced to achieve positive consensus under average dwell time switching. Then sufficient conditions are derived to guarantee the positive consensus. The gain matrices of the control protocol are described using a matrix decomposition approach, and the corresponding computational complexity is reduced by resorting to linear programming and co-positive Lyapunov functions. Finally, two numerical examples are provided to illustrate the results obtained.
Keywords: heterogeneous switched multi-agent systems; positive consensus; linear programming
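A side note on the linear-programming machinery this abstract mentions: for a single positive subsystem x' = Ax with A Metzler (nonnegative off-diagonals) and Hurwitz, a linear co-positive Lyapunov function V(x) = v^T x with v > 0 exists and can be found from the linear constraint A^T v < 0. The 2x2 example matrix below is an illustrative choice of mine, not from the paper, which poses a larger LP coupling all subsystems under dwell-time constraints:

```python
def copositive_lyapunov_2x2(A):
    """Return v = (v1, v2) solving A^T v = (-1, -1) via Cramer's rule.
    For a Metzler + Hurwitz A, this v is componentwise positive, so
    V(x) = v1*x1 + v2*x2 strictly decreases along x' = A x on the
    positive orthant."""
    # M = A^T, written out entrywise.
    m00, m01 = A[0][0], A[1][0]
    m10, m11 = A[0][1], A[1][1]
    det = m00 * m11 - m01 * m10
    # Cramer's rule for M v = (-1, -1).
    v1 = (-m11 + m01) / det
    v2 = (-m00 + m10) / det
    return v1, v2

A = [[-2.0, 1.0],   # Metzler: off-diagonals nonnegative
     [0.5, -3.0]]   # Hurwitz: both eigenvalues negative
v1, v2 = copositive_lyapunov_2x2(A)
print(round(v1, 4), round(v2, 4))  # both positive
```

Positivity of both components certifies stability of the positive system; in the general LP formulation one searches over v >= 1 with A_i^T v <= -eps for every subsystem i.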
Research on UAV-MEC Cooperative Scheduling Algorithms Based on Multi-Agent Deep Reinforcement Learning
2
Authors: Yonghua Huo, Ying Liu, Anni Jiang, Yang Yang. Computers, Materials & Continua, 2026, No. 3, pp. 1823-1850 (28 pages)
With the advent of sixth-generation mobile communications (6G), space-air-ground integrated networks have become mainstream. This paper focuses on collaborative scheduling for mobile edge computing (MEC) under a three-tier heterogeneous architecture composed of mobile devices, unmanned aerial vehicles (UAVs), and macro base stations (BSs). This scenario typically faces fast channel fading, dynamic computational loads, and energy constraints, whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings. To address this issue, we formulate a multi-agent Markov decision process (MDP) for an air-ground-fused MEC system, unify link selection, bandwidth/power allocation, and task offloading into a continuous action space, and propose a joint scheduling strategy based on an improved MATD3 algorithm. The improvements include Alternating Layer Normalization (ALN) in the actor to suppress gradient variance, Residual Orthogonalization (RO) in the critic to reduce the correlation between the twin Q-value estimates, and a dynamic-temperature reward to enable adaptive trade-offs during training. On a multi-user, dual-link simulation platform, we conduct ablation and baseline comparisons. The results reveal that the proposed method has better convergence and stability. Compared with MADDPG, TD3, and DSAC, our algorithm achieves more robust performance across key metrics.
Keywords: UAV-MEC networks; multi-agent deep reinforcement learning; MATD3; task offloading
Finite-time fault-tolerant tracking control for multi-agent systems based on neural observer
3
Authors: Junzhe Cheng, Shitong Zhang, Qing Wang, Bin Xin. Control Theory and Technology, 2026, No. 1, pp. 10-23 (14 pages)
This paper investigates the consensus tracking control problem for high-order nonlinear multi-agent systems subject to non-affine faults, partially measurable states, uncertain control coefficients, and unknown external disturbances. Under directed topology conditions, an observer-based finite-time control strategy based on adaptive backstepping is proposed, in which a neural-network-based state observer is employed to approximate the unmeasurable system state variables. To address the complexity-explosion problem associated with the backstepping method, a finite-time command filter is incorporated, with error compensation signals designed to mitigate the filter-induced errors. Additionally, a Butterworth low-pass filter is introduced to avoid the algebraic-loop problem in the design of the controller. The finite-time stability of the closed-loop system is rigorously analyzed with the finite-time Lyapunov stability criterion, validating that all closed-loop signals of the system remain bounded within a finite time. Finally, the effectiveness of the proposed control strategy is verified through a simulation example.
Keywords: multi-agent systems; command-filtered backstepping; finite-time control; neural observer; non-affine faults
Multi-agent reinforcement learning with layered autonomy and collaboration for enhanced collaborative confrontation
4
Authors: Xiaoyu Xing, Haoxiang Xia. Chinese Journal of Aeronautics, 2026, No. 2, pp. 370-388 (19 pages)
Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge. Multi-Agent Reinforcement Learning (MARL) provides an effective framework for tackling sequential decision-making problems, significantly enhancing swarm intelligence in maneuvering. However, applying MARL to unmanned swarms presents two primary challenges. First, defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries. Second, current algorithms aim to maximize global or individual rewards, making them sensitive to fluctuations in enemy strategies and environmental changes, especially when rewards are sparse. To tackle these issues, we propose an algorithm of Multi-Agent Reinforcement Learning with Layered Autonomy and Collaboration (MARL-LAC) for collaborative confrontations. This algorithm integrates dual twin critics to mitigate the high variance associated with policy gradients. Furthermore, MARL-LAC employs layered autonomy and collaboration to address multi-objective problems, specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents. Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents, outperforming existing algorithms and emphasizing the importance of layered autonomy and collaboration in multi-agent systems. The observed adversarial behaviors demonstrate that agents using MARL-LAC effectively maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
Keywords: attack-defense confrontation; collaborative confrontation; autonomous agents; multi-agent systems; reinforcement learning; maneuvering decision-making
MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding
5
Authors: Ans D. Alghamdi. Computers, Materials & Continua, 2026, No. 2, pp. 1395-1429 (35 pages)
Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context-modeling capabilities. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across multiple modalities. We propose MultiAgent-CoT, a novel multi-agent chain-of-thought reasoning framework in which specialized agents for text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus-voting mechanisms. Our architecture incorporates self-reflection modules, conflict-resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement. The framework employs a hierarchical attention mechanism with cross-modal fusion and implements adaptive reasoning depth based on dialogue complexity. Comprehensive evaluations on Situated Interactive Multi-Modal Conversations (SIMMC) 2.0, VisDial v1.0, and newly introduced challenging scenarios demonstrate statistically significant improvements in grounding accuracy (p < 0.01), chain-of-thought interpretability, and robustness to adversarial inputs compared with state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
Keywords: multi-agent systems; chain-of-thought reasoning; multimodal dialogue; conversational artificial intelligence (AI); cross-modal fusion; reasoning interpretability
Defending Against Jamming and Interference for Internet of UAVs Using Cooperative Multi-Agent Reinforcement Learning with Mutual Information
6
Authors: Lin Yan, Wu Zhijuan, Peng Nuoheng, Zhao Tianyu, Zhang Yijin, Shu Feng, Li Jun. China Communications, 2025, No. 5, pp. 220-237 (18 pages)
The Internet of Unmanned Aerial Vehicles (I-UAVs) is expected to execute latency-sensitive tasks, but is limited by co-channel interference and malicious jamming. In the face of unknown prior environmental knowledge, defending against jamming and interference through spectrum allocation becomes challenging, especially when each UAV pair makes decisions independently. In this paper, we propose a cooperative multi-agent reinforcement learning (MARL)-based anti-jamming framework for I-UAVs, enabling UAV pairs to learn their own policies cooperatively. Specifically, we first model the problem as a model-free multi-agent Markov decision process (MAMDP) to maximize the long-term expected system throughput. Then, to improve the exploration of the optimal policy, we optimize a MARL objective function with a mutual-information (MI) regularizer between states and actions, which can dynamically assign probability to actions frequently used by the optimal policy. Next, by sharing their current channel selections and local learning experience (their soft Q-values), the UAV pairs can learn their own policies cooperatively, relying only on previously observed information and predictions of others' actions. Our simulation results show that, for both sweep-jamming and Markov-jamming patterns, the proposed scheme outperforms the benchmark schemes in terms of throughput, convergence, and stability for different numbers of jammers, channels, and UAV pairs.
Keywords: anti-jamming communication; Internet of UAVs; multi-agent reinforcement learning; spectrum allocation
Multi-Agent Reinforcement Learning for Moving Target Defense Temporal Decision-Making Approach Based on Stackelberg-FlipIt Games
7
Authors: Rongbo Sun, Jinlong Fei, Yuefei Zhu, Zhongyu Guo. Computers, Materials & Continua, 2025, No. 8, pp. 3765-3786 (22 pages)
Moving Target Defense (MTD) necessitates scientifically effective decision-making methodologies for defensive technology implementation. While most MTD decision studies focus on accurately identifying optimal strategies, the issue of optimal defense timing remains underexplored. Current default approaches, namely periodic or overly frequent MTD triggers, lead to suboptimal trade-offs among system security, performance, and cost. The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead, yet existing frameworks inadequately address this temporal dimension. To bridge this gap, this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces, thereby capturing the dynamic security-state evolution of MTD systems. We introduce a belief factor to quantify information asymmetry during adversarial interactions, enhancing the precision of MTD trigger timing. Leveraging this game-theoretic foundation, we employ Multi-Agent Reinforcement Learning (MARL) to derive adaptive temporal strategies, optimized via a novel four-dimensional reward function that holistically balances security, performance, cost, and timing. Experimental validation using IP address mutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response, significantly improving cybersecurity affordability and effectiveness.
Keywords: cyber security; moving target defense; multi-agent reinforcement learning; security metrics; game theory
The New "Big Data, Big Models, Big Computing" Paradigm and Accurate Public Opinion Analysis: Explorations in Theory and a Multi-Agent Empirical Study (Cited: 2)
8
Authors: Ding Xiaowei, Qi Qingyan, Liu Zihang. 《传媒观察》 (Media Observer), 2025, No. 2, pp. 28-42 (15 pages)
This paper explores the theory and empirical application of the new "big data, big models, big computing" paradigm for accurate public opinion analysis. The theoretical part discusses the concept of the paradigm and the relationships it involves, and analyzes its connection with multi-agent systems. The empirical part, based on application cases of the paradigm in public opinion analysis, proposes a multi-agent collaborative framework for public opinion analysis and constructs a new analysis workflow that can effectively cope with a dynamically changing public opinion environment. A multi-agent system is used to predict and verify whether hot-topic events reach the trending-search list, with comparative analysis against conventional large models and a BERT model. The study shows that the multi-agent approach has significant advantages for events involving broad public emotional resonance and wide social reach, improving prediction accuracy and robustness through comprehensive multi-perspective assessment. The empirical study verifies the value of multi-agent systems in public opinion monitoring and provides a new technical path for future accurate public opinion analysis.
Keywords: the new "big data, big models, big computing" paradigm; multi-agent systems; accurate public opinion analysis
Petrogenesis of 2.9-1.7 Ga Multi-stage Granitoid Magmatism and Continental Crust Evolution in the Shaohuashan-Xiaoshan-Xiong'ershan Area, Southern Margin of the North China Craton (Cited: 1)
9
Authors: Zhou Yanyan, Zheng Yali, Da Yongfa, Zhang Rucheng, Zhu Xiyan, Zhao Lei, Zhao Taiping, Zhai Mingguo. 《岩石学报》 (Acta Petrologica Sinica), 2026, No. 1, pp. 38-70 (33 pages)
The Archean-Paleoproterozoic tectonic, magmatic, and sedimentary records of the North China Craton are rich and complete, making it a natural laboratory for studying the multi-stage growth and evolution of the early continental crust. However, the mechanisms of early crustal growth and the tectonic evolution remain controversial. The Taihua Complex on the southern margin of the craton preserves a complete Archean-Paleoproterozoic crystalline basement and exposes abundant TTG and mafic-granitoid rock assemblages, making it an ideal region for studying early crustal growth and evolution. This paper focuses on the 2.9-1.7 Ga TTG and granitoid rocks of the Shaohuashan-Xiaoshan-Xiong'ershan area on the southern margin of the North China Craton and presents systematic petrological, geochronological, and geochemical studies. The results show at least seven episodes of TTG and granitoid magmatism in the study area: ~2.9 Ga tonalites (TTG); ~2.7 Ga granodiorites (TTG); 2.53-2.42 Ga tonalites (TTG) together with K-feldspar granites and monzogranites; 2.33-2.27 Ga trondhjemites (TTG), diorites, and K-feldspar granites and monzogranites; 2.22-2.19 Ga monzogranites and leucocratic veins intruding the TTG gneisses; 1.94-1.81 Ga K-feldspar granites, monzogranites, and granodiorites; and 1.78-1.76 Ga K-feldspar granites and monzogranites. The 2.9-2.3 Ga TTGs are dominantly of medium- to low-pressure type; the ~2.5 Ga granitoids show I-S-type granite characteristics; the ~2.3 Ga, ~2.2 Ga, and ~1.7 Ga granitoids resemble A-type granites; and both A-type and I-S-type granites developed at 1.94-1.81 Ga. The three TTG episodes at ~2.9 Ga, ~2.7 Ga, and ~2.5 Ga, together with the ~2.5 Ga granitoids, record multi-stage growth and evolution of the early continental crust, probably in a subduction-collision tectonic setting. The ~2.3 Ga TTGs generally have low-pressure signatures and were derived from partial melting of mafic lower crust under a high geothermal gradient; together with coeval diorites sourced from old enriched mantle and intraplate A-type granites, they indicate an intraplate extensional setting. The 2.2-2.1 Ga A-type granites were derived from remelting of old continental crustal material; combined with the reported 2.3-2.1 Ga bimodal volcanic rocks, A-type granites, and low-δ18O granites and gabbro-diorites, they indicate recycling of crust and mantle at different depths in an extension-rifting setting. The 1.94-1.81 Ga I-S-type and A-type granites may record a Paleoproterozoic history of subduction, collision, and amalgamation. The 1.78-1.76 Ga A-type granites were probably products of crustal partial melting induced by crustal thinning during intraplate extension and rifting. The widespread Archean-Paleoproterozoic granitoid magmatism on the southern margin of the North China Craton records multi-stage crustal growth, evolution, and tectonic-regime transitions, providing key constraints on the material cycling between Earth's spheres and the geodynamic processes of different stages.
Keywords: southern margin of the North China Craton; 2.9-1.7 Ga; TTG and granitoid rocks; zircon U-Pb dating; zircon Lu-Hf isotopes; continental crust growth and recycling
The '741 Poplar' Gibberellin Oxidase Gene PthGA2ox19 Regulates Plant Growth and Development (Cited: 1)
10
Authors: Zhang Xiaoning, Wang Zhi'an, Tang Ye, Xu Ziteng, Sun Dazhi, Xu Yunjiao, Yang Jiangwei, Wu Jiahe. 《生物工程学报》 (Chinese Journal of Biotechnology), 2026, No. 1, pp. 303-318 (16 pages)
Gibberellin 2-oxidase (GA2ox) is a key enzyme regulating gibberellin (gibberellic acid, GA) metabolism in plants. Identifying poplar GA2ox genes and dissecting their function in regulating plant growth and development can provide technical support for breeding new poplar varieties. In this study, GA2ox genes of '741 poplar' were identified and analyzed by bioinformatics methods; 34 GA2ox genes were identified, distributed on 7 chromosome pairs of '741 poplar'. Quantitative real-time PCR (qRT-PCR) was used to analyze the tissue expression pattern of PthGA2ox19 and its expression under GA3 induction, showing that PthGA2ox19 is highly expressed in stems and that its expression level rises significantly under GA3 induction. A PthGA2ox19 overexpression vector was constructed and introduced into poplar by Agrobacterium-mediated transformation. The transgenic lines showed significantly higher PthGA2ox19 expression than wild-type plants, with phenotypes including reduced plant height, thinner stems, shortened internodes, and smaller leaves. Paraffin sectioning and microscopic observation of vascular tissue development showed that the transgenic plants had malformed vascular bundles, reduced vessel diameters, and thinner xylem and phloem. In summary, 34 GA2ox genes were identified in '741 poplar'; overexpression of PthGA2ox19 inhibited poplar growth and vascular tissue development, indicating that poplar GA2ox participates in regulating plant growth and development. These results provide a new approach for poplar plant-architecture breeding.
Keywords: poplar; gibberellin; GA2ox; transgenic plants; growth and development
Distributed optimization of electricity-gas-heat integrated energy system with multi-agent deep reinforcement learning (Cited: 5)
11
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang. Global Energy Interconnection, 2022, No. 6, pp. 604-617 (14 pages)
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) has the characteristics of strong coupling, non-convexity, and nonlinearity. The centralized optimization method has a high cost of communication and complex modeling. Meanwhile, the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency, which makes it difficult to apply online. For the coordinated optimization problem of the electricity-gas-heat IES in this study, we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using the multi-agent deep deterministic policy gradient. Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy. Compared with centralized optimization, the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load in training. Compared with the traditional iterative solution method, it can better cope with uncertainty and realize real-time decision-making of the system, which is conducive to online application. Finally, we verify the effectiveness of the proposed method using an example of an IES coupled with three energy-hub agents.
Keywords: integrated energy system; multi-agent system; distributed optimization; multi-agent deep deterministic policy gradient; real-time optimization decision
Improved Event-Triggered Adaptive Neural Network Control for Multi-agent Systems Under Denial-of-Service Attacks (Cited: 2)
12
Authors: Huiyan Zhang, Yu Huang, Ning Zhao, Peng Shi. Artificial Intelligence Science and Engineering, 2025, No. 2, pp. 122-133 (12 pages)
This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-of-service (DoS) attacks. First, an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time. Second, by combining a memory-based adaptive event-triggered mechanism with neural networks, the paper aims to approximate the nonlinear terms in the networked system and efficiently conserve system resources. Finally, based on a two-degree-of-freedom model of a vehicle affected by crosswinds, this paper constructs a multi-unmanned-ground-vehicle (Multi-UGV) system to validate the effectiveness of the proposed method. Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications, ensuring the stability and reliable operation of the Multi-UGV system.
Keywords: multi-agent systems; neural network; DoS attacks; memory-based adaptive event-triggered mechanism
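The resource-saving idea behind an event-triggered mechanism can be shown on a toy scalar system: the held control input is refreshed only when the measurement error exceeds a threshold proportional to the state, so far fewer transmissions are needed than with periodic sampling. The plant, gains, and trigger threshold below are illustrative choices of mine; the paper's mechanism is a memory-based adaptive trigger combined with a neural approximator.

```python
# Scalar plant x' = a*x + u with sample-and-hold feedback u = -k*xh,
# where the held state xh is refreshed only at triggering instants
# satisfying |x - xh| > sigma * |x| (a relative-error trigger).
a, k, sigma, dt = 1.0, 3.0, 0.2, 1e-3
x, xh, events = 1.0, 1.0, 0
steps = 5000                             # 5 seconds of simulated time
for _ in range(steps):
    if abs(x - xh) > sigma * abs(x):     # event-triggered condition
        xh = x                           # "transmit": update held state
        events += 1
    u = -k * xh
    x += dt * (a * x + u)                # forward-Euler integration
print(round(x, 6), events)               # x near zero, sparse updates
```

With these gains the closed loop stays stable for any trigger error within the 20% band, so the state decays while only a few dozen updates occur over 5000 simulation steps.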
Embodied Multi-Agent Systems: A Review (Cited: 1)
13
Authors: Zhuo Li, Weiran Wu, Yunlong Guo, Jian Sun, Qing-Long Han. IEEE/CAA Journal of Automatica Sinica, 2025, No. 6, pp. 1095-1116 (22 pages)
Multi-agent systems (MASs) have demonstrated significant achievements in a wide range of tasks, leveraging their capacity for coordination and adaptation within complex environments. Moreover, the enhancement of their intelligent functionalities is crucial for tackling increasingly challenging tasks. This goal resonates with a paradigm shift within the artificial intelligence (AI) community from "internet AI" to "embodied AI", and MASs with embodied AI are referred to as embodied multi-agent systems (EMASs). An EMAS has the potential to acquire generalized competencies through interactions with environments, enabling it to effectively address a variety of tasks and thereby make a substantial contribution to the quest for artificial general intelligence. Despite the burgeoning interest in this domain, a comprehensive review of EMASs has been lacking. This paper offers analysis and synthesis for EMASs from a control perspective, conceptualizing each embodied agent as an entity equipped with a "brain" for decision and a "body" for environmental interaction. System designs are classified into open-loop, closed-loop, and double-loop categories, and EMAS implementations are discussed. Additionally, the current applications and challenges faced by EMASs are summarized, and potential avenues for future research in this field are provided.
Keywords: embodied intelligence; multi-agent system; feedback control; interaction
Effect of Ga Addition on the Grain-Boundary Phases and Magnetic Properties of (Nd,Ce)-Fe-B Magnets
14
Authors: Yao Huanmao, Wang Lei, Hu Xinyang, Zhao Kang. 《有色金属科学与工程》 (Nonferrous Metals Science and Engineering), 2026, No. 1, pp. 147-155 (9 pages)
This paper studies the effect of Ga on the microstructure of the grain-boundary phases and the magnetic properties of (Nd,Ce)-Fe-B sintered magnets. With 0.1% Ga addition, the intrinsic coercivity (Hcj) of the magnet increases from 891.52 kA/m to 1043.56 kA/m. Analysis shows that a trace Ga addition raises the proportion of the REFe_2 (RE = Pr, Nd, Ce) phase in the grain boundaries and lowers the Ce concentration in the REFe_2 phase, forming a high-Pr, high-Nd, low-Ce REFe_2 phase. This phase absorbs Fe from the two-grain boundaries (the boundaries between adjacent main-phase grains) while driving more Ce from the triple-junction regions (TJPs) of the high-Pr, high-Nd, low-Ce REFe_2 phase into the two-grain boundaries, producing more continuous and smooth two-grain boundaries. Meanwhile, Ga doping also reduces the aggregation of the rare-earth-rich phase and lowers the eutectic melting temperature at the grain boundaries, thereby wetting the boundaries. The synergy of these two effects tunes the internal microstructure of the magnet and forms smoother, more continuous grain boundaries, further enhancing the demagnetization decoupling between main-phase grains and thus improving the coercivity of the magnet.
Keywords: Ga addition; (Nd,Ce)-Fe-B magnets; grain-boundary phase regulation; REFe_2 phase
A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning (Cited: 5)
15
Authors: Ma Ye, Chang Tianqing, Fan Wenhui. Journal of Systems Engineering and Electronics, 2021, No. 3, pp. 642-657 (16 pages)
In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision cause uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the process of a game. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory, to solve the existing problems in the original model, a negative-feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, to evaluate the evolutionary game results of the group in the model, a calculation method for the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative-feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and consideration of the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that are beneficial to task completion and stability under different negative-feedback factor values and different group sizes, thereby improving the group intelligence level.
Keywords: multi-agent reinforcement learning; evolutionary game; Q-learning
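A minimal sketch of the negative-feedback idea described in this abstract: independent stateless Q-learners repeatedly choose cooperate or defect, and defectors pay a tax that grows with the current defection rate, which steers the group toward cooperation. All payoffs, the tax rule, and the hyperparameters are illustrative assumptions of mine, not values from the paper.

```python
import random

random.seed(0)
N, ACTIONS = 10, (0, 1)             # 0 = cooperate, 1 = defect
COOP_PAY, DEFECT_PAY, TAX = 1.0, 1.5, 1.0
alpha, eps = 0.1, 0.2               # learning rate, exploration rate
Q = [[0.0, 0.0] for _ in range(N)]  # one stateless Q-pair per agent

for episode in range(2000):
    acts = [random.choice(ACTIONS) if random.random() < eps
            else max(ACTIONS, key=lambda a: Q[i][a]) for i in range(N)]
    defect_rate = sum(acts) / N
    for i, a in enumerate(acts):
        # Negative-feedback tax: defecting costs more as defection spreads,
        # so the defection payoff drops below the cooperation payoff.
        r = COOP_PAY if a == 0 else DEFECT_PAY - TAX * (0.5 + defect_rate)
        Q[i][a] += alpha * (r - Q[i][a])

greedy = [max(ACTIONS, key=lambda a: Q[i][a]) for i in range(N)]
coop_rate = greedy.count(0) / N
print(coop_rate)
```

With this tax rule the net defection payoff is 1.0 - defect_rate, strictly below the cooperation payoff whenever anyone defects, so the greedy policies converge to cooperation.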
Collaborative multi-agent reinforcement learning based on experience propagation (Cited: 5)
16
Authors: Min Fang, Frans C. A. Groen. Journal of Systems Engineering and Electronics, 2013, No. 4, pp. 683-689 (7 pages)
For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting means calculating the optimal shared state path from state trajectories with cycles. A state-list-extracting algorithm checks cyclic state lists of the current state in the state trajectory, condensing the optimal action set of the current state. By reinforcing the optimal action selected, the action policy of cyclic states is optimized gradually. The state list extracting is repeatedly learned and used as experience knowledge shared by teams. Agents speed up the rate of convergence through experience sharing. Competition games of prey and predators are used for the experiments. The results prove that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning, and improve performance.
Keywords: multi-agent; Q-learning; state list extracting; experience sharing
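The experience-sharing step can be illustrated with two tabular Q-learners on a small chain task that periodically merge their tables by taking the elementwise maximum, so each agent benefits from whatever the other has already learned. The chain environment and the max-merge rule are simplified stand-ins of mine for the paper's state-list extraction, not its actual algorithm.

```python
import random

random.seed(1)
N_STATES, GOAL, GAMMA, ALPHA, EPS = 6, 5, 0.9, 0.5, 0.3

def step(s, a):                        # a: 0 = left, 1 = right
    s2 = min(max(s + (1 if a else -1), 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def episode(Q):
    """One epsilon-greedy Q-learning episode from the start state."""
    s = 0
    for _ in range(50):
        a = random.randrange(2) if random.random() < EPS \
            else int(Q[s][1] >= Q[s][0])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        if done:
            break
        s = s2

QA = [[0.0, 0.0] for _ in range(N_STATES)]
QB = [[0.0, 0.0] for _ in range(N_STATES)]
for ep in range(200):
    episode(QA)
    episode(QB)
    if ep % 10 == 0:                   # periodic experience sharing:
        merged = [[max(a, b) for a, b in zip(ra, rb)]
                  for ra, rb in zip(QA, QB)]   # elementwise-max merge
        QA = [row[:] for row in merged]
        QB = [row[:] for row in merged]

# Greedy policy from the shared table walks straight to the goal.
policy = [int(QA[s][1] >= QA[s][0]) for s in range(N_STATES)]
print(policy)
```

The max-merge is a crude but safe aggregation here because all rewards are nonnegative; the paper instead propagates condensed optimal state lists as the shared experience.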
Multi-agent system application in accordance with game theory in bi-directional coordination network model (Cited: 3)
17
Authors: Zhang Jie, Wang Gang, Yue Shaohua, Song Yafei, Liu Jiayi, Yao Xiaoqiang. Journal of Systems Engineering and Electronics, 2020, No. 2, pp. 279-289 (11 pages)
The multi-agent system is the optimal solution to complex intelligent problems. In accordance with game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and to build the logical architecture of the multi-agent system. Besides, to verify the feasibility of the method, the recurrent neural network is optimized, the bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model can learn simple strategies autonomously. Also, as the training time increases, the complexity of the learned strategies rises gradually. Strategies such as obstacle avoidance, firepower distribution, and collaborative cover demonstrate the achievability of the model. Under the same resource background, the model exhibits better convergence than other deep learning training networks and does not easily fall into a local endless loop. Furthermore, the ability of the learned strategy is stronger than that of the training model based on rules, which is of great practical value.
Keywords: loyalty; game theory; bi-directional coordination network; multi-agent system; learning strategy
A Survey of Cooperative Multi-agent Reinforcement Learning for Multi-task Scenarios (Cited: 1)
18
Authors: Jiajun Chai, Zijie Zhao, Yuanheng Zhu, Dongbin Zhao. Artificial Intelligence Science and Engineering, 2025, No. 2, pp. 98-121 (24 pages)
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task problems, arising from different reward functions; multi-domain multi-task problems, caused by differences in state and action spaces and state-transition functions; and scalability-related multi-task problems, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems under multi-agent settings, offering fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Keywords: multi-task; multi-agent reinforcement learning; large language models
Consensus Control With a Constant Gain for Discrete-time Binary-valued Multi-agent Systems Based on a Projected Empirical Measure Method (Cited: 6)
19
Authors: Ting Wang, Min Hu, Yanlong Zhao. IEEE/CAA Journal of Automatica Sinica, 2019, No. 4, pp. 1052-1059 (8 pages)
This paper studies the consensus control of multi-agent systems with binary-valued observations. An algorithm alternating estimation and control is proposed. Each agent estimates the states of its neighbors based on a projected empirical measure method over a holding time. Based on the estimates, each agent designs the consensus control with a constant gain at certain skipping times. The states of the system are updated by the designed control, and the estimation and control design are repeated. For the estimation, the projected empirical measure method is proposed for the binary-valued observations. The algorithm ensures the uniform boundedness of the estimates, and the mean square estimation error is proved to be of the order of the reciprocal of the holding time (the same order as in the case of accurate outputs). For the consensus control, a constant gain is designed instead of the stochastic-approximation-based gain used in the existing literature for binary-valued observations. Moreover, there is no need to modify the control, since the uniform boundedness of the estimates ensures the uniform boundedness of the agents' states. Finally, the systems updated by the designed control are proved to achieve consensus, and the consensus speed is faster than that in the existing literature. Simulations are given to demonstrate the theoretical results.
Keywords: binary-valued observations; consensus control; constant gain; convergence rate; multi-agent systems; projected empirical measure method
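The estimation half of such a scheme can be sketched in isolation: a neighbor's constant state x is seen only through binary indicators s_k = 1{x + d_k <= C} with Gaussian noise d_k, so the empirical frequency of ones, projected away from {0, 1}, is inverted through the noise CDF to recover x. The threshold, noise level, and sample size below are illustrative choices of mine, not the paper's.

```python
import random
from statistics import NormalDist

random.seed(42)
x_true, C, sigma, N = 0.5, 1.0, 1.0, 20000

# Binary-valued observations: only the indicator 1{x + d_k <= C}
# is available, never the state x itself.
s = [1 if x_true + random.gauss(0.0, sigma) <= C else 0
     for _ in range(N)]

# Empirical measure of ones, projected into (0, 1) so the inverse
# CDF below stays finite (the "projected" part of the method).
p_hat = min(max(sum(s) / N, 1.0 / N), 1.0 - 1.0 / N)

# P(s = 1) = F((C - x) / sigma) with F the standard normal CDF,
# so the state estimate is x_hat = C - sigma * F^{-1}(p_hat).
x_hat = C - sigma * NormalDist().inv_cdf(p_hat)
print(round(x_hat, 3))
```

The estimation error shrinks at the usual empirical-measure rate, on the order of 1/sqrt(N), consistent with the holding-time order claimed in the abstract.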