Mixture of Experts Framework Based on Soft Actor-Critic Algorithm for Highway Decision-Making of Connected and Automated Vehicles
1
Authors: Fuxing Yao, Chao Sun, Bing Lu, Bo Wang, Haiyang Yu. Chinese Journal of Mechanical Engineering, 2025, No. 1, pp. 382-395 (14 pages)
Decision-making of connected and automated vehicles (CAVs) involves a sequence of driving maneuvers that improve safety and efficiency, and is characterized by complex scenarios, strong uncertainty, and high real-time requirements. Deep reinforcement learning (DRL) offers real-time decision-making, adaptability to complex scenarios, and generalization ability. However, it is difficult to guarantee complete driving safety and efficiency under the constraints of training samples and costs. This paper proposes a Mixture of Experts (MoE) method based on Soft Actor-Critic (SAC), in which an upper-level discriminator dynamically decides whether to activate the lower-level DRL expert or the heuristic expert based on the features of the input state. To further enhance the performance of the DRL expert, a buffer zone is introduced in the reward function, applying penalties preemptively before unsafe situations occur. To minimize collision and off-road rates, the heuristic expert is designed using the Intelligent Driver Model (IDM) and the Minimizing Overall Braking Induced by Lane changes (MOBIL) strategy. Finally, in tests on typical simulation scenarios, MoE shows a 13.75% improvement in driving efficiency over the traditional DRL method with continuous action space. It ensures high safety, with zero collision and zero off-road rates, while maintaining high adaptability.
Keywords: decision-making; soft actor-critic; connected and automated vehicles
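The heuristic expert in the entry above relies on the Intelligent Driver Model. As a rough illustration only, the standard IDM car-following law can be sketched in Python as follows. The function name and every default parameter value are assumptions chosen for the example, not values reported in the paper, and the MoE discriminator and MOBIL lane-change rule are not shown.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=1.5, s0=2.0, delta=4.0):
    """Intelligent Driver Model longitudinal acceleration in m/s^2.

    v   : ego speed (m/s)
    gap : bumper-to-bumper distance to the leading vehicle (m), must be > 0
    dv  : approach rate, ego speed minus leader speed (m/s)
    The defaults (desired speed v0, time headway T, maximum acceleration
    a_max, comfortable deceleration b, standstill gap s0, exponent delta)
    are illustrative assumptions.
    """
    # Desired dynamic gap: standstill distance plus headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

With a very large gap the interaction term vanishes, so the vehicle accelerates toward `v0`; with a small gap and a positive approach rate the result is negative, that is, braking.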
A Novel Heterogeneous Actor-critic Algorithm with Recent Emphasizing Replay Memory (cited by: 1)
2
Authors: Bao Xi, Rui Wang, Ying-Hao Cai, Tao Lu, Shuo Wang. International Journal of Automation and Computing (EI, CSCD), 2021, No. 4, pp. 619-631 (13 pages)
Reinforcement learning (RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However, the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm, which contains a soft Q-function and an ordinary Q-function. The soft Q-function encourages the exploration of a Gaussian policy, and the ordinary Q-function optimizes the mean of the Gaussian policy to improve the training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost the policy training. In addition, we integrate HAC with hindsight experience replay (HER) to deal with sparse reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, which validates its effectiveness.
Keywords: reinforcement learning (RL); actor-critic; experience replay; training efficiency; manipulation skill learning
Optimal Power Dispatch of Active Distribution Network and P2P Energy Trading Based on Soft Actor-critic Algorithm Incorporating Distributed Trading Control
3
Authors: Yongjun Zhang, Jun Zhang, Guangbin Wu, Jiehui Zheng, Dongming Liu, Yuzheng An. Journal of Modern Power Systems and Clean Energy, 2025, No. 2, pp. 540-551 (12 pages)
Peer-to-peer (P2P) energy trading in active distribution networks (ADNs) plays a pivotal role in promoting the efficient consumption of renewable energy sources. However, it is challenging to effectively coordinate the power dispatch of ADNs and P2P energy trading while preserving the privacy of different physical interests. Hence, this paper proposes a soft actor-critic algorithm incorporating distributed trading control (SAC-DTC) to tackle the optimal power dispatch of ADNs and the P2P energy trading, considering privacy preservation among prosumers. First, the soft actor-critic (SAC) algorithm is used to optimize the control strategy of devices in ADNs to minimize the operation cost, and the primary environmental information of the ADN at this point is published to prosumers. Then, a distributed generalized fast dual ascent method is used to iterate the trading process of prosumers and maximize their revenues. Subsequently, the results of trading are encrypted based on the differential privacy technique and returned to the ADN. Finally, the social welfare value, consisting of ADN operation cost and P2P market revenue, is used as a reward value to update the network parameters and control strategies of the deep reinforcement learning. Simulation results show that the proposed SAC-DTC algorithm reduces the ADN operation cost, boosts the P2P market revenue, maximizes the social welfare, and exhibits high computational accuracy, demonstrating its practical application to the operation of power systems and power markets.
Keywords: optimal power dispatch; peer-to-peer (P2P) energy trading; active distribution network (ADN); distributed trading; soft actor-critic algorithm; privacy preservation
A Hybrid Data-driven Approach Integrating Temporal Fusion Transformer and Soft Actor-critic Algorithm for Optimal Scheduling of Building Integrated Energy Systems
4
Authors: Ze Hu, Peijun Zheng, Ka Wing Chan, Siqi Bu, Ziqing Zhu, Xiang Wei, Yosuke Nakanishi. Journal of Modern Power Systems and Clean Energy, 2025, No. 3, pp. 878-891 (14 pages)
Building integrated energy systems (BIESs) are pivotal for enhancing energy efficiency, as they account for a significant proportion of global energy consumption. The two key barriers to BIES operational efficiency are the uncertainty of renewable generation and the operational non-convexity of combined heat and power (CHP) units. To this end, this paper proposes a soft actor-critic (SAC) algorithm to solve the scheduling problem of BIES, which overcomes the model non-convexity and shows advantages in robustness and generalization. This paper also adopts a temporal fusion transformer (TFT) to enhance the optimal solution for the SAC algorithm by forecasting the renewable generation and energy demand. The TFT can effectively capture the complex temporal patterns and dependencies that span multiple steps. Furthermore, its forecasting results are interpretable owing to a self-attention layer, which supports more trustworthy decision-making in the SAC algorithm. The proposed hybrid data-driven approach integrating the TFT and SAC algorithms, i.e., the TFT-SAC approach, is trained and tested on a real-world dataset to validate its superior performance in reducing the energy cost and computational time compared with the benchmark approaches. The generalization performance of the scheduling policy, as well as the sensitivity analysis, is examined in the case studies.
Keywords: building integrated energy system (BIES); hybrid data-driven approach; time-series forecast; optimal scheduling; soft actor-critic (SAC); temporal fusion transformer (TFT)
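Real-Time Charging Optimization and Scheduling of New Energy Vehicles Based on the Actor-Critic Algorithm
5
Authors: 赖城贤, 杨婷, 苏庆列. 《黑龙江工业学院学报(综合版)》, 2025, No. 5, pp. 128-133 (6 pages)
With the growing adoption of new energy vehicles, the charging scheduling problem has become increasingly prominent. This study aims to optimize the charging scheduling algorithm so that new energy vehicle charging is optimized in real time, improving charging efficiency and reducing cost. The study adopts an Actor-Critic charging scheduling algorithm executed in two steps, builds the Actor and Critic networks with multilayer perceptrons, and uses parallel computing to improve the algorithm's efficiency. The results show that the algorithm's precision rises quickly and reaches 0.9 after about 200 iterations, significantly better than the other algorithms. Its running time stays consistently low, indicating high runtime efficiency. For charging load management, the algorithm reaches a load of about 45 kW within 50 hours with a charging efficiency close to 90%, and its charging cost is the lowest for every number of vehicles tested. The algorithm performs well in new energy vehicle charging scheduling: it improves charging efficiency, lowers charging cost, converges quickly, and runs in little time, providing an effective solution for charging scheduling.
Keywords: actor-critic algorithm; new energy vehicles; real-time charging; optimal scheduling; state space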
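Optimized Design of a Painting Robot Control System Based on the DDPG Algorithm under the Actor-Critic Framework (cited by: 2)
6
Authors: 罗子彪, 唐娇. 《自动化与仪器仪表》, 2025, No. 2, pp. 193-197, 202 (6 pages)
The intersection of artificial intelligence and artistic creation has become a new research focus. However, the control performance of robots during drawing tasks struggles to meet precision requirements. This study therefore designs a painting robot control system based on the deep deterministic policy gradient (DDPG) algorithm. Within the Actor network and Critic network framework, the algorithm's reward function and experience pool are improved and optimized, and a painting robot control system is proposed. Validation shows that the proposed control system improves training convergence speed by an average of 38.04% over control systems based on other algorithms, and reduces the simulation error of the manipulator's elbow joint by an average of 93.74%. The results indicate that improving the reward function and experience pool increases the algorithm's convergence speed and performance. The proposed control system meets the control accuracy requirements for the robot's image-drawing process and has positive application value in robot control.
Keywords: Actor network; Critic network; DDPG algorithm; deep reinforcement learning; control system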
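An Asymmetric Actor-Critic Reinforcement Learning Method for Long-Sequence Autonomous Operation
7
Authors: 任君凯, 瞿宇珂, 罗嘉威, 倪子淇, 卢惠民, 叶益聪. 《国防科技大学学报》 (PKU Core), 2025, No. 4, pp. 111-122 (12 pages)
Long-sequence autonomous operation capability has become one of the problems restricting the practical application of intelligent robots. To address the diverse long-sequence manipulation skills that robots need in complex scenes, an efficient and robust asymmetric Actor-Critic reinforcement learning method is proposed, aimed at the difficulty of learning long-sequence tasks and the complexity of reward function design. Multiple Critic networks are integrated to cooperatively train a single Actor network, and generative adversarial imitation learning is introduced to generate intrinsic rewards for the Critic networks, which lowers the difficulty of learning long-sequence tasks. On this basis, a two-stage learning method is designed in which imitation learning provides a high-quality pretrained behavior policy for reinforcement learning, further improving learning efficiency while enhancing the generalization of the policy. Simulation results for long-sequence autonomous operation in a chemistry laboratory show that the method significantly improves the learning efficiency of long-sequence manipulation skills and the robustness of the behavior policy.
Keywords: autonomous operation robot; reinforcement learning; actor-critic; long-sequence manipulation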
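A Task Allocation Method for UAV Swarms Based on the Actor-Critic Algorithm
8
Authors: 苏瑞, 龚俊, 张鸿宇. 《兵工自动化》 (PKU Core), 2025, No. 5, pp. 107-112 (6 pages)
To minimize the total completion time and total flight distance in UAV swarm task allocation, an optimization method based on the Actor-Critic algorithm is proposed. The Actor network generates a task allocation policy from the current state, and the Critic network evaluates the value of the policy generated by the Actor network. A multi-step temporal-difference error, which combines the rewards of several time steps, is used to update the policy, improving learning efficiency and mitigating the effect of delayed rewards. Comparative simulation experiments are carried out in a variety of task scenarios. The simulation results show that the method significantly reduces task completion time and flight distance, verifying its effectiveness for the task allocation problem.
Keywords: UAV swarm; task allocation; reinforcement learning; actor-critic algorithm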
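The multi-step temporal-difference error mentioned in the entry above can be illustrated with a short Python sketch. It assumes the textbook n-step form, delta = sum_k gamma^k * r_{t+k} + gamma^n * V(s_{t+n}) - V(s_t); the function name, argument names, and discount value are hypothetical, and the paper's exact formulation may differ.

```python
def n_step_td_error(rewards, v_start, v_end, gamma=0.99):
    """Textbook n-step TD error for a critic.

    rewards : rewards r_t, ..., r_{t+n-1} observed over n steps
    v_start : critic estimate V(s_t)
    v_end   : critic estimate V(s_{t+n}), the bootstrap value
    gamma   : discount factor (illustrative default)
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the value n steps ahead, then subtract the current estimate.
    return n_step_return + (gamma ** n) * v_end - v_start
```

With a single reward in `rewards` this reduces to the ordinary one-step TD error, which is one way to sanity-check an implementation.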
PID Steering Control Method of Agricultural Robot Based on Fusion of Particle Swarm Optimization and Genetic Algorithm
9
Authors: ZHAO Longlian, ZHANG Jiachuang, LI Mei, DONG Zhicheng, LI Junhui. 《农业机械学报》 (PKU Core), 2026, No. 1, pp. 358-367 (10 pages)
To address the steering instability and hysteresis of agricultural robots during movement, a fusion PID control method combining particle swarm optimization (PSO) and the genetic algorithm (GA) was proposed. The fusion algorithm took advantage of the fast optimization ability of PSO to optimize the population screening step of GA. The Simulink simulation results showed that the convergence of the fitness function of the fusion algorithm was accelerated, the system response adjustment time was reduced, and the overshoot was almost zero. The algorithm was then applied to steering tests of an agricultural robot in various scenes. After modeling the steering system of the agricultural robot, the steering test results in the unloaded suspended state showed that, compared with PID control tuned by manual trial and error and PID control based on GA, the PID control based on the fusion algorithm reduced the rise time, response adjustment time, and overshoot of the system, and improved its response speed and stability. The actual road steering test results showed that the PID control based on the fusion algorithm had the shortest response rise time, about 4.43 s. When the target pulse number was set to 100, the actual mean value in the steady-state regulation stage was about 102.9, the closest to the target value among the three control methods, and the overshoot was reduced at the same time. The steering test results under various scene states showed that the PID control based on the proposed fusion algorithm had good anti-interference ability, could adapt to changes in environment and load, and improved the performance of the control system. It was effective in the steering control of the agricultural robot. This method can provide a reference for the precise steering control of other robots.
Keywords: agricultural robot; steering PID control; particle swarm optimization algorithm; genetic algorithm
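The controller being tuned in the entry above is a PID loop. For orientation, a minimal discrete-time PID can be sketched in Python as below, assuming a fixed sample time. The class name, gains, and sample time are hypothetical, and the PSO-GA tuning procedure itself is not shown; an optimizer of that kind would search over `kp`, `ki`, and `kd` against some fitness function.

```python
class PID:
    """Minimal discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0    # running integral of the error
        self.prev_error = 0.0  # error from the previous sample

    def step(self, setpoint, measurement):
        """Return the control output for one sample period."""
        error = setpoint - measurement
        self.integral += error * self.dt
        # Backward-difference derivative; the first call sees prev_error = 0,
        # so a nonzero kd produces a derivative kick on the first sample.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For example, `PID(2.0, 1.0, 0.0, 0.1).step(1.0, 0.0)` combines a proportional term of 2.0 with an integral term of 0.1.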
TWO PARALLEL ALGORITHMS FOR A CLASS OF SPLIT COMMON SOLUTION PROBLEMS
10
Authors: Truong Minh TUYEN, Nguyen Thi TRANG, Tran Thi HUONG. Acta Mathematica Scientia, 2026, No. 1, pp. 505-518 (14 pages)
We study the split common solution problem with multiple output sets for monotone operator equations in Hilbert spaces. To solve this problem, we propose two new parallel algorithms. We establish a weak convergence theorem for the first and a strong convergence theorem for the second.
Keywords: iterative algorithm; Hilbert space; metric projection; proximal point algorithm
An Eulerian-Lagrangian parallel algorithm for simulation of particle-laden turbulent flows
11
Authors: Harshal P. Mahamure, Deekshith I. Poojary, Vagesh D. Narasimhamurthy, Lihao Zhao. Acta Mechanica Sinica, 2026, No. 1, pp. 15-34 (20 pages)
This paper presents an Eulerian-Lagrangian algorithm for direct numerical simulation (DNS) of particle-laden flows. The algorithm is applicable to simulations of dilute suspensions of small inertial particles in turbulent carrier flow. The Eulerian framework numerically resolves the turbulent carrier flow using a parallelized, finite-volume DNS solver on a staggered Cartesian grid. Particles are tracked using a point-particle method with a Lagrangian particle tracking (LPT) algorithm. The proposed Eulerian-Lagrangian algorithm is validated using an inertial particle-laden turbulent channel flow for different Stokes number cases. The particle concentration profiles and higher-order statistics of the carrier and dispersed phases agree well with the benchmark results. We investigated the effect of the fluid velocity interpolation and numerical integration schemes of particle tracking algorithms on particle dispersion statistics. The suitability of fluid velocity interpolation schemes for predicting the particle dispersion statistics is discussed in the framework of the particle tracking algorithm coupled to the finite-volume solver. In addition, we present the parallelization strategies implemented in the algorithm and evaluate their parallel performance.
Keywords: DNS; Eulerian-Lagrangian; particle tracking algorithm; point-particle; parallel software
Equivalent Modeling with Passive Filter Parameter Clustering for Photovoltaic Power Stations Based on a Particle Swarm Optimization K-Means Algorithm
12
Authors: Binjiang Hu, Yihua Zhu, Liang Tu, Zun Ma, Xian Meng, Kewei Xu. Energy Engineering, 2026, No. 1, pp. 431-459 (29 pages)
This paper proposes an equivalent modeling method for photovoltaic (PV) power stations via a particle swarm optimization (PSO) K-means clustering (KMC) algorithm with passive filter parameter clustering, to address the complexity, simulation time cost, and convergence problems of detailed PV power station models. First, the amplitude-frequency curves of different filter parameters are analyzed. Based on the results, a grouping parameter set for characterizing the external filter characteristics is established. These parameters are further defined as clustering parameters. A single PV inverter model is then established as a prerequisite foundation. The proposed equivalent method combines the global search capability of PSO with the rapid convergence of KMC, effectively overcoming the tendency of KMC to become trapped in local optima. This approach enhances both clustering accuracy and numerical stability when determining equivalence for PV inverter units. Using the proposed clustering method, both a detailed PV power station model and an equivalent model are developed and compared. Simulation and hardware-in-the-loop (HIL) results based on the equivalent model verify that the equivalent method accurately represents the dynamic characteristics of PV power stations and adapts well to different operating conditions. The proposed equivalent modeling method provides an effective analysis tool for future renewable energy integration research.
Keywords: photovoltaic power station; multi-machine equivalent modeling; particle swarm optimization; K-means clustering algorithm
GSLDWOA: A Feature Selection Algorithm for Intrusion Detection Systems in IIoT
13
Authors: Wanwei Huang, Huicong Yu, Jiawei Ren, Kun Wang, Yanbu Guo, Lifeng Jin. Computers, Materials & Continua, 2026, No. 1, pp. 2006-2029 (24 pages)
Existing feature selection methods for intrusion detection systems (IDS) in the Industrial Internet of Things (IIoT) often suffer from local optimality and high computational complexity. These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy. This paper proposes a feature selection algorithm for IIoT intrusion detection based on an improved whale optimization algorithm (GSLDWOA). The aim is to address the problems that feature selection algorithms are prone to under high-dimensional data, such as local optimality, long detection time, and reduced accuracy. First, the diversity of the initial population is increased using the Gaussian Mutation mechanism. Then, a Non-linear Shrinking Factor balances global exploration and local exploitation, avoiding premature convergence. Lastly, a Variable-step Levy Flight operator and a Dynamic Differential Evolution strategy are introduced to improve the algorithm's search efficiency and convergence accuracy in high-dimensional feature space. Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance. Compared to the traditional WOA algorithm, the detection rate and F1-score increased by 3.68% and 4.12%, respectively. On the WUSTL-IIoT-2021 dataset, accuracy, recall, and F1-score all exceed 99.9%.
Keywords: Industrial Internet of Things; intrusion detection system; feature selection; whale optimization algorithm; Gaussian mutation
Identification of small impact craters in Chang’e-4 landing areas using a new multi-scale fusion crater detection algorithm
14
Authors: FangChao Liu, HuiWen Liu, Li Zhang, Jian Chen, DiJun Guo, Bo Li, ChangQing Liu, ZongCheng Ling, Ying-Bo Lu, JunSheng Yao. Earth and Planetary Physics, 2026, No. 1, pp. 92-104 (13 pages)
Impact craters are important for understanding lunar geologic evolution and surface erosion rates, among other functions. However, the morphological characteristics of micro impact craters are not obvious and the craters are numerous, resulting in low detection accuracy by deep learning models. Therefore, we proposed a new multi-scale fusion crater detection algorithm (MSF-CDA) based on YOLO11 to improve the accuracy of lunar impact crater detection, especially for small craters with a diameter of <1 km. Using the images taken by the LROC (Lunar Reconnaissance Orbiter Camera) at the Chang'e-4 (CE-4) landing area, we constructed three separate datasets for craters with diameters of 0-70 m, 70-140 m, and >140 m. We then trained three submodels separately with these three datasets. Additionally, we designed a slicing-amplifying-slicing strategy to enhance the ability to extract features from small craters. To handle redundant predictions, we proposed a new Non-Maximum Suppression with Area Filtering method to fuse the results for overlapping targets across the multi-scale submodels. Finally, our new MSF-CDA method achieved high detection performance, with Precision, Recall, and F1 score values of 0.991, 0.987, and 0.989, respectively, perfectly addressing the problems induced by the lesser features and sample imbalance of small craters. Our MSF-CDA can provide strong data support for more in-depth study of the geological evolution of the lunar surface and finer geological age estimations. This strategy can also be used to detect other small objects with lesser features and sample imbalance problems. We detected approximately 500,000 impact craters in an area of approximately 214 km² around the CE-4 landing area. By statistically analyzing the new data, we updated the distribution function of the number and diameter of impact craters. Finally, we identified the most suitable lighting conditions for detecting impact crater targets by analyzing the effect of different lighting conditions on the detection accuracy.
Keywords: impact craters; Chang'e-4 landing area; multi-scale automatic detection; YOLO11; fusion algorithm
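A Data-Driven Offline Parameter Identification Method for Induction Motors under the Actor-Critic Framework (cited by: 11)
15
Authors: 漆星, 张倩. 《电工技术学报》 (EI, CSCD, PKU Core), 2019, No. 9, pp. 1875-1885 (11 pages)
Parameter identification for electric vehicle motors allows the motor to output the highest possible torque and efficiency at any speed, and is an important means of optimizing motor output performance. Traditional model-driven parameter identification methods are susceptible to model error, have poor disturbance rejection, and cannot achieve optimal torque over the full speed range. In view of these shortcomings, this paper studies an offline parameter identification method for electric vehicle induction motors that is based entirely on measured data. It optimizes the rotor resistance and magnetizing inductance at any speed so that the motor outputs optimal torque at a given speed and a given current. To achieve this, an offline parameter identification method based on the Actor-Critic framework is studied, and the design of the observations, rewards, and actions in the framework is determined. Experiments show that, compared with traditional parameter identification methods, the proposed method is more accurate and more robust, and ensures that the induction motor outputs optimal torque at any speed.
Keywords: induction motor; parameter identification; data-driven; actor-critic framework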
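Multi-Sensor Cross-Cueing Technology Based on an Improved Actor-Critic Algorithm (cited by: 2)
16
Authors: 韦道知, 张曌宇, 谢家豪, 李宁. 《系统工程与电子技术》 (EI, CSCD, PKU Core), 2023, No. 6, pp. 1624-1632 (9 pages)
To raise target detection probability and ensure sustained target tracking while reducing the waste of battlefield resources and balancing the battlefield cost-effectiveness ratio, a multi-sensor cross-cueing technique using an improved Actor-Critic algorithm is proposed for target detection. First, a dynamic management evaluation model for cross-cueing sensors is built, taking into account sensor detection, energy consumption, timeliness, and other factors. Second, the sensor management decision rules of the Actor-Critic cross-cueing algorithm are analyzed in detail, and an Actor-Critic algorithm is proposed that builds a central evaluation network according to the needs of the task itself and increases the interaction between the sensors and the external environment. Simulation results show that the improved algorithm speeds up the growth of network returns, achieves continuous target detection, strengthens cross-cueing among sensors, and raises the intelligence level of scheduling, and therefore has considerable application value.
Keywords: multi-sensor cross-cueing; actor-critic algorithm; reinforcement learning; target detection; sensor resource scheduling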
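Actor-Critic Algorithm Based on Symmetric Perturbation Sampling (cited by: 2)
17
Authors: 张春元, 朱清新. 《控制与决策》 (EI, CSCD, PKU Core), 2015, No. 12, pp. 2161-2167 (7 pages)
To address the slow convergence and poor convergence quality of traditional actor-critic (AC) methods on sequential decision problems in continuous spaces, an AC algorithm framework based on symmetric perturbation sampling is proposed. First, the framework uses a Gaussian distribution as the policy distribution and, at each time step, symmetrically perturbs the current action mean to generate two actions that interact with the environment in parallel. Then, the agent's behavior action is selected according to the larger of the two temporal-difference (TD) errors, and the value function parameters are updated. Finally, the policy parameters are updated using the average regular gradient or the incremental natural gradient of the two. Theoretical analysis and simulation results show that the proposed framework has good convergence and computational efficiency.
Keywords: actor-critic method; symmetric perturbation sampling; continuous space; reinforcement learning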
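The first two steps described in the entry above (symmetric perturbation of the action mean, then selection by the larger TD error) can be sketched in Python roughly as follows. This is a reading of the abstract only, not the authors' implementation: the function names, the one-step TD error, and the scalar action are all assumptions, and the gradient updates are omitted.

```python
import random


def symmetric_actions(mean, sigma, rng=random):
    """Draw one Gaussian perturbation and apply it with both signs."""
    e = rng.gauss(0.0, sigma)
    return mean + e, mean - e


def td_error(reward, v_next, v_current, gamma=0.99):
    """One-step TD error: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current


def choose_action(outcomes, v_current, gamma=0.99):
    """Pick the action whose trial produced the larger TD error.

    outcomes : list of (action, reward, v_next) tuples, one per
               perturbed action tried from the same state.
    Returns (chosen_action, its_td_error).
    """
    scored = [(td_error(r, vn, v_current, gamma), a) for a, r, vn in outcomes]
    best_delta, best_action = max(scored)
    return best_action, best_delta
```

Because the two actions are mirror images around the mean, their average is always the mean itself, which is what makes an averaged gradient over the pair a natural choice for the policy update.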
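A Target Reassignment Method for Multi-UAV Cooperative Air Combat Based on the Actor-Critic Algorithm (cited by: 4)
18
Authors: 陈宇轩, 王国强, 罗贺, 马滢滢. 《无线电工程》 (PKU Core), 2022, No. 7, pp. 1266-1275 (10 pages)
Target reassignment is one of the key problems that urgently need to be solved in multi-UAV cooperative air combat. Considering the uncertainty and real-time nature of air combat, a mathematical model of the target reassignment problem in multi-UAV cooperative air combat is established. Drawing on the core concepts of reinforcement learning, a target reassignment framework based on the Actor-Critic algorithm is proposed, and the Markov decision process for target reassignment, the Actor network structure, and the Critic network structure are constructed. To address the sparse reward problem in reinforcement learning algorithms, a two-level reward function combining local and global rewards is designed. The effectiveness of the method is verified on the VR-Forces simulation platform. Experimental results show that the proposed method effectively improves the win rate in air combat.
Keywords: UAV; air combat; target reassignment; reinforcement learning; actor-critic algorithm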
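Optimal Leader-Following Consensus Control of Fractional-Order Multi-Agent Systems Based on the Actor-Critic Algorithm (cited by: 5)
19
Authors: 马丽新, 刘晨, 刘磊. 《应用数学和力学》 (CSCD, PKU Core), 2022, No. 1, pp. 104-114 (11 pages)
The optimal leader-following consensus problem of fractional-order multi-agent systems is studied. With periodically intermittent control taken into account, a first-order approximation of the fractional derivative, an event-triggered mechanism, and the actor-critic algorithm from reinforcement learning are integrated to design a reinforcement learning algorithm structure based on a periodically intermittent event-triggered strategy. Finally, numerical simulations demonstrate the feasibility and effectiveness of the algorithm.
Keywords: fractional-order multi-agent system; actor-critic algorithm; optimal leader-following consensus; event-triggered; intermittent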
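An Actor-Critic Algorithm Based on Tile Coding and Model Learning (cited by: 3)
20
Authors: 金玉净, 朱文文, 伏玉琛, 刘全. 《计算机科学》 (CSCD, PKU Core), 2014, No. 6, pp. 239-242, 249 (5 pages)
Actor-Critic is a class of reinforcement learning methods with good performance and convergence guarantees. However, the agent does not learn the dynamics of the environment while learning and improving its policy, which limits the performance of Actor-Critic methods. In addition, Actor-Critic methods must represent the policy and the value function approximately, and the encoding of states and actions, together with its parameters, has an important effect on them. Tile coding is simple to use and has low computational time complexity. Tile coding is therefore combined with a model-based Actor-Critic method, and the resulting algorithm is applied to reinforcement learning simulation experiments. The experimental results show that the resulting algorithm performs well.
Keywords: reinforcement learning; tile coding; actor-critic; model learning; function approximation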