Journal Articles
224 articles found
A Deep Reinforcement Learning-Based Partitioning Method for Power System Parallel Restoration
1
Authors: Changcheng Li, Weimeng Chang, Dahai Zhang, Jinghan He. Energy Engineering, 2026, Issue 1, pp. 243-264 (22 pages)
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts. This paper proposes a novel partitioning method based on deep reinforcement learning. First, the partitioning decision process is formulated as a Markov decision process (MDP) model to maximize the modularity. Corresponding key partitioning constraints on parallel restoration are considered. Second, based on the partitioning objective and constraints, the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function. A soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward. Then, the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes. Two experience replay buffers are employed to speed up the training process. Finally, case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints, thereby improving the parallelism and reliability of the restoration process. Moreover, simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
Keywords: partitioning method, parallel restoration, deep reinforcement learning, experience replay buffer, partitioning modularity
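The modularity objective that the partitioning MDP maximizes is, in standard usage, the Newman modularity of a graph partition. A minimal sketch of that computation (function and variable names are hypothetical; the paper's exact formulation is not reproduced here):

```python
def modularity(edges, communities):
    """Newman modularity Q for an undirected graph given as an edge list.

    edges: list of (u, v) pairs; communities: dict node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c is the
    number of intra-community edges and d_c the total degree of community c.
    """
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    intra, dsum = {}, {}
    for u, v in edges:
        if communities[u] == communities[v]:
            intra[communities[u]] = intra.get(communities[u], 0) + 1
    for node, k in deg.items():
        dsum[communities[node]] = dsum.get(communities[node], 0) + k
    return sum(intra.get(c, 0) / m - (dsum[c] / (2 * m)) ** 2 for c in dsum)
```

For two triangles joined by a single bridge edge, splitting at the bridge gives Q = 5/14, the partition a modularity-maximizing method should prefer.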
An Automatic Damage Detection Method Based on Adaptive Theory-Assisted Reinforcement Learning
2
Authors: Chengwen Zhang, Qing Chun, Yijie Lin. Engineering, 2025, Issue 7, pp. 188-202 (15 pages)
Current damage detection methods based on model updating and sensitivity Jacobian matrices show a low convergence ratio and computational efficiency for online calculations. The aim of this paper is to construct a real-time automated damage detection method by developing a theory-assisted adaptive multiagent twin delayed deep deterministic (TA2-MATD3) policy gradient algorithm. First, the theoretical framework of reinforcement-learning-driven damage detection is established. To address the disadvantages of the traditional multiagent twin delayed deep deterministic (MATD3) method, a theory-assisted mechanism and an adaptive experience replay mechanism are introduced. Moreover, a historical residential house built in 1889 was taken as an example, using its 12-month structural health monitoring data. TA2-MATD3 was compared with existing damage detection methods in terms of convergence ratio, online computing efficiency, and damage detection accuracy. The results show that the computational efficiency of TA2-MATD3 is approximately 117-160 times that of the traditional methods. The convergence ratio of damage detection on the training set is approximately 97%, and that on the test set is in the range of 86.2%-91.9%. In addition, the main visible damage found in the field survey was identified by TA2-MATD3. The results indicate that the proposed method can significantly improve online computing efficiency and damage detection accuracy. This research provides a novel perspective on using reinforcement learning methods for damage detection in online structural health monitoring.
Keywords: reinforcement learning, theory-assisted, damage detection, Newton's method, model updating, architectural heritage
Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments (cited: 24)
3
Authors: Zijian Hu, Xiaoguang Gao, Kaifang Wan, Yiwei Zhai, Qianglong Wang. Chinese Journal of Aeronautics, 2021, Issue 12, pp. 187-204 (18 pages)
Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to fly safely to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of consecutive experiences in the experience pool, finds the experiences most similar to the current state to learn from, following theory from human education, and expands the influence of the learning process on action selection at the current state. All experiments are conducted in a complex unknown simulation environment constructed from the parameters of a real UAV. Training experiments show that REL-DDPG improves the convergence speed and the converged result compared to the state-of-the-art DDPG algorithm, while testing experiments show the applicability of the algorithm and investigate performance under different parameter conditions.
Keywords: autonomous motion planning (AMP), deep deterministic policy gradient (DDPG), deep reinforcement learning (DRL), sampling method, UAV
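The core of REL-DDPG's second screening step is selecting stored transitions whose states most resemble the current state. A toy sketch of nearest-state selection under a Euclidean-similarity assumption (illustrative only; the paper's actual screening criteria are not reproduced):

```python
import numpy as np

def most_relevant(buffer_states, current_state, k=3):
    """Indices of the k stored states closest (Euclidean) to the current state."""
    dists = np.linalg.norm(np.asarray(buffer_states, dtype=float)
                           - np.asarray(current_state, dtype=float), axis=1)
    return np.argsort(dists)[:k].tolist()
```

In a full agent, the returned indices would pick the mini-batch of "relevant" experiences used for the actor-critic update instead of a uniformly random batch.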
Data-Driven Human-Robot Interaction Without Velocity Measurement Using Off-Policy Reinforcement Learning (cited: 3)
4
Authors: Yongliang Yang, Zihao Ding, Rui Wang, Hamidreza Modares, Donald C. Wunsch. IEEE/CAA Journal of Automatica Sinica, 2022, Issue 1, pp. 47-63 (17 pages)
In this paper, we present a novel data-driven design method for the human-robot interaction (HRI) system, where a given task is achieved by cooperation between the human and the robot. The presented HRI controller design is a two-level control design approach consisting of a task-oriented performance optimization design and a plant-oriented impedance controller design. The task-oriented design minimizes the human effort and guarantees perfect task tracking in the outer loop, while the plant-oriented design achieves the desired impedance from the human to the robot manipulator end-effector in the inner loop. Data-driven reinforcement learning techniques are used for performance optimization in the outer loop to assign the optimal impedance parameters. In the inner loop, a velocity-free filter is designed to avoid the requirement of end-effector velocity measurement. On this basis, an adaptive controller is designed to achieve the desired impedance of the robot manipulator in the task space. Simulation and experiments on a robot manipulator verify the efficacy of the presented HRI design framework.
Keywords: adaptive impedance control, data-driven method, human-robot interaction (HRI), reinforcement learning, velocity-free
A new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning (cited: 3)
5
Authors: Wendi Chen, Qinglai Wei. Journal of Automation and Intelligence, 2024, Issue 1, pp. 34-39 (6 pages)
This paper presents a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning. The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods. To achieve optimal control, an RL algorithm based on a critic-actor architecture is considered for the nonlinear system. Due to the significant security risks of network transmission, the system is vulnerable to deception attacks, which can render all system states unavailable. By using the attacked states to design a coordinate transformation, the harm caused by unknown deception attacks is overcome. The presented control strategy ensures that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment demonstrates the effectiveness of the strategy.
Keywords: nonlinear systems, reinforcement learning, optimal control, backstepping method
Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks (cited: 1)
6
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications, 2023, Issue 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms remains open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low thanks to regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: ad-hoc network, cross-layer scheduling, multi-agent deep reinforcement learning, interference elimination, power control, queue scheduling, actor-critic methods, Markov decision process
Structural Topology Optimization by Combining BESO with Reinforcement Learning (cited: 1)
7
Authors: Hongbo Sun, Ling Ma. Journal of Harbin Institute of Technology (New Series), 2021, Issue 1, pp. 85-96 (12 pages)
In this paper, a new algorithm combining the features of bi-directional evolutionary structural optimization (BESO) and reinforcement learning (RL) is proposed for continuum structural topology optimization (STO). In contrast to conventional approaches, which only generate a certain quasi-optimal solution, the goal of the combined method is to provide multiple quasi-optimal solutions for designers, in the spirit of generative design. Two key components are adopted. First, besides sensitivity, a value function updated by Monte Carlo reinforcement learning is utilized to measure the importance of each element, which makes the solving process convergent and closer to the optimum. Second, an ε-greedy policy adds a random perturbation to the main search direction so as to extend the search ability. Finally, the quality and diversity of solutions are guaranteed by controlling the value of compliance as well as Intersection-over-Union (IoU). Results on several 2D and 3D compliance minimization problems, including a geometrically nonlinear case, show that the combined method is capable of generating a group of good and distinct solutions that satisfy various possible requirements in engineering design within acceptable computation cost.
Keywords: structural topology optimization, bi-directional evolutionary structural optimization, reinforcement learning, first-visit Monte Carlo method, ε-greedy policy, generative design
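The value function in the combined BESO-RL method is updated by first-visit Monte Carlo estimation. A generic sketch of that estimator (illustrative only; the element-importance specifics of the paper are omitted, and names are hypothetical):

```python
def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of V(s).

    episodes: list of episodes, each a list of (state, reward) pairs.
    Each state's value is the average of the returns observed from its
    first visit in each episode.
    """
    returns, counts = {}, {}
    for ep in episodes:
        # return-to-go for every step, computed from the back
        G, gs = 0.0, []
        for s, r in reversed(ep):
            G = gamma * G + r
            gs.append((s, G))
        gs.reverse()
        seen = set()
        for s, G in gs:
            if s not in seen:  # first visit only
                seen.add(s)
                returns[s] = returns.get(s, 0.0) + G
                counts[s] = counts.get(s, 0) + 1
    return {s: returns[s] / counts[s] for s in returns}
```

With a single episode visiting states a then b, each with reward 1 and gamma = 1, the estimates are V(a) = 2 and V(b) = 1.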
A Reinforcement Learning System to Dynamic Movement and Multi-Layer Environments
8
Authors: Uthai Phommasak, Daisuke Kitakoshi, Hiroyuki Shioya, Junji Maeda. Journal of Intelligent Learning Systems and Applications, 2014, Issue 4, pp. 176-185 (10 pages)
Many policy-improving systems for Reinforcement Learning (RL) agents have been proposed that adapt quickly to environmental change by using statistical methods such as mixture models of Bayesian networks, mixture probabilities, and clustering distributions. However, such methods increase computational complexity, and better adaptation to more complex environments, such as multi-layer environments, is still required. In this study, we used the profit-sharing method for the agent to learn its policy and added a mixture probability into the RL system to recognize changes in the environment and improve the agent's policy appropriately to adjust to the changing environment. We also introduced clustering, which enables a smaller, suitable selection, in order to reduce computational complexity while maintaining the system's performance. Experimental results showed that the agent successfully learned the policy and efficiently adjusted to changes in the multi-layer environment. Finally, the computational complexity and the decline in effectiveness of the policy improvement were controlled by the proposed system.
Keywords: reinforcement learning, profit-sharing method, mixture probability, clustering
An intelligent drilling guide algorithm design framework based on highly interactive learning mechanism
9
Authors: Yi Zhao, Dan-Dan Zhu, Fei Wang, Xin-Ping Dai, Hui-Shen Jiao, Zi-Jie Zhou. Petroleum Science, 2025, Issue 8, pp. 3333-3343 (11 pages)
Measurement-while-drilling (MWD) and guidance technologies have been extensively deployed in the exploitation of oil, natural gas, and other energy resources. Conventional control approaches are plagued by challenges, including limited anti-interference capabilities and insufficient generalization of decision-making experience. To address the intricate problem of directional well trajectory control, an intelligent algorithm design framework grounded in the high-level interaction mechanism between geology and engineering is put forward. This framework aims to facilitate the rapid batch migration and update of drilling strategies. The proposed directional well trajectory control method comprehensively considers the multi-source heterogeneous attributes of drilling experience data, leverages generative simulation of the geological drilling environment, and promptly constructs a trajectory control model that self-adapts to environmental variations. The construction proceeds on three hierarchical levels: offline pre-drilling learning, online during-drilling interaction, and post-drilling model transfer. Simulation results indicate that the guidance model derived from this method demonstrates remarkable generalization performance and accuracy. It can significantly boost the adaptability of the control algorithm to diverse environments and enhance the penetration rate of the target reservoir during drilling operations.
Keywords: highly interactive decision algorithm, borehole guidance, intelligent control method, reinforcement learning, rapid perception, well drilling simulation
Spatiotemporal-enhanced deep reinforcement learning for multi-UAV target coverage with connectivity in no-fly zones
10
Authors: Hanxiao Liu, Tianlong Wan, Xu Fang, Xiaoqiang Ren, Yan Peng. Science China (Technological Sciences), 2026, Issue 1, pp. 178-180 (3 pages)
In unmanned aerial vehicle (UAV) applications, efficient multi-target coverage with reliable connectivity is critical for reconnaissance, search and rescue, and environmental monitoring [1]. However, real-world deployments face two major challenges: restricted airspace (no-fly zones, NFZs) that constrains trajectories, and limited communication ranges that require team connectivity [2]. Existing potential field, geometric, or decentralized connectivity methods address these objectives separately [3-5], but they struggle with scalability and fail to ensure safe and efficient coverage in dynamic NFZ environments.
Keywords: team connectivity, spatiotemporal, connectivity methods, environmental monitoring, real world, deep reinforcement learning, reconnaissance, search and rescue, target coverage, multi-UAV
Optimal operation of multi-reservoir systems based on n-step Q-learning under the discrete four-reservoir problem benchmark (cited: 6)
11
Authors: Hu Hexuan, Qian Zeyu, Hu Qiang, Zhang Ye. Journal of China Institute of Water Resources and Hydropower Research, 2023, Issue 2, pp. 138-147 (10 pages)
Reservoir operation optimization is an optimization problem with the Markov property. Reinforcement learning is a current research focus for Markov decision process problems and performs well on single-reservoir operation, but the complexity of multi-reservoir systems makes its application difficult. For the complex multi-reservoir operation problem, this paper proposes an optimization method based on n-step Q-learning under the discrete four-reservoir problem benchmark. Built on the n-step Q-learning algorithm, a reinforcement learning model of multi-reservoir operation is constructed for the benchmark, and the optimal operation scheme is generated by learning from explored experience. Experimental results show that, given sufficient exploration experience to learn from, one-step Q-learning combined with a penalty function can reach the theoretical optimum. Replacing the penalty function with the feasible direction method to enforce constraints, per-time-step feasible-state tables and per-state admissible-action hash tables are built from the benchmark constraints, effectively reducing the dimensionality of the state-action space and greatly shortening optimization time. The exploration strategy determines the usefulness of the collected experience and hence the optimization efficiency, especially for complex multi-reservoir problems; an improved ε-greedy strategy is therefore proposed and compared with the conventional ε-greedy, upper confidence bound (UCB), and Boltzmann exploration strategies, verifying its effectiveness. On this basis, n-step returns are introduced to extend the method to n-step Q-learning, and suitable hyperparameters such as the step count n and the learning rate are determined to further improve optimization efficiency.
Keywords: optimal reservoir operation, reinforcement learning, Q-learning, penalty function, feasible direction method
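The n-step Q-learning update replaces the one-step target with an n-step return: n discounted rewards followed by a bootstrapped max-Q tail. A minimal sketch of the target computation (variable names are illustrative, not taken from the paper):

```python
def n_step_target(rewards, bootstrap_q, gamma):
    """n-step return: r_t + g*r_{t+1} + ... + g^(n-1)*r_{t+n-1} + g^n * max_a Q(s_{t+n}, a).

    rewards: the n rewards observed from step t onward.
    bootstrap_q: max_a Q at the state reached after n steps.
    """
    G = bootstrap_q
    for r in reversed(rewards):  # fold the discount in from the back
        G = r + gamma * G
    return G
```

For rewards [1, 1, 1], a bootstrap value of 10, and gamma = 0.5, the target is 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0.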
Application of an improved Q-learning algorithm to path planning (cited: 22)
12
Authors: Gao Le, Ma Tianlu, Liu Kai, Zhang Yuxuan. Journal of Jilin University (Information Science Edition), 2018, Issue 4, pp. 439-443 (5 pages)
To address the low running efficiency and slow learning speed of the Q-learning algorithm in discrete state spaces, an improved Q-learning algorithm is proposed. The improved algorithm adds an extra learning layer on top of the original algorithm, learning the environment in greater depth. Simulation experiments were conducted in a grid environment, and the algorithm was successfully applied to mobile robot path planning in a multi-obstacle environment, demonstrating its feasibility. The improved Q-learning algorithm converges faster, with markedly fewer learning episodes and an efficiency gain of up to 20%. The algorithmic framework also generalizes well to similar problems.
Keywords: path planning, improved Q-learning algorithm, reinforcement learning, grid method, robot
DRL-EnVar: an adaptive hybrid ensemble-variational data assimilation method based on deep reinforcement learning
13
Authors: Lilan Huang, Hongze Leng, Junqiang Song, Dongzi Wang, Wuxin Wang, Ruisheng Hu, Hang Cao. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 12, pp. 2583-2603 (21 pages)
Accurate estimation of the background error covariance matrix, denoted B, remains a critical challenge in numerical weather prediction (NWP), directly influencing data assimilation (DA) performance and forecast accuracy. Although hybrid ensemble-variational (EnVar) methods combine static and flow-dependent matrices to improve assimilation, their effectiveness is constrained by empirically fixed weights. To address this limitation, we propose DRL-EnVar, an adaptive hybrid EnVar DA method enhanced with deep reinforcement learning. DRL-EnVar integrates deep learning (DL) components, including a novel cyclic convolution module to extract abstract features from data, and employs reinforcement learning (RL) to dynamically optimize hybrid weighting strategies. The system adaptively combines multiple ensemble-based flow-dependent matrices with one or more static matrices to construct a time-varying hybrid matrix B that better reflects real-time background errors. Experimental results demonstrate that DRL-EnVar outperforms the traditional ensemble Kalman filter (EnKF) and hybrid covariance DA (HCDA) methods, especially under sparse observations or transitional changes in state variables. It achieves competitive or superior assimilation accuracy with lower computational cost, and can be flexibly integrated into both three-dimensional variational (3DVar) and four-dimensional variational (4DVar) assimilation frameworks. Overall, DRL-EnVar offers a novel and efficient approach to adaptive DA, particularly valuable for improving forecast skill during transitional weather regimes.
Keywords: adaptive data assimilation, hybrid ensemble-variational method, background error covariance, deep reinforcement learning
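The hybrid EnVar covariance that DRL-EnVar adapts is conventionally a weighted combination of a static matrix and an ensemble-derived sample covariance; the paper's contribution is learning those weights rather than fixing them. A sketch of the fixed-weight baseline (function and argument names are hypothetical):

```python
import numpy as np

def hybrid_B(B_static, ensemble, alpha):
    """alpha * static B + (1 - alpha) * flow-dependent sample covariance.

    ensemble: array with one ensemble member per row (columns = state variables).
    """
    B_ens = np.cov(np.asarray(ensemble, dtype=float), rowvar=False)
    return alpha * np.asarray(B_static, dtype=float) + (1.0 - alpha) * B_ens
```

An RL agent in the DRL-EnVar spirit would output alpha (or a vector of weights over several ensembles) at each assimilation cycle instead of keeping it constant.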
Adaptive optimal tracking control for underactuated surface vessels using extended state observer and reinforcement learning
14
Authors: Yinkun Li, Yawen Zhou, Yufeng Zhou, Li Chai. Journal of Automation and Intelligence, 2026, Issue 1, pp. 24-34 (11 pages)
This paper investigates adaptive optimal tracking control (AOTC) for underactuated surface vessels (USVs). Compared with the majority of existing studies, the control strategy in this paper innovatively combines an extended state observer (ESO) with reinforcement learning (RL). The designed ESO has high estimation accuracy and robust disturbance rejection capabilities for the unmeasurable information of USVs. To obtain the AOTC, actor-critic (AC) networks based on RL are constructed to solve the Hamilton-Jacobi-Bellman (HJB) equations. Due to the uncertainties, it is challenging to obtain the optimal controller by directly solving the HJB equations. To address this issue, this paper employs neural networks (NNs) to approximate the uncertainties and solves for the optimal controller via AC-RL and the ESO. In addition, the adaptive parameters of the optimal controller are trained in parallel with the AC networks, which ensures that the trained networks can further improve tracking performance. The boundedness of the AOTC for USVs is shown by the Lyapunov stability theorem. Finally, simulation results demonstrate the effectiveness of the proposed algorithm.
Keywords: extended state observer, actor-critic networks, reinforcement learning, backstepping method, underactuated surface vessel
Mobile robot path planning based on an improved Q-learning algorithm (cited: 4)
15
Authors: Jing Zhengmiao, Liu Hongjie, Zhou Yonglu. Fire Control & Command Control, 2024, Issue 3, pp. 135-141 (7 pages)
To address the slow convergence, long running time, and poor learning efficiency of the traditional Q-learning algorithm in path planning, an improved Q-learning algorithm combining the artificial potential field method with traditional Q-learning is proposed. The algorithm introduces the attractive and repulsive functions of the artificial potential field method: reward values are selected dynamically by comparing the attractive function, and potential values are computed by comparing the repulsive function, dynamically updating the Q-values. This gives the mobile robot purposeful exploration and a preference for moving to positions farther from obstacles. Simulation experiments show that, compared with traditional Q-learning and an attraction-field-only variant, the improved Q-learning algorithm converges faster, shortens the running time, improves learning efficiency, and reduces the probability of colliding with obstacles, enabling the mobile robot to quickly find a collision-free path.
Keywords: mobile robot, path planning, improved Q-learning, artificial potential field method, reinforcement learning
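The attractive/repulsive shaping described above follows the classic artificial potential field construction; one common way to fold it into a Q-learning reward is to use the negative total potential. A sketch under assumed gain names (k_att, k_rep, d0 are illustrative parameters, not the paper's):

```python
import math

def apf_reward(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Negative total potential: quadratic attraction to the goal plus
    repulsion from each obstacle inside the influence radius d0."""
    u = 0.5 * k_att * math.dist(pos, goal) ** 2
    for obs in obstacles:
        d = math.dist(pos, obs)
        if 0 < d < d0:
            u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return -u
```

States near the goal and far from obstacles then receive the largest (least negative) reward, which is exactly the bias toward obstacle-free, goal-directed exploration the abstract describes.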
Clustered Reinforcement Learning
16
Authors: Xiao Ma, Shen-Yi Zhao, Zhao-Heng Yin, Wu-Jun Li. Frontiers of Computer Science, 2025, Issue 4, pp. 43-57 (15 pages)
Exploration strategy design is a challenging problem in reinforcement learning (RL), especially when the environment contains a large state space or sparse rewards. During exploration, the agent tries to discover unexplored (novel) areas or high-reward (quality) areas. Most existing methods perform exploration by only utilizing the novelty of states. The novelty and quality in the neighboring area of the current state have not been well utilized to simultaneously guide the agent's exploration. To address this problem, this paper proposes a novel RL framework, called clustered reinforcement learning (CRL), for efficient exploration in RL. CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area (cluster) of the current state is given to the agent. CRL leverages these bonus rewards to guide the agent to perform efficient exploration. Moreover, CRL can be combined with existing exploration strategies to improve their performance, as the bonus rewards employed by these existing strategies solely capture the novelty of states. Experiments on four continuous control tasks and six hard-exploration Atari 2600 games show that our method can outperform other state-of-the-art methods to achieve the best performance.
Keywords: deep reinforcement learning, exploration, count-based method, clustering, k-means
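CRL's bonus couples novelty (a count-based term over the current state's cluster) with quality (reward statistics of that cluster). A toy sketch of such a bonus; the coefficients and functional form here are assumptions for illustration, not the paper's actual definition:

```python
import numpy as np

def cluster_bonus(labels, rewards, current_label, beta=1.0, eta=1.0):
    """Novelty: beta / sqrt(visit count of the current state's cluster);
    quality: eta * mean reward observed inside that cluster.

    labels: cluster label per collected state (e.g., from k-means);
    rewards: reward observed at each collected state.
    """
    mask = np.asarray(labels) == current_label
    novelty = beta / np.sqrt(mask.sum())
    quality = eta * np.asarray(rewards, dtype=float)[mask].mean()
    return novelty + quality
```

A rarely visited cluster with high average reward thus yields the largest bonus, steering the agent toward areas that are both novel and promising.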
A two-stage reinforcement learning method for import container stacking decisions
17
Authors: Song Liying, Deng Kunqi, Ning Wu, Song Haitao, Li Siwei. Journal of Transportation Systems Engineering and Information Technology, 2026, Issue 1, pp. 283-294 (12 pages)
The import container stacking problem is highly complex due to the conflict between unloading order and retrieval order and to yard resource constraints. To address this challenge, this paper proposes a two-stage stacking decision method based on deep reinforcement learning for automated perpendicular-layout yards. The method models the stacking process as a Markov decision process and introduces a staged "block decision, then slot decision" structure that effectively reduces the dimensionality of the state and action spaces, combined with differentiated reward functions whose optimization objectives are balanced block utilization, reshuffle counts, and retrieval travel distance. Algorithmically, the first stage uses a deep Q-network (DQN) for block selection, and the second stage introduces a Dueling DQN to improve slot selection efficiency in complex states. Experimental results show that the method forms balanced stacking policies across the whole yard: it adapts stably to different yard densities and container batch sizes, keeping the average reshuffle rate within 15%-27% and the maximum average bay movement at 3.84 bays per container, about 61.5% and 38.7% lower than real-world data, respectively. Compared with single-stage DQN, two-stage proximal policy optimization (PPO), and heuristic algorithms, the proposed method shows clear advantages in convergence efficiency, decision quality, and robustness. This work not only validates the effectiveness of staged modeling and differentiated reward mechanisms for complex stacking problems, but also offers a generalizable solution for scheduling and resource optimization in large-scale automated yards.
Keywords: logistics engineering, stacking decision, reinforcement learning, import container, two-stage method
Optimal pivot path of the simplex method for linear programming based on reinforcement learning (cited: 2)
18
Authors: Anqi Li, Tiande Guo, Congying Han, Bonan Li, Haoran Li. Science China Mathematics, 2024, Issue 6, pp. 1263-1286 (24 pages)
Under the existing pivot rules, the simplex method for linear programming is not polynomial in the worst case. Therefore, the optimal pivot of the simplex method is crucial. In this paper, we propose an optimal rule to find all the shortest pivot paths of the simplex method for linear programming problems based on Monte Carlo tree search. Specifically, we first propose the SimplexPseudoTree to transfer the simplex method into tree-search mode while avoiding repeated basis variables. Secondly, we propose four reinforcement learning models with two actions and two rewards to make Monte Carlo tree search suitable for the simplex method. Thirdly, we set a new action selection criterion to ameliorate the inaccurate evaluation in the initial exploration. It is proved that when the number of vertices in the feasible region is C_n^m, our method can generate all the shortest pivot paths, which is polynomial in the number of variables. In addition, we experimentally validate that the proposed schedule can avoid unnecessary search and provide the optimal pivot path. Furthermore, this method can provide the best pivot labels for all kinds of supervised learning methods to solve linear programming problems.
Keywords: simplex method, linear programming, pivot rules, reinforcement learning
Deep reinforcement learning inversion of magnetotelluric data with smoothness constraints
19
Authors: Zeng Chenrui, Xiong Jie, Cao Zhen, Zhang Qianwei, Yuan Mengjiao. Bulletin of Geological Science and Technology, 2026, Issue 1, pp. 302-313 (12 pages)
Inversion is one of the key steps in processing magnetotelluric sounding data and has been widely studied. Among data-driven inversion methods, supervised and semi-supervised inversion dominate, while unsupervised inversion has received little attention. DQN (deep Q-network), a classic deep reinforcement learning algorithm, has recently been applied as an unsupervised method to the one-dimensional magnetotelluric inversion problem. The method requires no training dataset, depends little on the initial model, and repeated inversions yield a probability distribution of results, but its results are not well concentrated. This paper proposes a smoothness-constrained reinforcement learning inversion method for magnetotelluric data (smooth DQN, SDQN). Within the reinforcement learning framework, the inversion problem is cast as a Markov decision process, with the environment, reward, agent, and related terms defined accordingly; the model-constraint term of regularized inversion is then introduced into the reward, guiding the agent to continually adjust the resistivity parameters of the predicted model toward results that better satisfy the model constraint. Synthetic-model results show that, for the same number of inversions, SDQN produces more stable results than DQN and Occam inversion when the observed data contain different noise levels. Inversion of field magnetotelluric data from the Zhaxikang ore-concentration area in Tibet agrees broadly with Occam inversion results and with existing geological interpretations. SDQN yields more concentrated inversion results and stronger robustness to noisy observations, making it a new tool for magnetotelluric inversion problems.
Keywords: deep reinforcement learning, magnetotelluric inversion, smoothness constraint, DQN method, SDQN method
Three-dimensional UAV path planning using a DDPG-improved artificial potential field method
20
Authors: Chai Kaikai, Xu Haiqin, Fan Jiawei. Electronics Optics & Control, 2026, Issue 2, pp. 7-13 (7 pages)
In emergency communication and rescue missions, path planning for unmanned aerial vehicles (UAVs) in three-dimensional environments is critical. To address the limited ability of the traditional artificial potential field method to generate optimal paths in continuous space, an algorithm fusing the deep deterministic policy gradient (DDPG) deep reinforcement learning method with the artificial potential field (APF) method is designed. The algorithm uses the dynamic adjustment mechanism of DDPG to optimize the repulsive and attractive parameters of the APF algorithm, enhancing path planning efficiency; when the APF algorithm falls into a local minimum, DDPG assists it in escaping, ensuring the global optimality of the path. Simulation results show that, compared with using the DDPG or APF algorithm alone, the fused algorithm significantly shortens the path length and reduces the total turning angle while maintaining the success rate, improving path planning efficiency.
Keywords: path planning, artificial potential field method, DDPG, three-dimensional environment, deep reinforcement learning