Journal Articles
684 articles found
1. A Regional Distribution Network Coordinated Optimization Strategy for Electric Vehicle Clusters Based on Parametric Deep Reinforcement Learning
Authors: Lei Su, Wanli Feng, Cao Kan, Mingjiang Wei, Jihai Wang, Pan Yu, Lingxiao Yang. Energy Engineering, 2026, No. 3, pp. 195–214.
Abstract: To address the high costs and operational instability of distribution networks caused by the large-scale integration of distributed energy resources (DERs), such as photovoltaic (PV) systems, wind turbines (WT), and energy storage (ES) devices, and the increased grid load fluctuations and safety risks due to uncoordinated electric vehicle (EV) charging, this paper proposes a novel dual-scale hierarchical collaborative optimization strategy. The strategy decouples system-level economic dispatch from distributed EV agent control, resolving the resource coordination conflicts that arise from the high computational complexity and poor scalability of centralized optimization, or from the reliance on purely local information in fully decentralized frameworks. At the lower level, an EV charging and discharging model with a hybrid discrete-continuous action space is established and optimized with an improved Parameterized Deep Q-Network (PDQN) algorithm, which directly handles mode selection and power regulation while embedding physical constraints to ensure safety. At the upper level, microgrid (MG) operators adopt a dynamic pricing strategy optimized through deep reinforcement learning (DRL) to maximize economic benefits and achieve peak-valley shaving. Simulation results show that the proposed strategy outperforms traditional methods, reducing the total operating cost of the MG by 21.6%, decreasing the peak-to-valley load difference by 33.7%, reducing the number of voltage limit violations by 88.9%, and lowering the average electricity cost for EV users by 15.2%. The method yields a win-win result for operators and users, providing a reliable and efficient scheduling solution for distribution networks with high renewable energy penetration.
Keywords: power system; regional distributed energy; electric vehicle; deep reinforcement learning; collaborative optimization
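The hybrid discrete-continuous action selection that a parameterized DQN performs can be sketched as follows. This is an illustrative toy, not the paper's model: the linear "networks", the mode set, the price-based state, and the 6.6 kW power limit are all invented stand-ins for the real learned components.

```python
# Sketch of PDQN-style action selection: a parameter head proposes a continuous
# power level for each discrete mode, then a Q head picks the best (mode, power)
# pair. All functions and constants here are hypothetical illustrations.

MODES = ["idle", "charge", "discharge"]          # discrete action: operating mode
P_MAX = 6.6                                      # kW, assumed charger limit

def param_net(state, mode_idx):
    """Continuous parameter head: proposes a power level for each mode."""
    if MODES[mode_idx] == "idle":
        return 0.0
    # toy policy: charge more when price is low, discharge more when price is high
    p = P_MAX * (1.0 - state["price"]) if MODES[mode_idx] == "charge" else P_MAX * state["price"]
    return max(0.0, min(P_MAX, p))               # clip to the physical constraint

def q_value(state, mode_idx, power):
    """Toy Q head over (state, discrete mode, continuous parameter)."""
    sign = {"idle": 0.0, "charge": -1.0, "discharge": 1.0}[MODES[mode_idx]]
    return sign * power * state["price"] - 0.01 * power ** 2

def select_action(state):
    """PDQN-style selection: argmax over modes k of Q(s, k, p_k(s))."""
    candidates = [(q_value(state, k, param_net(state, k)), k) for k in range(len(MODES))]
    best_q, best_k = max(candidates)
    return MODES[best_k], param_net(state, best_k)

mode, power = select_action({"price": 0.9})      # at a high price, discharging pays
print(mode, round(power, 2))
```

The key PDQN idea shown here is that the argmax runs only over the discrete modes, with each mode's continuous parameter already filled in by its parameter head.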
2. AquaTree: Deep Reinforcement Learning-Driven Monte Carlo Tree Search for Underwater Image Enhancement
Authors: Chao Li, Jianing Wang, Caichang Ding, Zhiwei Ye. Computers, Materials & Continua, 2026, No. 3, pp. 1444–1464.
Abstract: Underwater images frequently suffer from chromatic distortion, blurred details, and low contrast, posing significant challenges for enhancement. This paper introduces AquaTree, a novel underwater image enhancement (UIE) method that reformulates the task as a Markov decision process (MDP) by integrating Monte Carlo tree search (MCTS) with deep reinforcement learning (DRL). The framework employs an action space of 25 enhancement operators, strategically grouped for basic attribute adjustment, color component balance, correction, and deblurring. Exploration within MCTS is guided by a dual-branch convolutional network, enabling intelligent sequential operator selection. The core contributions are: (1) a multimodal state representation combining CIELab color histograms with deep perceptual features; (2) a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency; and (3) an alternating training strategy that co-optimizes enhancement sequences and network parameters. Two inference schemes are further proposed: an MCTS-based approach that prioritizes accuracy at higher computational cost, and an efficient network policy that enables real-time processing with minimal quality loss. Comprehensive evaluations on the UIEB dataset, together with color-correction and haze-removal comparisons on the U45 dataset, demonstrate AquaTree's superiority, significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
Keywords: underwater image enhancement (UIE); Monte Carlo tree search (MCTS); deep reinforcement learning (DRL); Markov decision process (MDP)
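The sequential operator selection inside an MCTS loop like the one above is typically driven by a UCT rule. A minimal sketch, with invented node statistics and without the paper's network-guided priors:

```python
import math

# Minimal sketch of UCB1/UCT child selection for an MCTS enhancement loop.
# Operator names and statistics are hypothetical illustrations.

def uct_score(child_value, child_visits, parent_visits, c=1.4):
    """Standard UCB1: exploitation term + exploration bonus."""
    if child_visits == 0:
        return float("inf")                      # always try unvisited operators first
    return child_value / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)

def select_operator(children):
    """children: {operator_name: (total_value, visit_count)}."""
    parent_visits = sum(v for _, v in children.values())
    return max(children, key=lambda op: uct_score(children[op][0], children[op][1], parent_visits))

stats = {"gamma_correct": (3.0, 4), "white_balance": (2.5, 3), "deblur": (0.0, 0)}
print(select_operator(stats))                    # the unvisited operator wins the bonus
```

In AquaTree the exploration term would additionally be shaped by the dual-branch network's prior, which this sketch omits.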
3. Adaptive Multi-Agent Reinforcement Learning for Dynamic Pricing and Distributed Energy Management in Virtual Power Plant Networks
Authors: Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng, Bo Xie, Jian-Hua Chen, Jia-Qi Wei. Journal of Electronic Science and Technology, 2025, No. 1, pp. 35–59.
Abstract: This paper presents a novel approach to dynamic pricing and distributed energy management in virtual power plant (VPP) networks using multi-agent reinforcement learning (MARL). As the energy landscape evolves towards greater decentralization and renewable integration, traditional optimization methods struggle to address the inherent complexities and uncertainties. The proposed MARL framework enables adaptive, decentralized decision-making for both the distribution system operator and individual VPPs, optimizing economic efficiency while maintaining grid stability. The problem is formulated as a Markov decision process, and a custom MARL algorithm leveraging actor-critic architectures and experience replay is developed. Extensive simulations across diverse scenarios demonstrate that the approach consistently outperforms baseline methods, including Stackelberg game models and model predictive control, achieving an 18.73% reduction in costs and a 22.46% increase in VPP profits. The framework is particularly strong in scenarios with high renewable energy penetration, where it improves system performance by 11.95% compared with traditional methods. Furthermore, it demonstrates superior adaptability to unexpected events and mis-predictions, highlighting its potential for real-world implementation.
Keywords: distributed energy management; dynamic pricing; multi-agent reinforcement learning; renewable energy integration; virtual power plants
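The experience-replay component mentioned in the abstract can be sketched as a standard fixed-capacity uniform buffer; the paper's MARL specifics (per-agent buffers, the actor-critic updates themselves) are not reproduced here.

```python
import random
from collections import deque

# Sketch of a uniform experience-replay buffer, assuming the conventional
# fixed-capacity design; transition fields are generic placeholders.

class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)     # old transitions fall off the left

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniform minibatch; breaks temporal correlation between samples."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=1000)
for t in range(5):
    buf.push(state=t, action=0, reward=float(t), next_state=t + 1, done=False)
batch = buf.sample(3)
print(len(buf), len(batch))
```

Each agent in a MARL setup would typically hold its own such buffer and draw decorrelated minibatches for its critic updates.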
4. Deep Reinforcement Learning-Based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints
Authors: Xueyan Sun, Weiming Shen, Jiaxin Fan, Birgit Vogel-Heuser, Fandi Bi, Chunjiang Zhang. Engineering, 2025, No. 3, pp. 278–291.
Abstract: This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) that minimizes total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies significantly improve on the basic PPO, and that IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
Keywords: multi-objective Markov decision process; multi-agent deep reinforcement learning; proximal policy optimization; distributed hybrid flow-shop scheduling; blocking constraints
5. A Hybrid-DRL-Based Optimization Method for Task Offloading and Multi-Resource Coordinated Scheduling in Distribution Networks
Authors: Zhou Ya, Wang Qian, Fang Ruju. Power System Protection and Control, 2026, No. 4, pp. 165–174.
Abstract: To address the joint delay-energy optimization problem arising from "computing-communication-energy" multi-resource coordinated scheduling and task offloading as distribution networks evolve toward digitalization, distribution, and intelligence, a data-driven three-tier collaborative computing model spanning local terminals, edge servers, and the cloud is constructed. The model takes a weighted delay-energy-fairness index as its optimization objective and characterizes key factors such as wireless channel conditions, transmission rate, and CPU frequency, thereby quantifying the impact of multi-resource coordination on system performance. To handle the hybrid action space formed by discrete offloading decisions and continuous bandwidth/computing/energy allocation, a hybrid deep reinforcement learning (HDRL) framework is proposed: the upper layer uses a double deep Q-network (DDQN) for offloading action selection, the lower layer uses deep deterministic policy gradient (DDPG) for continuous resource scheduling, and an improved prioritized experience replay (IPER) mechanism is designed to raise sample utilization and convergence speed. Simulation results show that, compared with pure local computing, pure edge computing, random offloading, genetic algorithms (GA), and DDQN+DDPG without IPER, the proposed HDRL algorithm significantly reduces average system delay and total energy consumption across multiple scenarios, maintains high fairness as the number of users grows, exhibits the best scalability and robustness, and improves the task completion rate, providing a feasible and efficient solution for multi-resource coordinated optimization in distribution networks.
Keywords: edge computing; task offloading; resource allocation; distribution network; deep reinforcement learning
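The double-DQN target that an upper-layer DDQN like the one above computes can be sketched in a few lines. Toy Q "tables" (dicts of lists) stand in for the online and target networks; the values are illustrative only.

```python
# Sketch of the double-DQN target: the online network chooses the next action,
# the target network evaluates it, which tempers Q-value overestimation.

GAMMA = 0.95

def ddqn_target(reward, next_state, done, q_online, q_target):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    best_action = max(range(len(q_online[next_state])), key=lambda a: q_online[next_state][a])
    return reward + GAMMA * q_target[next_state][best_action]

q_online = {"s1": [1.0, 5.0, 2.0]}               # online net overestimates action 1
q_target = {"s1": [0.8, 1.2, 1.9]}               # target net gives a calmer estimate

y = ddqn_target(reward=1.0, next_state="s1", done=False, q_online=q_online, q_target=q_target)
print(round(y, 3))                               # 1 + 0.95 * 1.2 = 2.14
```

A plain DQN would instead have used max(q_target["s1"]) = 1.9, yielding a larger (and potentially biased) target.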
6. Distributed Policy Evaluation via Inexact ADMM in Multi-Agent Reinforcement Learning (cited: 3)
Authors: Xiaoxiao Zhao, Peng Yi, Li Li. Control Theory and Technology, 2020, No. 4, pp. 362–378.
Abstract: This paper studies distributed policy evaluation in multi-agent reinforcement learning. Under cooperative settings, each agent obtains only a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of local returns, the agents exchange information with their neighbors through a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint; a distributed alternating direction method of multipliers (ADMM) algorithm is then proposed to solve it. Furthermore, an inexact step is used within ADMM to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.
Keywords: multi-agent system; reinforcement learning; distributed optimization; policy evaluation
7. An Optimal Control-Based Distributed Reinforcement Learning Framework for a Class of Non-Convex Objective Functionals of the Multi-Agent Network (cited: 3)
Authors: Zhe Chen, Ning Li. IEEE/CAA Journal of Automatica Sinica, 2023, No. 11, pp. 2081–2093.
Abstract: This paper studies a novel distributed optimization problem that aims to minimize the sum of the non-convex objective functionals of a multi-agent network under privacy protection, meaning that the local objective of each agent is unknown to the others. The problem involves complexity in both the time and space aspects. Existing work on distributed optimization mainly considers privacy protection in the space aspect, where the decision variable is a finite-dimensional vector. In contrast, when the time aspect is considered, as in this paper, the decision variable is a continuous function of time, so minimizing the overall functional belongs to the calculus of variations. Traditional works usually seek the optimal decision function; here, due to privacy protection and non-convexity, the Euler-Lagrange equation of the proposed problem is a complicated partial differential equation, so the optimal decision derivative function is sought instead. This can be regarded as seeking the control input for an optimal control problem, for which a centralized reinforcement learning (RL) framework is proposed. In the space aspect, a distributed reinforcement learning framework is further presented to deal with the impact of privacy protection. Finally, rigorous theoretical analysis and simulation validate the effectiveness of the framework.
Keywords: distributed optimization; multi-agent; optimal control; reinforcement learning (RL)
8. An Iterated Greedy Algorithm with Memory and Learning Mechanisms for the Distributed Permutation Flow Shop Scheduling Problem
Authors: Binhui Wang, Hongfeng Wang. Computers, Materials & Continua, 2025, No. 1, pp. 371–388.
Abstract: The distributed permutation flow shop scheduling problem (DPFSP) has received increasing attention in recent years. The iterated greedy algorithm (IGA) is a powerful optimizer for this problem because of its straightforward, single-solution evolution framework. A potential drawback of the IGA, however, is its lack of utilization of historical information, which can lead to an imbalance between exploration and exploitation, especially in large-scale DPFSPs. This paper therefore develops an IGA with memory and learning mechanisms (MLIGA) to efficiently solve the DPFSP with the objective of minimal makespan. In MLIGA, a memory mechanism makes a more informed selection of the initial solution at each stage of the search by extending, reconstructing, and reinforcing information from previous solutions. In addition, a two-layer cooperative reinforcement learning approach intelligently determines the key parameters of the IGA and the operations of the memory mechanism. Meanwhile, to ensure that the experience generated by each perturbation operator is fully learned and to reduce the number of prior parameters of MLIGA, a probability-curve-based acceptance criterion is proposed by combining a cube-root function with custom rules. Finally, a discrete adaptive learning rate is employed to enhance the stability of the memory and learning mechanisms. Complete ablation experiments verify the effectiveness of the memory mechanism and show that it improves the performance of the IGA to a large extent. Furthermore, comparative experiments involving MLIGA and five state-of-the-art algorithms on 720 benchmarks show that MLIGA has significant potential for solving large-scale DPFSPs, indicating that it is well suited to real-world distributed flow shop scheduling.
Keywords: distributed permutation flow shop scheduling; makespan; iterated greedy algorithm; memory mechanism; cooperative reinforcement learning
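The basic iterated-greedy loop (destruction followed by greedy reinsertion) that MLIGA builds on can be sketched for a single-factory permutation flow shop. The processing times and parameters below are invented, and none of MLIGA's memory or learning mechanisms are modeled.

```python
import random

# Compact sketch of basic iterated greedy for permutation flow shop makespan:
# repeatedly remove d jobs from the incumbent and greedily reinsert each at its
# best position. Instance data and parameters are hypothetical.

def makespan(seq, p):                            # p[job][machine] processing times
    m = len(p[0])
    c = [0.0] * m                                # completion time on each machine
    for j in seq:
        c[0] += p[j][0]
        for k in range(1, m):
            c[k] = max(c[k], c[k - 1]) + p[j][k]
    return c[-1]

def iterated_greedy(p, d=2, iters=200, seed=0):
    rng = random.Random(seed)
    best = list(range(len(p)))                   # start from the identity sequence
    for _ in range(iters):
        partial = best[:]
        removed = [partial.pop(rng.randrange(len(partial))) for _ in range(d)]
        for j in removed:                        # greedy best-position reinsertion
            pos = min(range(len(partial) + 1),
                      key=lambda i: makespan(partial[:i] + [j] + partial[i:], p))
            partial.insert(pos, j)
        if makespan(partial, p) <= makespan(best, p):
            best = partial
    return best, makespan(best, p)

p = [[3, 2], [1, 4], [2, 2], [4, 1]]             # 4 jobs x 2 machines
seq, cmax = iterated_greedy(p)
print(seq, cmax)
```

The distributed variant additionally decides which factory each job goes to; MLIGA layers memory-informed restarts and RL-tuned parameters on top of this skeleton.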
9. Novel Multi-Agent Action-Masked Deep Reinforcement Learning for General Industrial Assembly Line Balancing Problems
Authors: Ali M. Ali, Luca Tirel, Hashim A. Hashim. Journal of Automation and Intelligence, 2025, No. 4, pp. 299–311.
Abstract: Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards, prevent project constraint violations, and achieve cost-effective operations. While exact solutions to such challenges can be obtained through integer programming (IP), the dependence of the search space on input parameters often makes IP computationally infeasible for large-scale scenarios. Heuristic methods, such as genetic algorithms, can also be applied, but they frequently produce suboptimal solutions in extensive cases. This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov decision process (MDP), without imposing assumptions on the type of assembly line, a notable distinction from most existing models. The proposed model is employed to create a virtual environment for training deep reinforcement learning (DRL) agents to optimize task and resource scheduling. To enhance the efficiency of agent training, the paper proposes two innovative tools. The first is an action-masking technique, which ensures the agent selects only feasible actions, thereby reducing training time. The second is a multi-agent approach in which each workstation is managed by an individual agent, reducing the state and action spaces. A centralized training framework with decentralized execution is adopted, offering a scalable learning architecture for optimizing industrial assembly lines. This framework allows the agents to learn offline and subsequently provide real-time solutions during operations by leveraging a neural network that maps the current factory state to the optimal action. The effectiveness of the proposed scheme is validated through numerical simulations, which demonstrate significantly faster convergence to the optimal solution than a comparable model-based approach.
Keywords: artificial intelligence in industrial engineering; autonomous decision making; distributed multi-agent learning; reinforcement learning
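The action-masking idea described above amounts to masking infeasible actions out before the argmax, so the agent can never select them. The Q-values and feasibility mask below are illustrative stand-ins for a real workstation agent.

```python
# Sketch of action masking: infeasible actions get -inf before the argmax.
# Q-values and the mask are hypothetical examples.

NEG_INF = float("-inf")

def masked_argmax(q_values, feasible_mask):
    """Pick the best action among feasible ones only."""
    masked = [q if ok else NEG_INF for q, ok in zip(q_values, feasible_mask)]
    return max(range(len(masked)), key=lambda a: masked[a])

q = [0.7, 2.3, 1.1, 0.2]                         # raw Q-values for 4 candidate tasks
mask = [True, False, True, True]                 # task 1 violates a precedence constraint
print(masked_argmax(q, mask))                    # best feasible action: index 2
```

Besides guaranteeing feasibility, masking shrinks the effective action space, which is the training-time saving the abstract refers to.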
10. DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing
Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief. Digital Communications and Networks, 2025, No. 5, pp. 1614–1627.
Abstract: Intelligent transportation systems (ITS) leverage integrated sensing and communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle edge computing (VEC) addresses this by offloading tasks to roadside units (RSUs), ensuring timely services. The authors' previous FLSimCo algorithm, which uses local resources for federated self-supervised learning (SSL), has a limitation: vehicles often cannot complete all iteration tasks. The improved algorithm offloads partial tasks to RSUs and optimizes energy consumption by adjusting transmission power, CPU frequency, and task assignment ratios, balancing local and RSU-based training, while an offloading threshold further prevents inefficiencies. Simulation results show that the enhanced algorithm reduces energy consumption and improves offloading efficiency and the accuracy of federated SSL.
Keywords: integrated sensing and communications (ISAC); federated self-supervised learning; resource allocation and offloading; deep reinforcement learning (DRL); vehicle edge computing (VEC)
11. Exploring Crash Induction Strategies in Within-Visual-Range Air Combat Based on Distributional Reinforcement Learning
Authors: Zetian Hu, Xuefeng Liang, Jun Zhang, Xiaochuan You, Chengcheng Ma. Chinese Journal of Aeronautics, 2025, No. 9, pp. 350–364.
Abstract: Within-visual-range (WVR) air combat is a highly dynamic and uncertain domain where effective strategies require intelligent and adaptive decision-making. Traditional approaches, including rule-based methods and conventional reinforcement learning (RL) algorithms, often focus on maximizing engagement outcomes through direct combat superiority. However, these methods overlook alternative tactics, such as inducing adversaries to crash, which can achieve decisive victories with lower risk and cost. This study proposes Alpha Crash, a novel distributional-reinforcement-learning-based agent specifically designed to defeat opponents through crash induction strategies. The approach integrates an improved QR-DQN framework to address uncertainties and adversarial tactics, incorporating advanced pilot experience into its reward functions. Extensive simulations reveal Alpha Crash's robust performance, achieving a 91.2% win rate across diverse scenarios by effectively guiding opponents into critical errors. Visualization and altitude analyses illustrate the agent's three-stage crash induction strategies, which exploit adversaries' vulnerabilities. These findings underscore Alpha Crash's potential to enhance autonomous decision-making and strategic innovation in real-world air combat applications.
Keywords: unmanned combat aerial vehicle; decision-making; distributional reinforcement learning; within-visual-range air combat; crash induction strategy
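At the heart of QR-DQN, the distributional method the agent above improves on, is the quantile Huber loss. A pure-Python sketch over single (quantile, target) pairs, with network and target-distribution details omitted:

```python
# Sketch of the quantile Huber loss from QR-DQN: each predicted quantile theta_i
# (at midpoint tau_i) is regressed against target samples with an asymmetric
# Huber penalty. Inputs below are illustrative numbers, not learned values.

def huber(u, kappa=1.0):
    return 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)

def quantile_huber_loss(pred_quantiles, targets, kappa=1.0):
    """Average over quantiles i (at midpoints tau_i) and target samples j."""
    n = len(pred_quantiles)
    taus = [(i + 0.5) / n for i in range(n)]     # quantile midpoints
    total = 0.0
    for tau, theta in zip(taus, pred_quantiles):
        for t in targets:
            u = t - theta                        # TD error for this pair
            total += abs(tau - (1.0 if u < 0 else 0.0)) * huber(u, kappa)
    return total / (n * len(targets))

loss = quantile_huber_loss(pred_quantiles=[0.0, 1.0, 2.0], targets=[1.0, 1.5])
print(round(loss, 4))                            # ≈ 0.0694
```

The asymmetric weight |tau - 1{u < 0}| is what makes each output head converge to its own quantile of the return distribution rather than the mean.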
12. Deep Synchronization Control of Grid-Forming Converters: A Reinforcement Learning Approach
Authors: Zhuorui Wu, Meng Zhang, Bo Fan, Yang Shi, Xiaohong Guan. IEEE/CAA Journal of Automatica Sinica, 2025, No. 1, pp. 273–275.
Abstract: This letter proposes a deep synchronization control (DSC) method to synchronize grid-forming converters with power grids. The method constructs a novel controller for grid-forming converters based on a stable deep dynamics model. To enhance the performance of the controller, the dynamics model is optimized within a deep reinforcement learning (DRL) framework. Simulation results verify that the proposed method can reduce frequency deviation and improve active power responses.
Keywords: deep synchronization control (DSC); stable deep dynamics model; deep reinforcement learning (DRL); frequency deviation; active power response
13. Deep Reinforcement Learning Based Communication Resource Allocation Driven by Radar Point Cloud for Urban Air Mobility
Authors: Leyan Chen, Kai Liu, Qiang Gao, Zhibo Zhang. Chinese Journal of Aeronautics, 2025, No. 12, pp. 404–414.
Abstract: In future smart cities, unmanned aerial vehicles (UAVs) and electric vertical take-off and landing aircraft (eVTOL) will be widely employed for urban air mobility (UAM). For such real-world scenarios, a deep reinforcement learning based communication resource allocation method is proposed in which UAVs provide communication services for eVTOL swarms to ensure their reliable communication and safe operation. To save energy, UAVs can ride on a moving interaction station (MIS), such as an urban bus. Using UAV trajectory control and communication power allocation, a joint fair optimization problem is formulated to maximize channel capacity while optimizing radar sensing performance. To solve this problem, a point-cloud-based deep Q-network (PCDQN) algorithm is proposed. It contains a point neural network that determines the UAV's action space directly from three-dimensional (3D) radar point clouds, and a deep reinforcement learning based decision model that selects actions from those action spaces. Simulation results demonstrate that the proposed method is competitive with the benchmarks.
Keywords: urban air mobility (UAM); unmanned aerial vehicles (UAVs); electric vertical take-off and landing aircraft (eVTOL); radar point cloud; trajectory control; resource allocation; deep reinforcement learning (DRL)
14. Autonomous Vehicle Platoons in Urban Road Networks: A Joint Distributed Reinforcement Learning and Model Predictive Control Approach
Authors: Luigi D'Alfonso, Francesco Giannini, Giuseppe Franzè, Giuseppe Fedele, Francesco Pupo, Giancarlo Fortino. IEEE/CAA Journal of Automatica Sinica, 2024, No. 1, pp. 141–156.
Abstract: In this paper, platoons of autonomous vehicles operating in urban road networks are considered. Methodologically, the problem of interest consists of formally characterizing vehicle state trajectory tubes by means of routing decisions complying with traffic congestion criteria. To this end, a novel distributed control architecture is conceived that combines two methodologies: deep reinforcement learning and model predictive control. On one hand, routing decisions are obtained by a distributed reinforcement learning algorithm that exploits available traffic data at each road junction; on the other hand, a bank of model predictive controllers computes the most adequate control action for each involved vehicle. These tasks are combined into a single framework: the deep reinforcement learning output (action) is translated into a set-point to be tracked by the model predictive controller; conversely, the current vehicle position, resulting from the applied control move, is exploited by the deep reinforcement learning unit to improve its reliability. The main novelty of the proposed solution lies in its hybrid nature: it fully exploits deep reinforcement learning for decision-making, while time-varying hard constraints are always satisfied during the platoon dynamics imposed by the computed routing decisions. To evaluate the performance of the proposed control architecture efficiently, a co-design procedure involving the SUMO and MATLAB platforms is implemented, so that complex operating environments can be used and information from road maps (links, junctions, obstacles, semaphores, etc.) and vehicle state trajectories can be shared and exchanged. Finally, considering a real city block and a platoon of eleven vehicles described by double-integrator models as the operating scenario, several simulations highlight the main features of the proposed approach. Moreover, in different operating scenarios the proposed reinforcement learning scheme significantly reduces traffic congestion compared with well-reputed competitors.
Keywords: distributed model predictive control; distributed reinforcement learning; routing decisions; urban road networks
15. DRL-Based Scheduling for Mass-Customization Assembly Shops
Authors: Qu Xinhuai, Zhang Huihui, Ding Birong, Meng Guanjun. Journal of Hefei University of Technology (Natural Science), 2025, No. 7, pp. 878–883.
Abstract: To address the randomness and contingency of orders in mass-customization assembly shops, this paper proposes a deep reinforcement learning (DRL) based job scheduling optimization method. A scheduling optimization model is built with the objectives of minimizing the number of product component changeovers and minimizing the order earliness/tardiness penalty. A Markov decision process is established from the scheduling model, with states, actions, and the reward function appropriately defined. The scheduling optimization problem is then combined with the DRL method and solved with an improved D3QN algorithm, and simulation experiments verify the approach. The results show that the proposed method effectively reduces the number of component changeovers and lowers the earliness/tardiness penalty.
Keywords: mass customization; assembly shop; deep reinforcement learning (DRL); job-shop scheduling; scheduling optimization model
16. Relevant Experience Learning: A Deep Reinforcement Learning Method for UAV Autonomous Motion Planning in Complex Unknown Environments (cited: 24)
Authors: Zijian Hu, Xiaoguang Gao, Kaifang Wan, Yiwei Zhai, Qianglong Wang. Chinese Journal of Aeronautics, 2021, No. 12, pp. 187–204.
Abstract: Unmanned aerial vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to fly safely to designated locations without human intervention. A suitable method for the UAV autonomous motion planning (AMP) problem can therefore improve the success rate of UAV missions. In recent years, many studies have applied deep reinforcement learning (DRL) methods to the AMP problem with good results. From the sampling perspective, this paper designs a sampling method with double screening, combines it with the deep deterministic policy gradient (DDPG) algorithm, and proposes the relevant experience learning DDPG (REL-DDPG) algorithm. REL-DDPG uses a prioritized experience replay (PER) mechanism to break the correlation of consecutive experiences in the experience pool, finds the experiences most similar to the current state to learn from, following theories of human education, and expands the influence of the learning process on action selection in the current state. All experiments are conducted in a complex unknown simulation environment constructed from the parameters of a real UAV. Training experiments show that REL-DDPG improves both the convergence speed and the converged result compared with the state-of-the-art DDPG algorithm, while testing experiments show the applicability of the algorithm and investigate its performance under different parameter settings.
Keywords: autonomous motion planning (AMP); deep deterministic policy gradient (DDPG); deep reinforcement learning (DRL); sampling method; UAV
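Proportional prioritized experience replay (PER), the mechanism REL-DDPG builds on, can be sketched as sampling transitions with probability proportional to a power of their priority. The priorities below are arbitrary numbers; the similarity-based double screening of REL-DDPG is not modeled.

```python
import random

# Sketch of proportional PER sampling: p_i^alpha / sum_k p_k^alpha.
# Priorities (typically TD-error magnitudes) are hypothetical here.

ALPHA = 0.6                                      # how strongly priority skews sampling

def sample_indices(priorities, batch_size, rng=random):
    """Draw indices with probability proportional to priority^alpha."""
    weights = [p ** ALPHA for p in priorities]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)

priorities = [0.1, 0.1, 5.0, 0.1]                # one transition has a large TD error
idx = sample_indices(priorities, batch_size=1000, rng=random.Random(0))
print(idx.count(2) / len(idx))                   # the high-priority index dominates
```

A full PER implementation would also apply importance-sampling weights to correct the bias this skewed sampling introduces.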
Active control of flow past an elliptic cylinder using an artificial neural network trained by deep reinforcement learning 被引量:2
17
作者 Bofu WANG Qiang WANG +1 位作者 Quan ZHOU Yulu LIU 《Applied Mathematics and Mechanics(English Edition)》 SCIE EI CSCD 2022年第12期1921-1934,共14页
The active control of flow past an elliptical cylinder using the deep reinforcement learning(DRL)method is conducted.The axis ratio of the elliptical cylinderΓvaries from 1.2 to 2.0,and four angles of attackα=0°... The active control of flow past an elliptical cylinder using the deep reinforcement learning(DRL)method is conducted.The axis ratio of the elliptical cylinderΓvaries from 1.2 to 2.0,and four angles of attackα=0°,15°,30°,and 45°are taken into consideration for a fixed Reynolds number Re=100.The mass flow rates of two synthetic jets imposed on different positions of the cylinderθ1andθ2are trained to control the flow.The optimal jet placement that achieves the highest drag reduction is determined for each case.For a low axis ratio ellipse,i.e.,Γ=1.2,the controlled results atα=0°are similar to those for a circular cylinder with control jets applied atθ1=90°andθ2=270°.It is found that either applying the jets asymmetrically or increasing the angle of attack can achieve a higher drag reduction rate,which,however,is accompanied by increased fluctuation.The control jets elongate the vortex shedding,and reduce the pressure drop.Meanwhile,the flow topology is modified at a high angle of attack.For an ellipse with a relatively higher axis ratio,i.e.,Γ1.6,the drag reduction is achieved for all the angles of attack studied.The larger the angle of attack is,the higher the drag reduction ratio is.The increased fluctuation in the drag coefficient under control is encountered,regardless of the position of the control jets.The control jets modify the flow topology by inducing an external vortex near the wall,causing the drag reduction.The results suggest that the DRL can learn an active control strategy for the present configuration. 展开更多
Keywords: drag reduction; deep reinforcement learning (DRL); elliptical cylinder; active control
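The control loop this abstract describes can be sketched as a reward that trades drag reduction against fluctuation, with the two jets constrained to zero net mass flux. Everything below is an illustrative assumption: the paper does not publish its reward coefficients, and using lift magnitude as the fluctuation proxy is a convention borrowed from similar jet-control studies, not taken from this work.

```python
def zero_net_flux(m1):
    """Pair the two synthetic jets so the net mass injection is zero,
    a common constraint in jet-based flow control (assumed here)."""
    return m1, -m1

def flow_control_reward(cd, cl, cd_baseline=1.5, fluct_penalty=0.2):
    """Reward drag reduction relative to the uncontrolled baseline and
    penalize lift magnitude as a proxy for increased fluctuation."""
    return (cd_baseline - cd) - fluct_penalty * abs(cl)

# One controlled snapshot: drag below the baseline, moderate lift oscillation.
m1, m2 = zero_net_flux(0.05)
reward = flow_control_reward(cd=1.2, cl=0.5)
```

A DRL agent trained against such a reward can lower drag while the fluctuation penalty limits how aggressively it excites the wake.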
18. Airport gate assignment problem with deep reinforcement learning (Cited by 3)
Authors: Zhao Jiaming, Wu Wenjun, Liu Zhiming, Han Changhao, Zhang Xuanyi, Zhang Yanhua. High Technology Letters, 2020, No. 1, pp. 102-107.
With the rapid development of air transportation in recent years, airport operations have attracted considerable attention, and the airport gate assignment problem (AGAP) has become a research hotspot. However, real-time AGAP algorithms remain an open issue. In this study, a deep reinforcement learning based AGAP (DRL-AGAP) is proposed. The optimization objective is to maximize the rate of flights assigned to fixed gates. The real-time AGAP is modeled as a Markov decision process (MDP), and the state space, action space, value, and rewards are defined. The DRL-AGAP algorithm is evaluated via simulation and compared with the flight pre-assignment results of the Gurobi optimization solver and a greedy heuristic. Simulation results show that the performance of the proposed DRL-AGAP algorithm is close to that of the pre-assignment obtained by the Gurobi optimization solver. Meanwhile, real-time assignment is ensured by the proposed DRL-AGAP algorithm due to its dynamic modeling and lower complexity.
Keywords: airport gate assignment problem (AGAP); deep reinforcement learning (DRL); Markov decision process (MDP)
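The objective the abstract names, maximizing the rate of flights assigned to fixed gates, can be made concrete with a toy version of the greedy baseline the paper compares against. The flight data and first-free-gate rule below are hypothetical illustrations, not the paper's instance or its DRL policy.

```python
def greedy_gate_assignment(flights, n_gates):
    """Assign each flight (arrival, departure) to the first fixed gate that
    is free; flights left over would go to remote stands. Returns the rate
    of flights placed at fixed gates, the objective named in the abstract."""
    free_at = [0.0] * n_gates            # time at which each gate frees up
    assigned = 0
    for arr, dep in sorted(flights):     # process flights by arrival time
        for g in range(n_gates):
            if free_at[g] <= arr:        # gate is vacant when flight arrives
                free_at[g] = dep
                assigned += 1
                break
    return assigned / len(flights)

# Four overlapping flights competing for two fixed gates (made-up data).
rate = greedy_gate_assignment([(0, 2), (1, 3), (2, 4), (2.5, 5)], n_gates=2)
```

A DRL agent replaces the inner first-free-gate choice with a learned action per flight, which is what lets it trade off current and future occupancy instead of committing greedily.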
19. Deep reinforcement learning for UAV swarm rendezvous behavior (Cited by 2)
Authors: ZHANG Yaozhong, LI Yike, WU Zhuoran, XU Jialin. Journal of Systems Engineering and Electronics, 2023, No. 2, pp. 360-373.
Unmanned aerial vehicle (UAV) swarm technology is one of the research hotspots of recent years, and with the continuous improvement of UAV autonomy it will become one of the main trends of future UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. A guided reward function is designed to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-horizon tasks. The concept of a temporary storage area is also proposed, optimizing the memory replay unit of the traditional DDQN algorithm, improving its convergence speed, and accelerating training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to more faithfully represent the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different scenarios are trained. The experimental results validate that the DDQN algorithm is efficient at training the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy, improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm carries out the rendezvous task well, with a mission success rate of 90%.
Keywords: double deep Q-network (DDQN) algorithm; unmanned aerial vehicle (UAV) swarm; task decision; deep reinforcement learning (DRL); sparse returns
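The two core ingredients the abstract names, a guided (dense) reward against sparse returns and the DDQN update, can be sketched as follows. The shaping terms (progress plus arrival bonus) and all numbers are illustrative assumptions; the target computation is standard double DQN, and the paper's temporary storage area mechanism is not reproduced here.

```python
def guided_reward(dist, prev_dist, arrival_bonus=10.0, radius=1.0):
    """Dense shaping for the sparse rendezvous return: reward progress
    toward the rendezvous point, plus a terminal bonus on arrival."""
    return (prev_dist - dist) + (arrival_bonus if dist < radius else 0.0)

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action and
    the target network evaluates it, decoupling selection from evaluation."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]

# One transition: the UAV closes from 5.0 to 4.0 units, not yet arrived.
r = guided_reward(dist=4.0, prev_dist=5.0)
y = ddqn_target(reward=r, next_q_online=[0.2, 0.9], next_q_target=[0.5, 0.4])
```

Because every step yields a nonzero progress signal, the agent gets gradient information long before the terminal rendezvous bonus, which is what addresses the convergence problem of sparse returns over long horizons.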
20. Hierarchical reinforcement learning guidance with threat avoidance (Cited by 1)
Authors: LI Bohao, WU Yunjie, LI Guofei. Journal of Systems Engineering and Electronics, 2022, No. 5, pp. 1173-1185.
The guidance strategy is a critical factor in determining the striking effect of a missile operation. A novel guidance law is presented by exploiting deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and to avoid the threat posed by opposing obstacles. To attenuate the chattering of the acceleration, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results validate that a missile guided by the proposed method hits the target successfully and keeps away from the threatened areas effectively.
Keywords: guidance law; deep reinforcement learning (DRL); threat avoidance; hierarchical reinforcement learning
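The reward structure this abstract describes, penalizing LOS angle rate, command effort (the action penalty against acceleration chattering), and entry into threatened areas, can be sketched in one function. The weights, the circular threat model, and all numbers below are illustrative assumptions, not the paper's actual formulation.

```python
import math

def guidance_reward(los_rate, accel, missile_pos, threats,
                    k_los=1.0, k_act=0.01, threat_cost=5.0):
    """Penalize line-of-sight angle rate, squared acceleration command
    (to attenuate chattering), and proximity to threatened areas,
    each threat modeled as a circle (x, y, radius)."""
    r = -k_los * abs(los_rate) - k_act * accel ** 2
    for tx, ty, radius in threats:
        if math.hypot(missile_pos[0] - tx, missile_pos[1] - ty) < radius:
            r -= threat_cost      # inside a threatened area: heavy penalty
    return r

# One step where the missile has strayed inside a threat circle.
r = guidance_reward(los_rate=0.1, accel=2.0,
                    missile_pos=(0.0, 0.0), threats=[(0.5, 0.0, 1.0)])
```

In the hierarchical setup the abstract describes, an upper layer can select among sub-policies while a lower DDPG layer outputs the continuous acceleration command scored by a reward of this shape.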