45 articles found
A Dynamic Deceptive Defense Framework for Zero-Day Attacks in IIoT: Integrating Stackelberg Game and Multi-Agent Distributed Deep Deterministic Policy Gradient
1
Authors: Shigen Shen, Xiaojun Ji, Yimeng Liu. Computers, Materials & Continua, 2025, Issue 11, pp. 3997-4021 (25 pages).
The Industrial Internet of Things (IIoT) is increasingly vulnerable to sophisticated cyber threats, particularly zero-day attacks that exploit unknown vulnerabilities and evade traditional security measures. To address this critical challenge, this paper proposes a dynamic defense framework named Zero-day-aware Stackelberg Game-based Multi-Agent Distributed Deep Deterministic Policy Gradient (ZSG-MAD3PG). The framework integrates Stackelberg game modeling with the Multi-Agent Distributed Deep Deterministic Policy Gradient (MAD3PG) algorithm and incorporates defensive deception (DD) strategies to achieve adaptive and efficient protection. While conventional methods typically incur considerable resource overhead and exhibit higher latency due to static or rigid defensive mechanisms, the proposed ZSG-MAD3PG framework mitigates these limitations through multi-stage game modeling and adaptive learning, enabling more efficient resource utilization and faster response times. The Stackelberg-based architecture allows defenders to dynamically optimize packet sampling strategies, while attackers adjust their tactics to reach rapid equilibrium. Furthermore, dynamic deception techniques reduce the time required for the concealment of attacks and the overall system burden. A lightweight behavioral fingerprinting detection mechanism further enhances real-time zero-day attack identification within industrial device clusters. ZSG-MAD3PG demonstrates higher true positive rates (TPR) and lower false alarm rates (FAR) compared to existing methods, while also achieving improved latency, resource efficiency, and stealth adaptability in IIoT zero-day defense scenarios.
Keywords: Industrial internet of things; zero-day attacks; Stackelberg game; distributed deep deterministic policy gradient; defensive spoofing; dynamic defense
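The leader-follower structure of a Stackelberg game, which the abstract above builds on, can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the paper's ZSG-MAD3PG model: the payoff tables are invented for illustration, both players use pure strategies, and the follower breaks ties by taking the first best response.

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_value) for a
    pure-strategy Stackelberg game given as two payoff tables.

    The leader (e.g. a defender) commits first; the follower (e.g. an
    attacker) observes the commitment and plays a best response.
    """
    best = None
    for i, follower_row in enumerate(follower_payoff):
        # Follower's best response to leader action i (first maximum on ties).
        j = max(range(len(follower_row)), key=lambda k: follower_row[k])
        value = leader_payoff[i][j]
        # Leader keeps the commitment that yields the highest own payoff.
        if best is None or value > best[2]:
            best = (i, j, value)
    return best


# Hypothetical 2x2 payoffs: rows are leader actions, columns follower actions.
leader_payoff = [[2, 4], [1, 3]]
follower_payoff = [[1, 0], [0, 2]]
print(stackelberg_pure(leader_payoff, follower_payoff))
```

Committing to action 1 is better for the leader here because it steers the follower toward a response that pays the leader 3 rather than 2.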
Perception Enhanced Deep Deterministic Policy Gradient for Autonomous Driving in Complex Scenarios
2
Authors: Lyuchao Liao, Hankun Xiao, Pengqi Xing, Zhenhua Gan, Youpeng He, Jiajun Wang. Computer Modeling in Engineering & Sciences (indexed: SCIE, EI), 2024, Issue 7, pp. 557-576 (20 pages).
Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts bring a set of challenges to autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, rendering them a vital issue in the practical application of autonomous driving. To address the traffic challenges, this work focused on complex multi-lane roundabouts and proposed a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) for Autonomous Driving in the Roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle’s capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model combines a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and augmenting throughput efficiency. Extensive experiments were conducted with the collaborative simulation platform of CARLA and SUMO, and the experimental results show that the proposed PE-DDPG outperforms the baseline methods in terms of the convergence capacity of the training process, the smoothness of driving and the traffic efficiency with diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). Generally, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
Keywords: Autonomous driving; traffic roundabouts; deep deterministic policy gradient; spatial attention mechanisms
Optimizing the Multi-Objective Discrete Particle Swarm Optimization Algorithm by Deep Deterministic Policy Gradient Algorithm
3
Authors: Sun Yang-Yang, Yao Jun-Ping, Li Xiao-Jun, Fan Shou-Xiang, Wang Zi-Wei. Journal on Artificial Intelligence, 2022, Issue 1, pp. 27-35 (9 pages).
Deep deterministic policy gradient (DDPG) has been proved to be effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work aims to probe into this topic. Experiments showed that the DDPG can not only quickly improve the convergence speed of MODPSO, but also overcome the problem of local optimal solution that MODPSO may suffer. The research findings are of great significance for the theoretical research and application of MODPSO.
Keywords: Deep deterministic policy gradient; multi-objective discrete particle swarm optimization; deep reinforcement learning; machine learning
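For readers unfamiliar with DDPG, two of its standard ingredients are the bootstrapped critic target and the slow "soft" update of the target networks. The sketch below shows both on plain Python numbers; it is a generic illustration rather than any listed paper's implementation, and the function names, `tau`, and `gamma` values are assumptions chosen for the example.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    `next_q` stands for the target critic's value at the next state and the
    target actor's action; terminal transitions do not bootstrap.
    """
    return reward + gamma * (0.0 if done else next_q)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


# Toy parameters standing in for network weights.
target_params = [0.0, 1.0]
online_params = [1.0, 3.0]
print(soft_update(target_params, online_params, tau=0.1))
print(td_target(reward=1.0, next_q=2.0, done=False, gamma=0.5))
```

A small `tau` keeps the target networks changing slowly, which is the usual reason given for the stability of the critic's regression target.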
Optimum scheduling of truck-based mobile energy couriers(MEC)using deep deterministic policy gradient
4
Authors: Yaze Li, Jingxian Wu, Yanjun Pan. Intelligent and Converged Networks, 2025, Issue 3, pp. 195-208 (14 pages).
We propose a new architecture of truck-based mobile energy couriers (MEC) for power distribution networks with high penetration of renewable energy sources (RES). Each MEC is a truck equipped with high-density inverters, converters, capacitor banks, and energy storage devices. The MEC platform can improve the flexibility, resilience, and RES hosting capability of a distribution grid through spatial-temporal energy reallocation based on the stochastic behaviors of RES and loads. The employment of MEC necessitates the development of complex scheduling and control schemes that can adaptively cope with the dynamic natures of both the power grid and the transportation network. The problem is formulated as a non-convex optimization problem to minimize the total generation cost, subject to the various constraints imposed by conventional and renewable energy sources, energy storage, and transportation networks, etc. The problem is solved by combining optimal power flow (OPF) with deep reinforcement learning (DRL) under the framework of deep deterministic policy gradient (DDPG). Simulation results demonstrate that the proposed MEC platform with DDPG can achieve significant cost reduction compared to conventional systems with static energy storage.
Keywords: transportation network; renewable energy integration; mobile energy couriers (MECs); Markov decision process (MDP); deep deterministic policy gradient (DDPG)
Full-model-free Adaptive Graph Deep Deterministic Policy Gradient Model for Multi-terminal Soft Open Point Voltage Control in Distribution Systems (cited by: 2)
5
Authors: Huayi Wu, Zhao Xu, Minghao Wang, Youwei Jia. Journal of Modern Power Systems and Clean Energy (indexed: CSCD), 2024, Issue 6, pp. 1893-1904 (12 pages).
High penetration of renewable energy sources (RESs) induces sharply-fluctuating feeder power, leading to voltage deviation in active distribution systems. To prevent voltage violations, multi-terminal soft open points (M-SOPs) have been integrated into the distribution systems to enhance voltage control flexibility. However, the M-SOP voltage control recalculated in real time cannot adapt to the rapid fluctuations of photovoltaic (PV) power, fundamentally limiting the voltage controllability of M-SOPs. To address this issue, a full-model-free adaptive graph deep deterministic policy gradient (FAG-DDPG) model is proposed for M-SOP voltage control. Specifically, the attention-based adaptive graph convolutional network (AGCN) is leveraged to extract the complex correlation features of nodal information to improve the policy learning ability. Then, the AGCN-based surrogate model is trained to replace the power flow calculation to achieve model-free control. Furthermore, the deep deterministic policy gradient (DDPG) algorithm allows the FAG-DDPG model to learn an optimal control strategy of M-SOP by continuous interactions with the AGCN-based surrogate model. Numerical tests have been performed on modified IEEE 33-node, 123-node, and a real 76-node distribution systems, which demonstrate the effectiveness and generalization ability of the proposed FAG-DDPG model.
Keywords: Soft open point; graph attention; graph convolutional network; reinforcement learning; voltage control; distribution system; deep deterministic policy gradient
Noise-driven enhancement for exploration: Deep reinforcement learning for UAV autonomous navigation in complex environments
6
Authors: Haotian ZHANG, Yiyang LI, Lingquan CHENG, Jianliang AI. Chinese Journal of Aeronautics, 2026, Issue 1, pp. 454-471 (18 pages).
Unmanned Aerial Vehicle (UAV) plays a prominent role in various fields, and autonomous navigation is a crucial component of UAV intelligence. Deep Reinforcement Learning (DRL) has expanded the research avenues for addressing challenges in autonomous navigation. Nonetheless, challenges persist, including getting stuck in local optima, consuming excessive computations during action space exploration, and neglecting deterministic experience. This paper proposes a noise-driven enhancement strategy. In accordance with the overall learning phases, a global noise control method is designed, while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by UAV during navigation. Both methods are integrated into a dual-model for noise control to regulate action space exploration. Furthermore, noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience. In uncertain environments, based on the Twin Delay Deep Deterministic Policy Gradient (TD3) algorithm with Long Short-Term Memory (LSTM) network and Priority Experience Replay (PER), a Noise-Driven Enhancement Priority Memory TD3 (NDE-PMTD3) is developed. We established a simulation environment to compare different algorithms, and the performance of the algorithms is analyzed in various scenarios. The training results indicate that the proposed algorithm accelerates the convergence speed and enhances the convergence stability. In test experiments, the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments, demonstrating superior generalization results.
Keywords: Action space exploration; Autonomous navigation; Deep reinforcement learning; Twin delay deep deterministic policy gradient; Unmanned aerial vehicle
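The TD3 algorithm named in this abstract differs from plain DDPG mainly in how the critic target is formed: clipped noise is added to the target action, and the smaller of two target critics is used. The following is a minimal sketch of that textbook target on scalar actions. It does not reproduce the paper's NDE-PMTD3 noise control, and the callables and hyperparameter values are illustrative assumptions.

```python
import random


def clip(x, low, high):
    """Clamp x into the closed interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=random):
    """Clipped double-Q target with target policy smoothing (scalar action)."""
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -act_limit, act_limit)
    # Clipped double-Q: take the more pessimistic of the two target critics.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Hypothetical stand-ins for the target networks.
actor = lambda s: 0.5
q1 = lambda s, a: a + 1.0
q2 = lambda s, a: 2.0 * a
print(td3_target(1.0, None, False, actor, q1, q2, gamma=0.5, noise_std=0.0))
```

Taking the minimum of the two critics counteracts the overestimation bias that a single bootstrapped critic tends to accumulate.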
DDPG-Based Intelligent Computation Offloading and Resource Allocation for LEO Satellite Edge Computing Network (cited by: 1)
7
Authors: Jia Min, Wu Jian, Zhang Liang, Wang Xinyu, Guo Qing. China Communications, 2025, Issue 3, pp. 1-15 (15 pages).
Low earth orbit (LEO) satellites with wide coverage can carry the mobile edge computing (MEC) servers with powerful computing capabilities to form the LEO satellite edge computing system, providing computing services for the global ground users. In this paper, the computation offloading problem and resource allocation problem are formulated as a mixed integer nonlinear program (MINLP) problem. This paper proposes a computation offloading algorithm based on deep deterministic policy gradient (DDPG) to obtain the user offloading decisions and user uplink transmission power. This paper uses the convex optimization algorithm based on Lagrange multiplier method to obtain the optimal MEC server resource allocation scheme. In addition, the expression of suboptimal user local CPU cycles is derived by relaxation method. Simulation results show that the proposed algorithm can achieve excellent convergence effect, and the proposed algorithm significantly reduces the system utility values at considerable time cost compared with other algorithms.
Keywords: computation offloading; deep deterministic policy gradient; low earth orbit satellite; mobile edge computing; resource allocation
Optimization of plunger lift working systems using reinforcement learning for coupled wellbore/reservoir
8
Authors: Zhi-Sheng Xing, Guo-Qing Han, You-Liang Jia, Wei Tian, Hang-Fei Gong, Wen-Bo Jiang, Pei-Dong Mai, Xing-Yuan Liang. Petroleum Science, 2025, Issue 5, pp. 2154-2168 (15 pages).
In the mid-to-late stages of gas reservoir development, liquid loading in gas wells becomes a common challenge. Plunger lift, as an intermittent production technique, is widely used for deliquification in gas wells. With the advancement of big data and artificial intelligence, the future of oil and gas field development is trending towards intelligent, unmanned, and automated operations. Currently, the optimization of plunger lift working systems is primarily based on expert experience and manual control, focusing mainly on the success of the plunger lift without adequately considering the impact of different working systems on gas production. Additionally, liquid loading in gas wells is a dynamic process, and the intermittent nature of plunger lift requires accurate modeling; using constant inflow dynamics to describe reservoir flow introduces significant errors. To address these challenges, this study establishes a coupled wellbore-reservoir model for plunger lift wells and validates the computational wellhead pressure results against field measurements. Building on this model, a novel optimization control algorithm based on the deep deterministic policy gradient (DDPG) framework is proposed. The algorithm aims to optimize plunger lift working systems to balance overall reservoir pressure, stabilize gas-water ratios, and maximize gas production. Through simulation experiments in three different production optimization scenarios, the effectiveness of reinforcement learning algorithms (including RL, PPO, DQN, and the proposed DDPG) and traditional optimization algorithms (including GA, PSO, and Bayesian optimization) in enhancing production efficiency is compared. The results demonstrate that the coupled model provides highly accurate calculations and can precisely describe the transient production of wellbore and gas reservoir systems. The proposed DDPG algorithm achieves the highest reward value during training with minimal error, leading to a potential increase in cumulative gas production by up to 5% and cumulative liquid production by 252%. The DDPG algorithm exhibits robustness across different optimization scenarios, showcasing excellent adaptability and generalization capabilities.
Keywords: Plunger lift; Liquid loading; Deliquification; Reinforcement learning; Deep deterministic policy gradient (DDPG); Artificial intelligence
Joint offloading decision and resource allocation in vehicular edge computing networks
9
Authors: Shumo Wang, Xiaoqin Song, Han Xu, Tiecheng Song, Guowei Zhang, Yang Yang. Digital Communications and Networks, 2025, Issue 1, pp. 71-82 (12 pages).
With the rapid development of Intelligent Transportation Systems (ITS), many new applications for Intelligent Connected Vehicles (ICVs) have sprung up. In order to tackle the conflict between delay-sensitive applications and resource-constrained vehicles, computation offloading paradigm that transfers computation tasks from ICVs to edge computing nodes has received extensive attention. However, the dynamic network conditions caused by the mobility of vehicles and the unbalanced computing load of edge nodes make ITS face challenges. In this paper, we propose a heterogeneous Vehicular Edge Computing (VEC) architecture with Task Vehicles (TaVs), Service Vehicles (SeVs) and Roadside Units (RSUs), and propose a distributed algorithm, namely PG-MRL, which jointly optimizes offloading decision and resource allocation. In the first stage, the offloading decisions of TaVs are obtained through a potential game. In the second stage, a multi-agent Deep Deterministic Policy Gradient (DDPG), one of deep reinforcement learning algorithms, with centralized training and distributed execution is proposed to optimize the real-time transmission power and subchannel selection. The simulation results show that the proposed PG-MRL algorithm has significant improvements over baseline algorithms in terms of system delay.
Keywords: Computation offloading; Resource allocation; Vehicular edge computing; Potential game; Multi-agent deep deterministic policy gradient
Simultaneous Depth and Heading Control for Autonomous Underwater Vehicle Docking Maneuvers Using Deep Reinforcement Learning within a Digital Twin System
10
Authors: Yu-Hsien Lin, Po-Cheng Chuang, Joyce Yi-Tzu Huang. Computers, Materials & Continua, 2025, Issue 9, pp. 4907-4948 (42 pages).
This study proposes an automatic control system for Autonomous Underwater Vehicle (AUV) docking, utilizing a digital twin (DT) environment based on the HoloOcean platform, which integrates six-degree-of-freedom (6-DOF) motion equations and hydrodynamic coefficients to create a realistic simulation. Although conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements, deep reinforcement learning (DRL) offers a promising alternative. In the positioning stage, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is employed for synchronized depth and heading control, which offers stable training, reduced overestimation bias, and superior handling of continuous control compared to other DRL methods. During the searching stage, zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization. For the docking stage, this study proposes an innovative Image-based DDPG (I-DDPG), enhanced and trained in a Unity-MATLAB simulation environment, to achieve visual target tracking. Furthermore, integrating a DT environment enables efficient and safe policy training, reduces dependence on costly real-world tests, and improves sim-to-real transfer performance. Both simulation and real-world experiments were conducted, demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments. The results highlight the scalability and robustness of the proposed system, as evidenced by the TD3 controller achieving 25% less oscillation than the adaptive fuzzy controller when reaching the target depth, thereby demonstrating superior stability, accuracy, and potential for broader and more complex autonomous underwater tasks.
Keywords: Autonomous underwater vehicle; docking maneuver; digital twin; deep reinforcement learning; twin delayed deep deterministic policy gradient
Dynamic Task Offloading and Resource Allocation for Air-Ground Integrated Networks Based on MADDPG
11
Authors: Jianbin Xue, Peipei Mao, Luyao Wang, Qingda Yu, Changwang Fan. Journal of Beijing Institute of Technology, 2025, Issue 3, pp. 243-267 (25 pages).
With the rapid growth of connected devices, traditional edge-cloud systems are under overload pressure. Using mobile edge computing (MEC) to assist unmanned aerial vehicles (UAVs) as low altitude platform stations (LAPS) for communication and computation to build air-ground integrated networks (AGINs) offers a promising solution for seamless network coverage of remote internet of things (IoT) devices in the future. To address the performance demands of future mobile devices (MDs), we proposed an MEC-assisted AGIN system. The goal is to minimize the long-term computational overhead of MDs by jointly optimizing transmission power, flight trajectories, resource allocation, and offloading ratios, while utilizing non-orthogonal multiple access (NOMA) to improve device connectivity of large-scale MDs and spectral efficiency. We first designed an adaptive clustering scheme based on K-Means to cluster MDs and established communication links, improving efficiency and load balancing. Then, considering system dynamics, we introduced a partial computation offloading algorithm based on multi-agent deep deterministic policy gradient (MADDPG), modeling the multi-UAV computation offloading problem as a Markov decision process (MDP). This algorithm optimizes resource allocation through centralized training and distributed execution, reducing computational overhead. Simulation results show that the proposed algorithm not only converges stably but also outperforms other benchmark algorithms in handling complex scenarios with multiple devices.
Keywords: air-ground integrated network (AGIN); resource allocation; dynamic task offloading; multi-agent deep deterministic policy gradient (MADDPG); non-orthogonal multiple access (NOMA)
State-Incomplete Intelligent Dynamic Multipath Routing Algorithm in LEO Satellite Networks
12
Authors: Peng Liang, Wang Xiaoxiang. China Communications, 2025, Issue 2, pp. 1-11 (11 pages).
The low Earth orbit (LEO) satellite networks have outstanding advantages such as wide coverage area and not being limited by geographic environment, which can provide a broader range of communication services and has become an essential supplement to the terrestrial network. However, the dynamic changes and uneven distribution of satellite network traffic inevitably bring challenges to multipath routing. Even worse, the harsh space environment often leads to incomplete collection of network state data for routing decision-making, which further complicates this challenge. To address this problem, this paper proposes a state-incomplete intelligent dynamic multipath routing algorithm (SIDMRA) to maximize network efficiency even with incomplete state data as input. Specifically, we model the multipath routing problem as a Markov decision process (MDP) and then combine the deep deterministic policy gradient (DDPG) and the K shortest paths (KSP) algorithm to solve the optimal multipath routing policy. We use the temporal correlation of the satellite network state to fit the incomplete state data and then use the message passing neuron network (MPNN) for data enhancement. Simulation results show that the proposed algorithm outperforms baseline algorithms regarding average end-to-end delay and packet loss rate and performs stably under certain missing rates of state data.
Keywords: deep deterministic policy gradient; LEO satellite network; message passing neuron network; multipath routing
Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments (cited by: 24)
13
Authors: Zijian HU, Xiaoguang GAO, Kaifang WAN, Yiwei ZHAI, Qianglong WANG. Chinese Journal of Aeronautics (indexed: SCIE, EI, CAS, CSCD), 2021, Issue 12, pp. 187-204 (18 pages).
Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to safely fly to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double-screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of continuous experiences in the experience pool, finds the experiences most similar to the current state to learn according to the theory in human education, and expands the influence of the learning process on action selection at the current state. All experiments are applied in a complex unknown simulation environment constructed based on the parameters of a real UAV. The training experiments show that REL-DDPG improves the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate the performance under different parameter conditions.
Keywords: Autonomous Motion Planning (AMP); Deep deterministic policy gradient (DDPG); Deep Reinforcement Learning (DRL); Sampling method; UAV
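The Prioritized Experience Replay mechanism mentioned in this abstract is commonly implemented with proportional prioritization: transition i is drawn with probability p_i^alpha / sum_k p_k^alpha and its update is scaled by an importance weight. The sketch below illustrates that generic scheme only; it is not the paper's double-screening sampler, and the function names and the `alpha` and `beta` values are assumptions for the example.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Sampling probabilities P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Draw indices by priority and return importance-sampling weights.

    Priorities must be positive. Weights are normalized by the largest
    weight in the batch so that every weight lies in (0, 1].
    """
    rng = rng or random.Random()
    probs = per_probabilities(priorities, alpha)
    n = len(priorities)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_weight = max(weights)
    return indices, [w / max_weight for w in weights]


# Hypothetical priorities, e.g. absolute TD errors plus a small constant.
priorities = [0.1, 0.5, 2.0, 1.0]
indices, weights = per_sample(priorities, batch_size=3, rng=random.Random(0))
print(indices, weights)
```

With `alpha=0` the scheme reduces to uniform sampling, and raising `beta` toward 1 makes the importance weights correct more fully for the non-uniform sampling.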
Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs (cited by: 14)
14
Authors: LI Yue, QIU Xiaohui, LIU Xiaodong, XIA Qunli. Journal of Systems Engineering and Electronics (indexed: SCIE, EI, CSCD), 2020, Issue 4, pp. 734-742 (9 pages).
The ever-changing battlefield environment requires the use of robust and adaptive technologies integrated into a reliable platform. Unmanned combat aerial vehicles (UCAVs) aim to integrate such advanced technologies while increasing the tactical capabilities of combat aircraft. As a research object, common UCAV uses the neural network fitting strategy to obtain values of attack areas. However, this simple strategy cannot cope with complex environmental changes and autonomously optimize decision-making problems. To solve the problem, this paper proposes a new deep deterministic policy gradient (DDPG) strategy based on deep reinforcement learning for the attack area fitting of UCAVs in the future battlefield. Simulation results show that the autonomy and environmental adaptability of UCAVs in the future battlefield will be improved based on the new DDPG algorithm and the training process converges quickly. We can obtain the optimal values of attack areas in real time during the whole flight with the well-trained deep network.
Keywords: attack area; neural network; deep deterministic policy gradient (DDPG); unmanned combat aerial vehicle (UCAV)
Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys (cited by: 5)
15
Authors: Tianbo DENG, Hao HUANG, Yangwang FANG, Jie YAN, Haoyu CHENG. Chinese Journal of Aeronautics (indexed: SCIE, EI, CAS, CSCD), 2023, Issue 12, pp. 309-324 (16 pages).
In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called virtual target) of the target and decoy, and the model for the missile and the virtual decoy is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the train efficiency of the previous DDPG algorithm. Furthermore, combining the established model, the network obtained by the improved DDPG algorithm and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.
Keywords: Deep deterministic policy gradient; Infrared decoy; Maneuvering target; Reinforcement learning; Terminal guidance law
Distributed optimization of electricity-Gas-Heat integrated energy system with multi-agent deep reinforcement learning (cited by: 5)
16
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang. Global Energy Interconnection (indexed: EI, CAS, CSCD), 2022, Issue 6, pp. 604-617 (14 pages).
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) has the characteristics of strong coupling, non-convexity, and nonlinearity. The centralized optimization method has a high cost of communication and complex modeling. Meanwhile, the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency, which is difficult to apply online. For the coordinated optimization problem of the electricity-gas-heat IES in this study, we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient. Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy. Compared with centralized optimization, the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load in the training. Compared with the traditional iterative solution method, it can better cope with uncertainty and realize real-time decision making of the system, which is conducive to the online application. Finally, we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.
Keywords: Integrated energy system; Multi-agent system; Distributed optimization; Multi-agent deep deterministic policy gradient; Real-time optimization decision
Moving target defense of routing randomization with deep reinforcement learning against eavesdropping attack Cited by: 5
17
Authors: Xiaoyu Xu, Hao Hu, Yuling Liu, Jinglei Tan, Hongqi Zhang, Haotian Song. Digital Communications and Networks (SCIE, CSCD), 2022, No. 3, pp. 373-387 (15 pages)
Eavesdropping attacks have become one of the most common network attacks because they are easy to implement. They not only leak transmitted data but can also develop into other, more harmful attacks. Routing randomization is a research direction within moving target defense that has been proven effective against eavesdropping attacks. In this study, we analyzed existing routing randomization methods and found that their security and usability need further improvement. Based on the “latent and transferable” characteristics of eavesdropping attacks, we propose a routing randomization defense method based on deep reinforcement learning. The method realizes routing randomization at packet-level granularity using programmable switches. To improve the security and quality of service of legitimate services in networks, we use the deep deterministic policy gradient to generate random routing schemes, supported by network state awareness. In-band network telemetry provides real-time, accurate, and comprehensive network state awareness for the proposed method. Experiments show that, compared with other typical routing randomization defense methods, the proposed method has clear advantages in security and usability against eavesdropping attacks.
Keywords: Routing randomization; Moving target defense; Deep reinforcement learning; Deep deterministic policy gradient
A UAV collaborative defense scheme driven by DDPG algorithm Cited by: 3
18
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 5, pp. 1211-1224 (14 pages)
The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches, one based on value iteration and one based on policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense scheme using DDPG with swarms of unmanned aerial vehicles (UAVs) is developed and validated, and it shows promising practical value in defense effectiveness. We address the sparse reward problem of reinforcement learning in a long-term task by building a reward function for UAV swarms and by optimizing the learning process of the artificial neural network in the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the requirements of a UAV swarm for decentralization and autonomy, and promoting the intelligent development of UAV swarms as well as their decision-making process.
Keywords: deep deterministic policy gradient (DDPG) algorithm; unmanned aerial vehicle (UAV) swarm; task decision making; deep reinforcement learning; sparse reward problem
Fast UAV path planning in urban environments based on three-step experience buffer sampling DDPG Cited by: 2
19
Authors: Shasha Tian, Yuanxiang Li, Xiao Zhang, Lu Zheng, Linhui Cheng, Wei She, Wei Xie. Digital Communications and Networks (SCIE, CSCD), 2024, No. 4, pp. 813-826 (14 pages)
Path planning for Unmanned Aerial Vehicles (UAVs) is a critical issue in emergency communication and rescue operations, especially in adversarial urban environments. Due to the continuity of the flying space, complex building obstacles, and the aircraft's high dynamics, traditional algorithms cannot find the optimal collision-free flying path between the UAV station and the destination. Accordingly, in this paper, we study the fast UAV path planning problem in a 3D urban environment from a source point to a target point and propose a Three-Step Experience Buffer Deep Deterministic Policy Gradient (TSEB-DDPG) algorithm. We first build a 3D model of a complex urban environment with buildings and project the 3D building surfaces into many 2D geometric shapes. After this transformation, we propose Hierarchical Learning Particle Swarm Optimization (HL-PSO) to obtain the empirical path. Then, to ensure the accuracy of the obtained paths, the empirical path, the collision information, and the fast transition information are stored in the three experience buffers of the TSEB-DDPG algorithm as dynamic guidance information. The sampling ratio of each buffer is dynamically adapted to the training stage. Moreover, we designed a reward mechanism to improve the convergence speed of the DDPG algorithm for UAV path planning. The proposed TSEB-DDPG algorithm has also been compared experimentally with three widely used competitors, and the results show that it achieves the fastest convergence speed and the highest accuracy. We also conduct experiments in real scenarios and compare the real path planning obtained by the HL-PSO, DDPG, and TSEB-DDPG algorithms. The results show that the TSEB-DDPG algorithm achieves nearly the best results in terms of accuracy, average time of actual path planning, and success rate.
Keywords: Unmanned aerial vehicle; Path planning; Deep deterministic policy gradient; Three-step experience buffer; Particle swarm optimization
Deep reinforcement learning-based resource allocation optimization for IRS-assisted NOMA-MEC communication Cited by: 2
20
Authors: 方娟, 刘珍珍, 陈思琪, 李硕朋. 《北京工业大学学报》 (Journal of Beijing University of Technology; CAS, CSCD, Peking University Core), 2024, No. 8, pp. 930-938 (9 pages)
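To address task offloading for edge users in blind zones who cannot establish a direct communication link with the edge server, a resource allocation optimization algorithm based on deep reinforcement learning (DRL) is designed for intelligent reflecting surface (IRS)-assisted non-orthogonal multiple access (NOMA) communication. The algorithm seeks the maximum system utility, defined as a weighted combination of the system sum rate and energy efficiency (EE), in order to achieve green and efficient communication. The deep deterministic policy gradient (DDPG) algorithm is used to jointly optimize the transmit power allocation and the reflection phase-shift matrix of the IRS. Simulation results show that using the DDPG algorithm for communication resource allocation in mobile edge computing (MEC) outperforms the several other algorithms used for comparison.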
Keywords: Non-orthogonal multiple access (NOMA); Intelligent reflecting surface (IRS); Deep deterministic policy gradient (DDPG) algorithm; Mobile edge computing (MEC); Energy efficiency (EE); System utility