Journal Articles
61 articles found
1. Aspects of Replayability and Software Engineering: Towards a Methodology of Developing Games
Authors: Joseph Krall, Tim Menzies. Journal of Software Engineering and Applications, 2012, No. 7, pp. 459-466 (8 pages)
One application of software engineering is the vast and widely popular video game entertainment industry. Success of a video game product depends on how well the player base receives it. Among research toward understanding the factors of success behind releasing a video game, we are interested in studying a factor known as replayability. Toward a software-engineering-oriented game design methodology, we collect player opinions on replayability via surveys and provide methods to analyze the data. We believe these results can help game designers more successfully produce entertaining games with longer-lasting appeal by utilizing our software engineering techniques.
Keywords: games; entertainment; design; software engineering; replayability
2. A Deep Reinforcement Learning-Based Partitioning Method for Power System Parallel Restoration
Authors: Changcheng Li, Weimeng Chang, Dahai Zhang, Jinghan He. Energy Engineering, 2026, No. 1, pp. 243-264 (22 pages)
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts. This paper proposes a novel partitioning method based on deep reinforcement learning. First, the partitioning decision process is formulated as a Markov decision process (MDP) model to maximize the modularity. Corresponding key partitioning constraints on parallel restoration are considered. Second, based on the partitioning objective and constraints, the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function. A soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward. Then, the deep Q-network method is applied to solve the partitioning MDP model and generate partitioning schemes. Two experience replay buffers are employed to speed up the training process of the method. Finally, case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints, thereby improving the parallelism and reliability of the restoration process. Moreover, simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
Keywords: partitioning method; parallel restoration; deep reinforcement learning; experience replay buffer; partitioning modularity
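The modularity the partitioning MDP above maximizes is the standard network measure Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j). A minimal sketch of evaluating it for a candidate partition follows; the function name and the dense-adjacency representation are illustrative assumptions, not the paper's code.

```python
def modularity(adjacency, communities):
    """Evaluate Newman modularity Q for a partition of an undirected graph.

    adjacency:   dense symmetric matrix A (list of lists of edge weights)
    communities: communities[i] is the community label of node i
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]   # k_i
    two_m = sum(degrees)                        # 2m = total degree
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # A_ij minus the expected weight under a random null model
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m
```

Two disconnected edges split into their natural two communities score Q = 0.5, the maximum for that graph.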
3. Deep reinforcement learning-based adaptive collision avoidance method for UAV in joint operational airspace
Authors: Yan Shen, Xuejun Zhang, Yan Li, Weidong Zhang. Defence Technology, 2026, No. 2, pp. 142-159 (18 pages)
As joint operations have become a key trend in modern military development, unmanned aerial vehicles (UAVs) play an increasingly important role in enhancing the intelligence and responsiveness of combat systems. However, the heterogeneity of aircraft, partial observability, and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods. To address these issues, this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning. First, a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments. Then, to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations, a sector-based partial observation mechanism is designed. A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions, including time to closest approach, minimum separation distance, and aircraft type. Furthermore, a Hierarchical Prioritized Experience Replay (HPER) mechanism is introduced, which classifies experience samples into high, medium, and low priority levels to preferentially sample critical experiences, thereby improving learning efficiency and accelerating policy convergence. Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed, environmental adaptability, and robustness, significantly enhancing collision avoidance performance and convergence rate. Finally, transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex, real-world joint operational scenarios.
Keywords: unmanned aerial vehicle; collision avoidance; deep reinforcement learning; joint operational airspace; hierarchical prioritized experience replay
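The HPER mechanism above routes experience samples into high, medium, and low priority levels and samples critical ones preferentially. A minimal sketch of that three-level idea follows; the class name, fixed mixing ratios, and caller-supplied priority levels are assumptions, and the paper's actual classification and sampling rules may differ.

```python
import random
from collections import deque

class HierarchicalReplayBuffer:
    """Three-level experience buffer: transitions are stored in separate
    high/medium/low queues and drawn with fixed mixing ratios, so that
    critical experiences are replayed preferentially."""

    LEVELS = ("high", "medium", "low")

    def __init__(self, capacity_per_level, ratios=(0.5, 0.3, 0.2)):
        self.queues = {lvl: deque(maxlen=capacity_per_level)
                       for lvl in self.LEVELS}
        self.ratios = dict(zip(self.LEVELS, ratios))

    def add(self, transition, level):
        self.queues[level].append(transition)

    def sample(self, batch_size):
        batch = []
        for lvl in self.LEVELS:
            pool = self.queues[lvl]
            if not pool:
                continue
            # draw this level's share of the batch (with replacement)
            k = max(1, int(batch_size * self.ratios[lvl]))
            batch.extend(random.choices(list(pool), k=k))
        return batch[:batch_size]
```

With the default 0.5/0.3/0.2 ratios, a batch of 10 mixes roughly 5 high-, 3 medium-, and 2 low-priority transitions.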
4. Energy Optimization for Autonomous Mobile Robot Path Planning Based on Deep Reinforcement Learning
Authors: Longfei Gao, Weidong Wang, Dieyun Ke. Computers, Materials & Continua, 2026, No. 1, pp. 984-998 (15 pages)
At present, energy consumption is one of the main bottlenecks in autonomous mobile robot development. To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments, this paper proposes an Attention-Enhanced Dueling Deep Q-Network (AD-Dueling DQN), which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework. A multi-objective reward function, centered on energy efficiency, is designed to comprehensively consider path length, terrain slope, motion smoothness, and obstacle avoidance, enabling optimal low-energy trajectory generation in 3D space from the source. The incorporation of a multi-head attention mechanism allows the model to dynamically focus on energy-critical state features, such as slope gradients and obstacle density, thereby significantly improving its ability to recognize and avoid energy-intensive paths. Additionally, the prioritized experience replay mechanism accelerates learning from key decision-making experiences, suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly. The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios. Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments. Moreover, the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms, highlighting its global optimization capability under energy-aware objectives in complex terrains. This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
Keywords: autonomous mobile robot; deep reinforcement learning; energy optimization; multi-head attention mechanism; prioritized experience replay; dueling deep Q-network
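Prioritized experience replay, which the abstract credits with accelerating learning from key decisions, draws transitions with probability proportional to a power of their TD-error magnitude. A minimal proportional sketch follows; it uses plain lists instead of the usual sum-tree, and the hyperparameter names are conventional defaults, not values from the paper.

```python
import random

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (minimal sketch).

    Transitions with larger TD-error magnitude receive higher priority
    p_i = (|delta_i| + eps)**alpha and are sampled proportionally more often.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priority shapes sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data, self.priorities = [], []
        self.pos = 0                # ring-buffer write position

    def add(self, transition, td_error):
        p = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:                       # overwrite the oldest slot
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) = p_i / sum_j p_j, sampled with replacement
        return random.choices(self.data, weights=self.priorities, k=batch_size)
```

Production implementations replace the lists with a sum-tree for O(log n) sampling and add importance-sampling weights to correct the induced bias.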
5. Enhanced Coverage Path Planning Strategies for UAV Swarms Based on SADQN Algorithm (Cited by: 1)
Authors: Zhuoyan Xie, Qi Wang, Bin Kong, Shang Gao. Computers, Materials & Continua, 2025, No. 8, pp. 3013-3027 (15 pages)
In the current era of intelligent technologies, comprehensive and precise regional coverage path planning is critical for tasks such as environmental monitoring, emergency rescue, and agricultural plant protection. Owing to their exceptional flexibility and rapid deployment capabilities, unmanned aerial vehicles (UAVs) have emerged as the ideal platforms for accomplishing these tasks. This study proposes a swarm A*-guided Deep Q-Network (SADQN) algorithm to address the coverage path planning (CPP) problem for UAV swarms in complex environments. Firstly, to overcome the dependency of traditional modeling methods on regular terrain environments, this study proposes an improved cellular decomposition method for map discretization. Simultaneously, a distributed UAV swarm system architecture is adopted, which, through the integration of multi-scale maps, addresses the issues of redundant operations and flight conflicts in multi-UAV cooperative coverage. Secondly, the heuristic mechanism of the A* algorithm is combined with full-coverage path planning, and this approach is incorporated at the initial stage of Deep Q-Network (DQN) algorithm training to provide effective guidance in action selection, thereby accelerating convergence. Additionally, a prioritized experience replay mechanism is introduced to further enhance the coverage performance of the algorithm. To evaluate the efficacy of the proposed algorithm, simulation experiments were conducted in several irregular environments and compared with several popular algorithms. Simulation results show that the SADQN algorithm outperforms other methods, achieving performance comparable to that of the baseline prior algorithm, with an average coverage efficiency exceeding 2.6 and fewer turning maneuvers. In addition, the algorithm demonstrates excellent generalization ability, enabling it to adapt to different environments.
Keywords: coverage path planning; unmanned aerial vehicles; swarm intelligence; Deep Q-Network; A* algorithm; prioritized experience replay
6. An SAC-AMBER Algorithm for Flexible Job Shop Scheduling with Material Kit
Authors: Bo Li, Xiaoying Yang, Zhijie Pei, Xin Yang, Yaqi Wu. Computers, Materials & Continua, 2025, No. 8, pp. 3649-3672 (24 pages)
It is well known that the kit completeness of parts processed in the previous stage is crucial for the subsequent manufacturing stage. This paper studies the flexible job shop scheduling problem (FJSP) with the objective of material kitting, where a material kit is a collection of components that ensures that a batch of components can be ready at the same time during the product assembly process. In this study, we consider completion time variance and maximum completion time as scheduling objectives, continue the weighted summation process for multiple objectives, and design adaptive weighted summation parameters to optimize productivity and reduce the difference in completion time between components in the same kit. The Soft Actor Critic (SAC) algorithm is combined with the Adaptive Multi-Buffer Experience Replay (AMBER) mechanism to propose the SAC-AMBER algorithm. The AMBER mechanism optimizes the experience sampling and policy updating process and enhances learning efficiency by categorically storing the experience into the standard buffer, the high equipment utilization buffer, and the high productivity buffer. Experimental results show that the SAC-AMBER algorithm can effectively reduce the maximum completion time on multiple datasets and reduce the difference in component completion time in the same kit, thus optimizing the readiness of the part kits, while demonstrating relatively good stability and convergence. Compared with traditional heuristics, meta-heuristics, and other deep reinforcement learning methods, the SAC-AMBER algorithm performs better in terms of solution quality and computational efficiency, and extensive testing on multiple datasets confirms its good generalization ability, providing an effective solution to the FJSP.
Keywords: soft actor-critic; DRL; adaptive multi-buffer experience replay; FJSP; material kit
7. Efficient Knowledge-Guided Self-Evolving Intelligent Behavioral Control for Autonomous Vehicles
Authors: Qiao Peng, Kailong Liu, Jingda Wu, Amir Khajepour. IEEE/CAA Journal of Automatica Sinica, 2025, No. 7, pp. 1522-1524 (3 pages)
Dear Editor, This letter addresses the enhancement of autonomous vehicles' (AVs) behavior control systems through the application of reinforcement learning (RL) techniques. It presents a novel approach to efficient knowledge-guided self-evolutionary intelligent decision-making by integrating human intervention as prior knowledge into the RL's exploratory learning process. Specifically, we propose an innovative intervention-based reward shaping mechanism and develop a novel experience replay mechanism to augment the efficiency of leveraging guided knowledge within the framework of off-policy RL. The proposed methodology significantly enhances the performance of RL-based behavior control strategies in complex scenarios for AVs. Illustrative results indicate that, relative to existing state-of-the-art methods, our approach yields superior learning efficiency and improved autonomous driving performance.
Keywords: knowledge-guided learning; control systems; autonomous vehicles; human intervention; experience replay mechanism; reinforcement learning
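One common way to realize the intervention-based reward shaping the letter describes is to penalize the agent's chosen action whenever it disagrees with a human correction. The letter does not publish its exact shaping rule, so the penalty form and scale below are illustrative assumptions only:

```python
def shaped_reward(env_reward, agent_action, human_action, penalty=1.0):
    """Hypothetical intervention-based reward shaping.

    When a human intervenes (human_action is not None), the environment
    reward is reduced if the agent's own choice disagreed with the
    correction, nudging the policy toward the demonstrated behavior.
    """
    if human_action is None:            # no intervention this step
        return env_reward
    return env_reward - penalty * (agent_action != human_action)
```

The mismatch term is a boolean, so the penalty is applied only on disagreement and the unshaped reward is recovered whenever the agent already matched the human.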
8. A Two-Layer UAV Cooperative Computing Offloading Strategy Based on Deep Reinforcement Learning
Authors: Zhang Jianfei, Wang Zhen, Hu Yun, Chang Zheng. China Communications, 2025, No. 10, pp. 251-268 (18 pages)
In the wake of major natural or human-made disasters, the communication infrastructure within disaster-stricken areas is frequently damaged. Unmanned aerial vehicles (UAVs), thanks to their merits such as rapid deployment and high mobility, are commonly regarded as an ideal option for constructing temporary communication networks. Considering the limited computing capability and battery power of UAVs, this paper proposes a two-layer UAV cooperative computing offloading strategy for emergency disaster relief scenarios. The multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm integrated with prioritized experience replay (PER) is utilized to jointly optimize the scheduling strategies of UAVs, task offloading ratios, and their mobility, aiming to minimize the energy consumption and delay of the system. In order to address the aforementioned non-convex optimization issue, a Markov decision process (MDP) has been established. The results of simulation experiments demonstrate that, compared with the other four baseline algorithms, the algorithm introduced in this paper exhibits better convergence performance, verifying its feasibility and efficacy.
Keywords: cooperative computational offloading; deep reinforcement learning; mobile edge computing; prioritized experience replay; two-layer; unmanned aerial vehicles
9. A Real-Time Deep Learning Approach for Electrocardiogram-Based Cardiovascular Disease Prediction with Adaptive Drift Detection and Generative Feature Replay
Authors: Soumia Zertal, Asma Saighi, Sofia Kouah, Souham Meshoul, Zakaria Laboudi. Computer Modeling in Engineering & Sciences, 2025, No. 9, pp. 3737-3782 (46 pages)
Cardiovascular diseases (CVDs) remain a leading cause of mortality worldwide, emphasizing the importance of early and accurate prediction. Electrocardiogram (ECG) signals, central to cardiac monitoring, have increasingly been integrated with Deep Learning (DL) for real-time prediction of CVDs. However, DL models are prone to performance degradation due to concept drift and catastrophic forgetting. To address this issue, we propose a real-time CVD prediction approach, referred to as ADWIN-GFR, that combines Convolutional Neural Network (CNN) layers, for spatial feature extraction, with Gated Recurrent Units (GRU), for temporal modeling, alongside adaptive drift detection and mitigation mechanisms. The proposed approach integrates Adaptive Windowing (ADWIN) for real-time concept drift detection, a fine-tuning strategy based on Generative Feature Replay (GFR) to preserve previously acquired knowledge, and a dynamic replay buffer ensuring variance, diversity, and data distribution coverage. Extensive experiments conducted on the MIT-BIH arrhythmia dataset demonstrate that ADWIN-GFR outperforms standard fine-tuning techniques, achieving an average post-drift accuracy of 95.4%, a macro F1-score of 93.9%, and a remarkably low forgetting score of 0.9%. It also exhibits an average drift detection delay of 12 steps and achieves an adaptation gain of 17.2%. These findings underscore the potential of ADWIN-GFR for deployment in real-world cardiac monitoring systems, including wearable ECG devices and hospital-based patient monitoring platforms.
Keywords: real-time cardiovascular disease prediction; concept drift detection; catastrophic forgetting; fine-tuning; electrocardiogram; convolutional neural networks; gated recurrent units; adaptive windowing; generative feature replay
10. Path Planning for Intelligent Robots Based on Deep Q-learning With Experience Replay and Heuristic Knowledge (Cited by: 28)
Authors: Lan Jiang, Hongyun Huang, Zuohua Ding. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2020, No. 4, pp. 1179-1189 (11 pages)
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network has been used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data which is used for training a neural network; such a process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, in comparison with the existing methods, our method can converge to an optimal action strategy in less time and can explore a path in an unknown environment with fewer steps and a larger average reward.
Keywords: deep Q-learning (DQL); experience replay (ER); heuristic knowledge (HK); path planning
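The heuristic knowledge described above replaces blind random exploration with a guided suggestion during the exploratory branch of epsilon-greedy action selection. A minimal sketch of that idea follows; the function signature and the source of the heuristic action are illustrative assumptions, not the paper's implementation.

```python
import random

def select_action(q_values, heuristic_action, epsilon):
    """Heuristic-guided epsilon-greedy action selection (sketch).

    With probability epsilon, instead of choosing a uniformly random
    action, fall back on a heuristic suggestion (e.g. "step toward the
    goal"); otherwise exploit the current Q-estimates greedily.
    """
    if random.random() < epsilon:
        return heuristic_action            # guided exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

At epsilon = 0 this reduces to the greedy policy; at epsilon = 1 every step follows the heuristic.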
11. Resilience Against Replay Attacks: A Distributed Model Predictive Control Scheme for Networked Multi-Agent Systems (Cited by: 5)
Authors: Giuseppe Franzè, Francesco Tedesco, Domenico Famularo. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2021, No. 3, pp. 628-640 (13 pages)
In this paper, a resilient distributed control scheme against replay attacks for multi-agent networked systems subject to input and state constraints is proposed. The methodological starting point relies on a smart use of predictive arguments with a twofold aim: 1) promptly detect malicious agent behaviors affecting normal system operations; 2) apply specific control actions, based on predictive ideas, to mitigate as much as possible the undesirable domino effects resulting from adversary operations. Specifically, the multi-agent system is topologically described by a leader-follower digraph characterized by a unique leader, and set-theoretic receding horizon control ideas are exploited to develop a distributed algorithm capable of instantaneously recognizing the attacked agent. Finally, numerical simulations are carried out to show the benefits and effectiveness of the proposed approach.
Keywords: distributed model predictive control; leader-follower networks; multi-agent systems; replay attacks; resilient control
12. Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments (Cited by: 4)
Authors: Zijian Hu, Xiaoguang Gao, Kaifang Wan, Neretin Evgeny, Jinliang Li. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 5, pp. 377-391 (15 pages)
As an advanced combat weapon, Unmanned Aerial Vehicles (UAVs) have been widely used in military wars. In this paper, we formulate the Autonomous Navigation Control (ANC) problem of UAVs as a Markov Decision Process (MDP) and propose a novel Deep Reinforcement Learning (DRL) method to allow UAVs to perform dynamic target tracking tasks in large-scale unknown environments. To solve the problem of limited training experience, the proposed Imaginary Filtered Hindsight Experience Replay (IFHER) generates successful episodes by reasonably imagining the target trajectory in a failed episode to augment the experiences. The well-designed goal, episode, and quality filtering strategies ensure that only high-quality augmented experiences are stored, while the sampling filtering strategy of IFHER ensures that these stored augmented experiences can be fully learned according to their high priorities. By training in a complex environment constructed based on the parameters of a real UAV, the proposed IFHER algorithm improves the convergence speed by 28.99% and the convergence result by 11.57% compared to the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The testing experiments carried out in environments with different complexities demonstrate the strong robustness and generalization ability of the IFHER agent. Moreover, the flight trajectory of the IFHER agent shows the superiority of the learned policy and the practical application value of the algorithm.
Keywords: artificial intelligence; autonomous navigation control; deep reinforcement learning; hindsight experience replay; UAV
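IFHER builds on hindsight experience replay (HER), in which a failed episode is relabelled with a goal the agent actually reached, so that its transitions become useful successes. A minimal relabelling sketch follows; the dict-based transition format and the reward_fn signature are assumptions, not the paper's data structures.

```python
def hindsight_relabel(episode, reward_fn):
    """Core HER step (sketch): pretend the final achieved state was the
    goal all along, then recompute every step's reward against it."""
    achieved_goal = episode[-1]["achieved"]   # substitute goal
    relabelled = []
    for step in episode:
        new_step = dict(step)                 # keep original untouched
        new_step["goal"] = achieved_goal
        new_step["reward"] = reward_fn(step["achieved"], achieved_goal)
        relabelled.append(new_step)
    return relabelled
```

With a sparse reward of 0 on goal attainment and -1 otherwise, the relabelled episode's final step becomes a success, which is exactly what makes sparse-reward tasks learnable.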
13. Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments (Cited by: 4)
Authors: Fei Wang, Xiaoping Zhu, Zhou Zhou, Yang Tang. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2024, No. 3, pp. 237-257 (21 pages)
In some military application scenarios, Unmanned Aerial Vehicles (UAVs) need to perform missions with the assistance of on-board cameras when radar is not available and communication is interrupted, which brings challenges for UAV autonomous navigation and collision avoidance. In this paper, an improved deep-reinforcement-learning algorithm, Deep Q-Network with a Faster R-CNN model and a Data Deposit Mechanism (FRDDM-DQN), is proposed. A Faster R-CNN model (FR) is introduced and optimized to extract obstacle information from images, and a new replay-memory Data Deposit Mechanism (DDM) is designed to train an agent with better performance. During training, a two-part training approach is used to reduce the time spent on training as well as on retraining when the scenario changes. In order to verify the performance of the proposed method, a series of experiments, including training experiments, test experiments, and typical-episode experiments, is conducted in a 3D simulation environment. Experimental results show that the agent trained by the proposed FRDDM-DQN is able to navigate autonomously and avoid collisions, and performs better compared to FR-DQN, FR-DDQN, FR-Dueling DQN, the YOLO-based YDDM-DQN, and the original FR-output-based FR-ODQN.
Keywords: Faster R-CNN model; replay memory Data Deposit Mechanism (DDM); two-part training approach; image-based autonomous navigation and collision avoidance (ANCA); unmanned aerial vehicle (UAV)
14. Distributed Platooning Control of Automated Vehicles Subject to Replay Attacks Based on Proportional Integral Observers (Cited by: 2)
Authors: Meiling Xie, Derui Ding, Xiaohua Ge, Qing-Long Han, Hongli Dong, Yan Song. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 9, pp. 1954-1966 (13 pages)
Secure platooning control plays an important role in enhancing the cooperative driving safety of automated vehicles subject to various security vulnerabilities. This paper focuses on the distributed secure control issue of automated vehicles affected by replay attacks. A proportional-integral observer (PIO) with predetermined forgetting parameters is first constructed to acquire the dynamical information of vehicles. Then, a time-varying parameter and two positive scalars are employed to describe the temporal behavior of replay attacks. In light of such a scheme and the common properties of Laplace matrices, the closed-loop system with PIO-based controllers is transformed into a switched and time-delayed one. Furthermore, some sufficient conditions are derived to achieve the desired platooning performance from the view of Lyapunov stability theory. The controller gains are analytically determined by resorting to the solution of certain matrix inequalities dependent only on the maximum and minimum eigenvalues of the communication topologies. Finally, a simulation example is provided to illustrate the effectiveness of the proposed control strategy.
Keywords: automated vehicles; platooning control; proportional-integral observers (PIOs); replay attacks; time delays
15. Barrier-Certified Learning-Enabled Safe Control Design for Systems Operating in Uncertain Environments (Cited by: 2)
Authors: Zahra Marvi, Bahare Kiumarsi. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, No. 3, pp. 437-449 (13 pages)
This paper presents learning-enabled barrier-certified safe controllers for systems that operate in a shared environment in which multiple systems with uncertain dynamics and behaviors interact. That is, safety constraints are imposed not only by the ego system's own physical limitations but also by other systems operating nearby. Since the model of the external agent is required to impose control barrier functions (CBFs) as safety constraints, a safety-aware loss function is defined and minimized to learn the uncertain and unknown behavior of external agents. More specifically, the loss function is defined based on the barrier function error, instead of the system model error, and is minimized for both current samples and past samples stored in the memory to assure a fast and generalizable learning algorithm for approximating the safe set. The proposed model learning and CBF are then integrated together to form a learning-enabled zeroing CBF (L-ZCBF), which employs the approximated trajectory information of the external agents provided by the learned model but shrinks the safety boundary in case of an imminent safety violation using instantaneous sensory observations. It is shown that the proposed L-ZCBF assures the safety guarantees during learning and even in the face of inaccurate or simplified approximation of external agents, which is crucial in safety-critical applications in highly interactive environments. The efficacy of the proposed method is examined in a simulation of safe maneuver control of a vehicle in an urban area.
Keywords: control barrier functions (CBFs); experience replay; learning; safety-critical systems; uncertainty
16. Sampled-data control through model-free reinforcement learning with effective experience replay (Cited by: 4)
Authors: Bo Xiao, H. K. Lam, Xiaojie Su, Ziwei Wang, Frank P.-W. Lo, Shihong Chen, Eric Yeatman. Journal of Automation and Intelligence, 2023, No. 1, pp. 20-30 (11 pages)
Reinforcement Learning (RL)-based control algorithms can learn control strategies for nonlinear and uncertain environments while interacting with them. Guided by the rewards generated by the environment, an RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment. In this paper, we propose a sampled-data RL control strategy to reduce the computational demand. In the sampled-data control strategy, the whole control system has a hybrid structure, in which the plant is continuous while the controller (RL agent) adopts a discrete structure. Given that the continuous states of the plant will be the input of the agent, the state-action value function is approximated by fully connected feed-forward neural networks (FCFFNN). Instead of learning the controller at every step during the interaction with the environment, the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay. In the acting stage, the most effective experience obtained during the interaction with the environment is stored, and during the learning stage, the stored experience is replayed a customized number of times, which helps enhance the experience replay process. The effectiveness of the proposed approach is verified by simulation examples.
Keywords: reinforcement learning; neural networks; sampled-data control; model-free; effective experience replay
17. Replay Attack and Defense of Electric Vehicle Charging on GB/T 27930-2015 Communication Protocol (Cited by: 2)
Authors: Yafei Li, Yong Wang, Min Wu, Haiming Li. Journal of Computer and Communications, 2019, No. 12, pp. 20-30 (11 pages)
The GB/T 27930-2015 protocol is the state-stipulated communication protocol between the off-board charger and the battery management system (BMS). However, as the protocol adopts broadcast communication and transmits data in plaintext, and the data frame contains neither a source address nor a destination address, the Electric Vehicle (EV) is vulnerable to replay attacks during the charging process. In order to verify the security problems of the protocol, this paper uses 27,655 message frames from a complete charging process provided by Shanghai Thaisen Electric Company and analyzes these actual data frames one by one with a program written in C++. To enhance the security of the protocol, Rivest-Shamir-Adleman (RSA) digital signatures and the addition of random numbers are proposed to resist replay attacks. In the Eclipse experimental environment, normal EV charging, the RSA digital signature, and the random-number defense are simulated. Experimental results show that the RSA digital signature alone cannot resist replay attacks, while adding random numbers can effectively enhance the ability of EVs to resist replay attacks during charging.
Keywords: EV charging; GB/T 27930-2015; Replay attack; RSA digital signature; Adding random numbers
Read Online / Download PDF
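The random-number defense that this abstract finds effective can be illustrated with a minimal freshness check: the receiver remembers the nonces it has already accepted and rejects any frame that reuses one. This sketch is a simplification under stated assumptions; the class name, the 8-byte nonce size, and the session-scoped set are illustrative, and none of the GB/T 27930-2015 frame layout is modeled.

```python
import secrets

class ChargingSession:
    """Toy BMS-side receiver: a frame is accepted only if its attached
    random number (nonce) has not been seen before in this session, so
    a captured frame replayed verbatim is rejected."""
    def __init__(self):
        self.seen_nonces = set()

    def accept(self, payload: bytes, nonce: bytes) -> bool:
        if nonce in self.seen_nonces:
            return False                 # replayed frame: reject
        self.seen_nonces.add(nonce)
        return True                      # fresh frame: process normally

bms = ChargingSession()
frame, nonce = b"battery-status", secrets.token_bytes(8)
first_accepted = bms.accept(frame, nonce)    # legitimate transmission
replay_accepted = bms.accept(frame, nonce)   # attacker resends the same frame
```

Note that this check works even when every frame is also signed: a signature proves origin, but only per-message randomness (or a counter/timestamp) proves freshness, which matches the paper's finding that RSA signatures alone do not stop replay.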
A Data-Based Feedback Relearning Algorithm for Uncertain Nonlinear Systems (Cited by: 1)
18
Authors: Chaoxu Mu, Yong Zhang, Guangbin Cai, Ruijun Liu, Changyin Sun 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2023, No. 5, pp. 1288-1303 (16 pages)
In this paper, a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems. Motivated by the classical on-policy and off-policy algorithms of reinforcement learning, the online feedback relearning (FR) algorithm is developed, where the collected data include the influence of disturbance signals. The FR algorithm adapts better to environmental changes (such as control-channel disturbances) than the off-policy algorithm, and achieves higher computational efficiency and better convergence performance than the on-policy algorithm. Data processing based on experience replay is used for greater data efficiency and convergence stability. Simulation experiments illustrate the convergence stability, optimality, and overall performance of the FR algorithm by comparison.
Keywords: Data episodes; Experience replay; Neural networks; Reinforcement learning (RL); Uncertain systems
Read Online / Download PDF
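The experience-replay mechanism this abstract relies on for data efficiency, reusing stored transitions for many updates instead of interacting with the environment again, can be shown with a tabular stand-in (the paper itself uses neural approximators on continuous systems, so everything below, including the two-state chain and the step sizes, is invented for illustration).

```python
import random

def replay_q_learning(transitions, n_states, n_actions,
                      alpha=0.1, gamma=0.9, passes=500, seed=0):
    """Off-policy Q-learning driven entirely by replaying a fixed batch
    of previously collected (s, a, r, s') transitions: no new
    environment interaction is needed during learning."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(passes):
        s, a, r, s2 = rng.choice(transitions)       # replay a stored transition
        target = r + gamma * max(q[s2])             # bootstrap from its next state
        q[s][a] += alpha * (target - q[s][a])
    return q

# Invented two-state chain: action 1 in state 0 reaches the rewarding state 1.
data = [(0, 1, 1.0, 1), (0, 0, 0.0, 0), (1, 0, 0.0, 1)]
q = replay_q_learning(data, n_states=2, n_actions=2)
```

Three stored transitions suffice for hundreds of updates, which is the data-reuse property that replay-based off-policy methods exploit.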
A Novel Heterogeneous Actor-critic Algorithm with Recent Emphasizing Replay Memory (Cited by: 1)
19
Authors: Bao Xi, Rui Wang, Ying-Hao Cai, Tao Lu, Shuo Wang 《International Journal of Automation and Computing》 EI CSCD 2021, No. 4, pp. 619-631 (13 pages)
Reinforcement learning (RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However, the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm, which contains a soft Q-function and an ordinary Q-function. The soft Q-function encourages exploration by a Gaussian policy, while the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods, and we propose a new sampling technique that emphasizes recently experienced transitions to boost policy training. Besides, we integrate HAC with hindsight experience replay (HER) to deal with sparse-reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous-control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in training efficiency and performance, validating its effectiveness.
Keywords: Reinforcement learning (RL); Actor-critic; Experience replay; Training efficiency; Manipulation skill learning
Full-Text Delivery
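The "recent emphasizing" sampling idea, drawing replay batches with probability weighted toward newly stored transitions, can be sketched as below. The geometric weighting with decay `eta` is my assumption for illustration, not necessarily the exact scheme the paper uses.

```python
import random

def sample_recent_emphasized(buffer, batch_size, eta=0.996, rng=None):
    """Draw a batch from a replay buffer with probability weighted
    toward recently stored items: the i-th oldest of N entries gets
    weight eta**(N - 1 - i), so the newest entry has weight 1 and
    older entries decay geometrically."""
    rng = rng or random.Random(0)
    n = len(buffer)
    weights = [eta ** (n - 1 - i) for i in range(n)]
    return rng.choices(buffer, weights=weights, k=batch_size)

buffer = list(range(1000))          # stand-in transitions; index 999 is newest
batch = sample_recent_emphasized(buffer, batch_size=256)
recent_share = sum(1 for t in batch if t >= 500) / len(batch)
```

With `eta = 0.996` the newer half of a 1000-item buffer carries roughly 88% of the total sampling weight, so fresh transitions dominate each batch while old ones are still occasionally revisited.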
Squeezing More Past Knowledge for Online Class-Incremental Continual Learning (Cited by: 1)
20
Authors: Da Yu, Mingyi Zhang, Mantian Li, Fusheng Zha, Junge Zhang, Lining Sun, Kaiqi Huang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2023, No. 3, pp. 722-736 (15 pages)
Continual learning (CL) studies the problem of accumulating knowledge over time from a stream of data. A crucial challenge is that neural networks suffer performance degradation on previously seen data, known as catastrophic forgetting, due to parameter sharing. In this work, we consider the more practical online class-incremental CL setting, where the model learns new samples in an online manner and may continuously encounter new classes; moreover, prior knowledge is unavailable during training and evaluation. Existing works usually exploit samples along a single dimension, which ignores much valuable supervisory information. To better tackle this setting, we propose a novel replay-based CL method that leverages the multi-level representations produced while training samples for replay, strengthening supervision to consolidate previous knowledge. Specifically, besides the previous raw samples, we store their corresponding logits and features in memory. Furthermore, to imitate the predictions of the past model, we construct extra constraints from the multi-level information stored in memory. With the same number of replayed samples, our method can thus use more past knowledge to prevent interference. Extensive evaluations on several popular CL datasets show that our method consistently outperforms state-of-the-art methods across various sizes of episodic memory. We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.
Keywords: Catastrophic forgetting; Class-incremental learning; Continual learning (CL); Experience replay
Read Online / Download PDF
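The multi-level replay constraint described above, replaying a stored sample with both its label and the logits recorded when it was first learned, can be sketched as a loss function. This is a hedged reconstruction: the paper also stores features and uses further constraints, and the weighting `lam` is an assumed hyperparameter.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy of a logit vector against an integer label."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def distillation_mse(logits, stored_logits):
    """Mean squared gap between current outputs and the stored logits."""
    return sum((a - b) ** 2 for a, b in zip(logits, stored_logits)) / len(logits)

def replay_loss(current_logits, label, stored_logits, lam=1.0):
    """Loss on one replayed memory entry: the usual label loss plus a
    constraint tying current outputs to the logits recorded when the
    sample was first seen, imitating the past model's prediction."""
    return (cross_entropy(current_logits, label)
            + lam * distillation_mse(current_logits, stored_logits))

# Invented memory entry: only the label and stored logits matter here.
stored = [2.0, -1.0, 0.5]
drifted = [0.0, 1.5, 0.5]           # current model has drifted on this sample
loss_drifted = replay_loss(drifted, 0, stored)
loss_faithful = replay_loss(stored, 0, stored)
```

A model that has drifted away from its recorded logits is penalized beyond its plain classification error, which is how the extra stored supervision slows forgetting.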