A Multi-Objective Deep Reinforcement Learning Algorithm for Computation Offloading in Internet of Vehicles
1
Authors: Junjun Ren, Guoqiang Chen, Zheng-Yi Chai, Dong Yuan. Computers, Materials & Continua, 2026, No. 1, pp. 2111-2136 (26 pages)
Vehicle Edge Computing (VEC) and Cloud Computing (CC) significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Units (RSUs), thereby achieving lower delay and energy consumption. However, due to the limited storage capacity and energy budget of RSUs, it is challenging to meet the demands of the highly dynamic Internet of Vehicles (IoV) environment. Therefore, determining reasonable service caching and computation offloading strategies is crucial. To address this, this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading. By modeling the dynamic optimization problem as a Markov Decision Process (MDP), the scheme jointly optimizes task delay, energy consumption, load balancing, and privacy entropy to achieve better quality of service. Additionally, a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed. Each Double Deep Q-Network (DDQN) agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks (RBFN), thereby efficiently approximating the Pareto-optimal decisions for multiple objectives. Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud, edge, and vehicles. Compared to existing algorithms, the proposed method reduces task delay and energy consumption by 10.64% and 5.1%, respectively.
Keywords: deep reinforcement learning; internet of vehicles; multi-objective optimization; cloud-edge computing; computation offloading; service caching
Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints
2
Authors: Xueyan Sun, Weiming Shen, Jiaxin Fan, Birgit Vogel-Heuser, Fandi Bi, Chunjiang Zhang. Engineering, 2025, No. 3, pp. 278-291 (14 pages)
This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) designed to minimize the total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve the decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements to the basic PPO, and the proposed IPPO outperforms the state-of-the-art scheduling methods in both convergence and solution quality.
Keywords: multi-objective Markov decision process; multi-agent deep reinforcement learning; proximal policy optimization; distributed hybrid flow-shop scheduling; blocking constraints
Multi-Objective Parallel Human-machine Steering Coordination Control Strategy of Intelligent Vehicles Path Tracking Based on Deep Reinforcement Learning
3
Authors: Hongbo Wang, Lizhao Feng, Shaohua Li, Wuwei Chen, Juntao Zhou. Chinese Journal of Mechanical Engineering, 2025, No. 3, pp. 393-411 (19 pages)
In the parallel steering coordination control strategy for path tracking, it is difficult to match the current driver steering model using fixed parameters with the actual driver, and a steering coordination control strategy designed under a single objective and simple conditions is difficult to adapt to the multi-dimensional state variables’ input. In this paper, we propose a deep reinforcement learning algorithm-based multi-objective parallel human-machine steering coordination strategy for path tracking considering driver misoperation and external disturbance. Firstly, the driver steering mathematical model is constructed based on the driver preview characteristics and steering delay response, and the driver characteristic parameters are fitted after collecting the actual driver driving data. Secondly, considering that the vehicle is susceptible to the influence of external disturbances during the driving process, the Tube MPC (Tube Model Predictive Control) based path tracking steering controller is designed based on the vehicle system dynamics error model. After verifying that the driver steering model meets the driver steering operation characteristics, DQN (Deep Q-network), DDPG (Deep Deterministic Policy Gradient) and TD3 (Twin Delayed Deep Deterministic Policy Gradient) deep reinforcement learning algorithms are utilized to design a multi-objective parallel steering coordination strategy which satisfies the multi-dimensional state variables’ input of the vehicle. Finally, the tracking accuracy, lateral safety, human-machine conflict and driver steering load evaluation indices are designed for different driver operation states and different road environments, and the performance of the parallel steering coordination control strategies with different deep reinforcement learning algorithms and fuzzy algorithms is compared by simulations and hardware-in-the-loop experiments. The results show that the parallel steering collaborative strategy based on a deep reinforcement learning algorithm can more effectively assist the driver in tracking the target path under lateral wind interference and driver misoperation, and the TD3-based coordination control strategy has better overall performance.
Keywords: path tracking; human-machine co-driving; parallel steering coordination; deep reinforcement learning
An Improved Reinforcement Learning-Based 6G UAV Communication for Smart Cities
4
Authors: Vi Hoai Nam, Chu Thi Minh Hue, Dang Van Anh. Computers, Materials & Continua, 2026, No. 1, pp. 2030-2044 (15 pages)
Unmanned Aerial Vehicles (UAVs) have become integral components in smart city infrastructures, supporting applications such as emergency response, surveillance, and data collection. However, the high mobility and dynamic topology of Flying Ad Hoc Networks (FANETs) present significant challenges for maintaining reliable, low-latency communication. Conventional geographic routing protocols often struggle in situations where link quality varies and mobility patterns are unpredictable. To overcome these limitations, this paper proposes an improved routing protocol based on reinforcement learning. This new approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware. The proposed method optimizes the selection of relay nodes by using an adaptive reward function that takes into account energy consumption, delay, and link quality. Additionally, a Kalman filter is integrated to predict UAV mobility, improving the stability of communication links under dynamic network conditions. Simulation experiments were conducted using realistic scenarios, varying the number of UAVs to assess scalability. Key performance metrics were analyzed, including the packet delivery ratio, end-to-end delay, and total energy consumption. The results demonstrate that the proposed approach improves the packet delivery ratio by 12%-15% and reduces delay by up to 25.5% when compared to conventional GEO and QGEO protocols. However, this improvement comes at the cost of higher energy consumption due to additional computations and control overhead. Despite this trade-off, the proposed solution ensures reliable and efficient communication, making it well-suited for large-scale UAV networks operating in complex urban environments.
Keywords: UAV; FANET; smart cities; reinforcement learning; Q-learning
A Deep Reinforcement Learning-Based Partitioning Method for Power System Parallel Restoration
5
Authors: Changcheng Li, Weimeng Chang, Dahai Zhang, Jinghan He. Energy Engineering, 2026, No. 1, pp. 243-264 (22 pages)
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts. This paper proposes a novel partitioning method based on deep reinforcement learning. First, the partitioning decision process is formulated as a Markov decision process (MDP) model to maximize the modularity. Corresponding key partitioning constraints on parallel restoration are considered. Second, based on the partitioning objective and constraints, the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function. A soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward. Then, the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes. Two experience replay buffers are employed to speed up the training process of the method. Finally, case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints, thereby improving the parallelism and reliability of the restoration process. Moreover, simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
Keywords: partitioning method; parallel restoration; deep reinforcement learning; experience replay buffer; partitioning modularity
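The abstract above uses modularity as the partitioning objective. As a rough illustration only, the sketch below computes the standard Newman modularity of an unweighted, undirected graph partition; the paper may well use a weighted or electrically motivated variant, and the function name and input layout here are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph partition.

    edges: list of (u, v) pairs, each edge listed once.
    community: dict mapping every node to its partition label.
    (Hypothetical helper for illustration, not the paper's code.)
    """
    m = len(edges)
    degree = defaultdict(int)     # node -> degree
    internal = defaultdict(int)   # label -> number of edges inside the partition
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    degree_sum = defaultdict(int)  # label -> total degree of its nodes
    for node, d in degree.items():
        degree_sum[community[node]] += d

    # Q = sum over partitions of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

A higher value means more edges fall inside partitions than a random graph with the same degrees would predict, which is why a reward built on it favors tightly connected restoration islands.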
Evaluation of Reinforcement Learning-Based Adaptive Modulation in Shallow Sea Acoustic Communication
6
Authors: Yifan Qiu, Xiaoyu Yang, Feng Tong, Dongsheng Chen. 哈尔滨工程大学学报(英文版), 2026, No. 1, pp. 292-299 (8 pages)
While reinforcement learning-based underwater acoustic adaptive modulation shows promise for enabling environment-adaptive communication, as supported by extensive simulation-based research, its practical performance remains underexplored in field investigations. To evaluate the practical applicability of this emerging technique in adverse shallow sea channels, a field experiment was conducted using three communication modes for reinforcement learning-driven adaptive modulation: orthogonal frequency division multiplexing (OFDM), M-ary frequency-shift keying (MFSK), and direct sequence spread spectrum (DSSS). Specifically, a Q-learning method is used to select the optimal modulation mode according to the channel quality quantified by signal-to-noise ratio, multipath spread length, and Doppler frequency offset. Experimental results demonstrate that the reinforcement learning-based adaptive modulation scheme outperformed fixed threshold detection in terms of total throughput and average bit error rate, surpassing conventional adaptive modulation strategies.
Keywords: adaptive modulation; shallow sea; underwater acoustic modulation; reinforcement learning
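The mode selection described above can be sketched with tabular Q-learning. This is a minimal illustration under stated assumptions: the channel state is taken to be an already discretized tuple (for example SNR level, multipath spread level, Doppler level), and the function names, learning rate, and discount factor are hypothetical rather than taken from the paper.

```python
import random

MODES = ["OFDM", "MFSK", "DSSS"]  # the three modes named in the abstract


def select_mode(q_table, state, epsilon, rng=random):
    """Epsilon-greedy choice of a modulation mode for a discretized channel state."""
    if rng.random() < epsilon:
        return rng.choice(MODES)          # explore
    values = q_table.get(state, {})
    # exploit: highest estimated value, unseen entries default to 0.0
    return max(MODES, key=lambda mode: values.get(mode, 0.0))


def q_update(q_table, state, mode, reward, next_state, alpha=0.1, gamma=0.9):
    """One standard Q-learning update; returns the new Q(state, mode)."""
    row = q_table.setdefault(state, {m: 0.0 for m in MODES})
    next_row = q_table.get(next_state, {})
    best_next = max(next_row.get(m, 0.0) for m in MODES)
    row[mode] += alpha * (reward + gamma * best_next - row[mode])
    return row[mode]
```

In a setup like this the reward would typically reflect delivered throughput or bit errors for the transmitted frame; that choice is left open here because the abstract does not specify it.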
Energy Optimization for Autonomous Mobile Robot Path Planning Based on Deep Reinforcement Learning
7
Authors: Longfei Gao, Weidong Wang, Dieyun Ke. Computers, Materials & Continua, 2026, No. 1, pp. 984-998 (15 pages)
At present, energy consumption is one of the main bottlenecks in autonomous mobile robot development. To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments, this paper proposes an Attention-Enhanced Dueling Deep Q-Network (AD-Dueling DQN), which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework. A multi-objective reward function, centered on energy efficiency, is designed to comprehensively consider path length, terrain slope, motion smoothness, and obstacle avoidance, enabling optimal low-energy trajectory generation in 3D space from the source. The incorporation of a multi-head attention mechanism allows the model to dynamically focus on energy-critical state features, such as slope gradients and obstacle density, thereby significantly improving its ability to recognize and avoid energy-intensive paths. Additionally, the prioritized experience replay mechanism accelerates learning from key decision-making experiences, suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly. The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios. Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments. Moreover, the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms, highlighting its global optimization capability under energy-aware objectives in complex terrains. This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
Keywords: autonomous mobile robot; deep reinforcement learning; energy optimization; multi-attention mechanism; prioritized experience replay; dueling deep Q-network
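Two building blocks named in this abstract have well-known textbook forms, sketched below in plain Python: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)) and proportional prioritized replay probabilities P(i) = p_i^α / Σ p_k^α. Whether AD-Dueling DQN uses exactly these variants is an assumption; the function names are illustrative, and the attention module is not covered.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages.

    Subtracting the mean advantage keeps the value/advantage split identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def replay_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: sampling probability grows with priority.

    alpha = 0 gives uniform sampling; alpha = 1 samples fully by priority.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]
```

For instance, a state value of 2.0 with advantages 1.0, 0.0 and -1.0 yields Q-values of 3.0, 2.0 and 1.0, so the first action is preferred while the state value is shared across all actions.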
Multi-Objective Deep Reinforcement Learning Based Time-Frequency Resource Allocation for Multi-Beam Satellite Communications (cited by 6)
8
Authors: Yuanzhi He, Biao Sheng, Hao Yin, Di Yan, Yingchao Zhang. China Communications (SCIE, CSCD), 2022, No. 1, pp. 77-91 (15 pages)
Resource allocation is an important problem influencing the service quality of multi-beam satellite communications. In multi-beam satellite communications, the available frequency bandwidth is limited, user requirements vary rapidly, and high service quality and joint allocation of multi-dimensional resources such as time and frequency are required. An urgent and difficult problem for multi-beam satellite communications is how to use an efficient and fast resource allocation algorithm to obtain a higher comprehensive utilization rate of multi-dimensional resources, maximize the number of users and the system throughput, and meet the demand for rapid allocation that adapts to a dynamically changing number of users under limited resources. To solve the multi-dimensional resource allocation problem of multi-beam satellite communications, this paper establishes a multi-objective optimization model whose joint optimization goal is to maximize the number of users and the system throughput, and proposes a multi-objective deep reinforcement learning based time-frequency two-dimensional resource allocation (MODRL-TF) algorithm to adapt to the dynamically changing number of users and the timeliness requirements. Simulation results show that the proposed algorithm provides a higher comprehensive utilization rate of multi-dimensional resources, achieves multi-objective joint optimization, and obtains better timeliness than traditional heuristic algorithms such as the genetic algorithm (GA) and the ant colony optimization algorithm (ACO).
Keywords: multi-beam satellite communications; time-frequency resource allocation; multi-objective optimization; deep reinforcement learning
Multi-Robot Task Allocation Using Multimodal Multi-Objective Evolutionary Algorithm Based on Deep Reinforcement Learning (cited by 5)
9
Authors: 苗镇华, 黄文焘, 张依恋, 范勤勤. Journal of Shanghai Jiaotong University (Science) (EI), 2024, No. 3, pp. 377-387 (11 pages)
The overall performance of multi-robot collaborative systems is significantly affected by the multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems. Moreover, a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different scenarios of multi-robot task allocation problems are carried out to verify the performance of the proposed algorithm. The experimental test results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, and also produce the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems.
Keywords: multi-robot task allocation; multi-robot cooperation; path planning; multimodal multi-objective evolutionary algorithm; deep reinforcement learning
Deep Reinforcement Learning Model for Blood Bank Vehicle Routing Multi-Objective Optimization (cited by 3)
10
Authors: Meteb M. Altaf, Ahmed Samir Roshdy, Hatoon S. AlSagri. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 3955-3967 (13 pages)
Healthcare systems rank high among development priorities worldwide. Since many national populations are aging, combined with the availability of sophisticated medical treatments, healthcare expenditures are rapidly growing. Blood banks are a major component of any healthcare system, storing and providing the blood products needed for organ transplants, emergency medical treatments, and routine surgeries. Timely delivery of blood products is vital, especially in emergency settings. Hence, blood delivery process parameters such as safety and speed have received attention in the literature, as well as other parameters such as delivery cost. In this paper, delivery time and cost are modeled mathematically and marked as objective functions requiring simultaneous optimization. A solution is proposed based on Deep Reinforcement Learning (DRL) to address the formulated delivery functions as Multi-objective Optimization Problems (MOPs). The basic concept of the solution is to decompose the MOP into a set of scalar optimization sub-problems, where each of these sub-problems is modeled as a separate Neural Network (NN). The overall model parameters for each sub-problem are optimized based on a neighborhood parameter transfer and DRL training algorithm. The optimization step for the sub-problems is undertaken collaboratively to optimize the overall model. Pareto-optimal solutions can be directly obtained using the trained NN. Specifically, the multi-objective blood bank delivery problem is addressed in this research. One major technical advantage of this approach is that once the trained model is available, it can be scaled without the need for model retraining. The scoring can be obtained directly using a straightforward computation of the NN layers in a limited time. The proposed technique provides a set of technical strength points such as the ability to generalize and solve rapidly compared to other multi-objective optimization methods. The model was trained and tested on 5 major hospitals in Saudi Arabia’s Riyadh region, and the simulation results indicated that time and cost decreased by 35% and 30%, respectively. In particular, the proposed model outperformed other state-of-the-art MOP solutions such as Genetic Algorithms and Simulated Annealing.
Keywords: optimization; blood bank; deep neural network; reinforcement learning; blood centers; multi-objective optimization
Constrained Multi-Objective Optimization With Deep Reinforcement Learning Assisted Operator Selection
11
Authors: Fei Ming, Wenyin Gong, Ling Wang, Yaochu Jin. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 4, pp. 919-931 (13 pages)
Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy to estimate the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs and the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs.
Keywords: constrained multi-objective optimization; deep Q-learning; deep reinforcement learning (DRL); evolutionary algorithms; evolutionary operator selection
Enhancing Hyper-Spectral Image Classification with Reinforcement Learning and Advanced Multi-Objective Binary Grey Wolf Optimization
12
Authors: Mehrdad Shoeibi, Mohammad Mehdi Sharifi Nevisi, Reza Salehi, Diego Martín, Zahra Halimi, Sahba Baniasadi. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 3469-3493 (25 pages)
Hyperspectral (HS) image classification plays a crucial role in numerous areas including remote sensing (RS), agriculture, and the monitoring of the environment. Optimal band selection in HS images is crucial for improving the efficiency and accuracy of image classification. This process involves selecting the most informative spectral bands, which leads to a reduction in data volume. Focusing on these key bands also enhances the accuracy of classification algorithms, as redundant or irrelevant bands, which can introduce noise and lower model performance, are excluded. In this paper, we propose an approach for HS image classification using deep Q learning (DQL) and a novel multi-objective binary grey wolf optimizer (MOBGWO). We investigate the MOBGWO for optimal band selection to further enhance the accuracy of HS image classification. In the suggested MOBGWO, a new sigmoid function is introduced as a transfer function to modify the wolves’ position. The primary objective of this classification is to reduce the number of bands while maximizing classification accuracy. To evaluate the effectiveness of our approach, we conducted experiments on publicly available HS image datasets, including the Pavia University, Washington Mall, and Indian Pines datasets. We compared the performance of our proposed method with several state-of-the-art deep learning (DL) and machine learning (ML) algorithms, including long short-term memory (LSTM), deep neural network (DNN), recurrent neural network (RNN), support vector machine (SVM), and random forest (RF). Our experimental results demonstrate that the hybrid MOBGWO-DQL significantly improves classification accuracy compared to traditional optimization and DL techniques. MOBGWO-DQL shows greater accuracy in classifying most categories in both datasets used. For the Indian Pines dataset, the MOBGWO-DQL architecture achieved a kappa coefficient (KC) of 97.68% and an overall accuracy (OA) of 94.32%. This was accompanied by the lowest root mean square error (RMSE) of 0.94, indicating very precise predictions with minimal error. In the case of the Pavia University dataset, the MOBGWO-DQL model demonstrated outstanding performance with the highest KC of 98.72% and an impressive OA of 96.01%. It also recorded the lowest RMSE at 0.63, reinforcing its accuracy in predictions. The results clearly demonstrate that the proposed MOBGWO-DQL architecture not only reaches a highly accurate model more quickly but also maintains superior performance throughout the training process.
Keywords: hyperspectral image classification; reinforcement learning; multi-objective binary grey wolf optimizer; band selection
A multi-objective reinforcement learning algorithm for deadline constrained scientific workflow scheduling in clouds
13
Authors: Yao QIN, Hua WANG, Shanwen YI, Xiaole LI, Linbo ZHAI. Frontiers of Computer Science (SCIE, EI, CSCD), 2021, No. 5, pp. 25-36 (12 pages)
Recently, a growing number of scientific applications have been migrated into the cloud. To deal with the problems brought by clouds, more and more researchers have started to consider multiple optimization goals in workflow scheduling. However, previous works ignore some details, which are challenging but essential. Most existing multi-objective workflow scheduling algorithms overlook weight selection, which may result in the quality degradation of solutions. Besides, we find that the famous partial critical path (PCP) strategy, which has been widely used to meet the deadline constraint, cannot accurately reflect the situation of each time step. Workflow scheduling is an NP-hard problem, so self-optimizing algorithms are more suitable to solve it. In this paper, the aim is to solve a workflow scheduling problem with a deadline constraint. We design a deadline constrained scientific workflow scheduling algorithm based on multi-objective reinforcement learning (RL) called DCMORL. DCMORL uses the Chebyshev scalarization function to scalarize its Q-values. This method is good at choosing weights for objectives. We propose an improved version of the PCP strategy called MPCP. The sub-deadlines in MPCP are updated regularly during the scheduling phase, so they can accurately reflect the situation of each time step. The optimization objectives in this paper include minimizing the execution cost and energy consumption within a given deadline. Finally, we use four scientific workflows to compare DCMORL and several representative scheduling algorithms. The results indicate that DCMORL outperforms the above algorithms. As far as we know, this is the first application of RL to a deadline constrained workflow scheduling problem.
Keywords: workflow scheduling; energy saving; multi-objective reinforcement learning; deadline constrained; cloud computing
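The Chebyshev scalarization of Q-values mentioned in this abstract can be illustrated as follows. The sketch follows the common multi-objective Q-learning formulation, in which each action is scored by its largest weighted distance to a utopian reference point and the action with the smallest score is chosen; the weights, the reference point, and the example action names are assumptions, not values from DCMORL.

```python
def chebyshev_score(q_vector, weights, utopia):
    """Weighted Chebyshev distance between a Q-vector and the utopian point."""
    return max(w * abs(q - z) for q, w, z in zip(q_vector, weights, utopia))


def greedy_action(q_vectors, weights, utopia):
    """Pick the action whose Q-vector is closest to the utopian point.

    q_vectors: dict mapping each action to one Q-value per objective.
    """
    return min(q_vectors,
               key=lambda action: chebyshev_score(q_vectors[action], weights, utopia))
```

As a usage example with objectives (negative cost, negative energy), weights (0.5, 0.5) and utopian point (0, 0), an action with Q-vector (-2.0, -2.5) scores 1.25 and is preferred over one with (-4.0, -1.0), which scores 2.0. Unlike a weighted sum, this metric penalizes the worst objective, so it can reach solutions on non-convex parts of the Pareto front.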
Rule-Guidance Reinforcement Learning for Lane Change Decision-making: A Risk Assessment Approach (cited by 1)
14
Authors: Lu Xiong, Zhuoren Li, Danyang Zhong, Puhang Xu, Chen Tang. Chinese Journal of Mechanical Engineering, 2025, No. 2, pp. 344-359 (16 pages)
To solve the problems of poor security guarantees and insufficient training efficiency in conventional reinforcement learning methods for decision-making, this study proposes a hybrid framework that combines deep reinforcement learning with rule-based decision-making methods. A risk assessment model for lane-change maneuvers considering uncertain predictions of surrounding vehicles is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement. On this basis, a Risk-fused DDQN is constructed utilizing the model-based risk assessment and supervision mechanism. The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions, which is shown to improve the sampling efficiency and training outcomes. Compared with conventional DDQN methods, the proposed algorithm improves the convergence value of cumulated reward by 7.6% and 2.2% in the two constructed scenarios in the simulation study and reduces the number of training episodes by 52.2% and 66.8%, respectively. The success rate of lane change is improved by 57.3% while the time headway is increased by at least 16.5% in real vehicle tests, which confirms the higher training efficiency, scenario adaptability, and security of the proposed Risk-fused DDQN.
Keywords: autonomous driving; reinforcement learning; decision-making; risk assessment; safety filter
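For readers unfamiliar with the DDQN baseline that this entry builds on, the standard Double DQN bootstrap target can be sketched as below: the online network selects the next action and the target network evaluates it, which reduces the overestimation of plain DQN. This shows only the generic target; the risk assessment model, safety filter, and separate buffer of the Risk-fused DDQN are not reproduced, and the function name and defaults are illustrative assumptions.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    next_q_online / next_q_target: per-action Q-value lists for the next state,
    produced by the online and target networks respectively.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```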
A Survey of Cooperative Multi-agent Reinforcement Learning for Multi-task Scenarios (cited by 1)
15
作者 Jiajun CHAI Zijie ZHAO +1 位作者 Yuanheng ZHU Dongbin ZHAO 《Artificial Intelligence Science and Engineering》 2025年第2期98-121,共24页
Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-... Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-robot control.Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope.In multi-task scenarios,cooperative MARL algorithms need to address 3 types of multi-task problems:reward-related multi-task,arising from different reward functions;multi-domain multi-task,caused by differences in state and action spaces,state transition functions;and scalability-related multi-task,resulting from the dynamic variation in the number of agents.Most existing studies focus on scalability-related multitask problems.However,with the increasing integration between large language models(LLMs)and multi-agent systems,a growing number of LLM-based multi-agent systems have emerged,enabling more complex multi-task cooperation.This paper provides a comprehensive review of the latest advances in this field.By combining multi-task reinforcement learning with cooperative MARL,we categorize and analyze the 3 major types of multi-task problems under multi-agent settings,offering more fine-grained classifications and summarizing key insights for each.In addition,we summarize commonly used benchmarks and discuss future directions of research in this area,which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world. 展开更多
Keywords: Multi-task; multi-agent reinforcement learning; large language models
Graph-based multi-agent reinforcement learning for collaborative search and tracking of multiple UAVs (Cited by: 2)
16
Authors: Bocheng ZHAO, Mingying HUO, Zheng LI, Wenyu FENG, Ze YU, Naiming QI, Shaohai WANG. Chinese Journal of Aeronautics, 2025, Issue 3, pp. 109-123 (15 pages)
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework that integrates centralized training and distributed execution is introduced to facilitate model training. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with fewer collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
Keywords: Unmanned aerial vehicle (UAV); Multi-agent reinforcement learning (MARL); Graph attention network (GAT); Tracking; Dynamic and unknown environment
Deep reinforcement learning based integrated evasion and impact hierarchical intelligent policy of exo-atmospheric vehicles (Cited by: 1)
17
Authors: Leliang REN, Weilin GUO, Yong XIAN, Zhenyu LIU, Daqiao ZHANG, Shaopeng LI. Chinese Journal of Aeronautics, 2025, Issue 1, pp. 409-426 (18 pages)
Exo-atmospheric vehicles are constrained by limited maneuverability, which leads to a conflict between evasive maneuver and precision strike. To address the problem of Integrated Evasion and Impact (IEI) decision under multi-constraint conditions, a hierarchical intelligent decision-making method based on Deep Reinforcement Learning (DRL) was proposed. First, an intelligent decision-making framework of "DRL evasion decision" + "impact prediction guidance decision" was established: it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective, and effectively solves the problem of poor decision-making effect caused by the large IEI decision space. Second, to solve the sparse reward problem faced by evasion decision-making, a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed, and the corresponding Markov Decision Process (MDP) was designed. A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method. Simulation results show that the proposed model has good performance and low computational resource requirements. The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy, and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer, which indicates engineering application value.
Keywords: Exo-atmospheric vehicle; Integrated evasion and impact; Deep reinforcement learning; Hierarchical intelligent policy; Single-chip microcomputer; Miss distance
Multi-QoS routing algorithm based on reinforcement learning for LEO satellite networks (Cited by: 1)
18
Authors: ZHANG Yifan, DONG Tao, LIU Zhihui, JIN Shichao. Journal of Systems Engineering and Electronics, 2025, Issue 1, pp. 37-47 (11 pages)
Low Earth orbit (LEO) satellite networks exhibit distinct characteristics, e.g., limited resources of individual satellite nodes and dynamic network topology, which have brought many challenges for routing algorithms. To satisfy the quality of service (QoS) requirements of various users, it is critical to research efficient routing strategies that fully utilize satellite resources. This paper proposes a multi-QoS information optimized routing algorithm based on reinforcement learning for LEO satellite networks. The algorithm prioritizes services with high-level assurance demands under limited satellite resources, while considering the load balancing performance of the satellite networks for services with low-level assurance demands, so that satellite resources are fully and effectively utilized. An auxiliary path search algorithm is proposed to accelerate the convergence of the satellite routing algorithm. Simulation results show that the generated routing strategy can process high assurance services in a timely manner and fully meet their QoS demands while effectively improving the load balancing performance of the link.
Keywords: low Earth orbit (LEO) satellite network; reinforcement learning; multi-quality of service (QoS); routing algorithm
Intelligent path planning for small modular reactors based on improved reinforcement learning
19
Authors: DONG Yun-Feng, ZHOU Wei-Zheng, WANG Zhe-Zheng, ZHANG Xiao. 《四川大学学报(自然科学版)》 (Journal of Sichuan University, Natural Science Edition; PKU Core Journal), 2025, Issue 4, pp. 1006-1014 (9 pages)
The small modular reactor (SMR) is at the research forefront of nuclear reactor technology. The advancement of intelligent control technologies now paves a new way for the design and construction of unmanned SMRs. The autonomous control process of an SMR can be divided into three stages, namely state diagnosis, autonomous decision-making, and coordinated control. In this paper, the autonomous state recognition and task planning of unmanned SMRs are investigated. An operating condition recognition method based on the knowledge base of SMR operation is proposed using artificial neural network (ANN) technology, which provides a basis for the state judgment in intelligent reactor control path planning. An improved reinforcement learning path planning algorithm is utilized to implement the path transfer decision-making. This algorithm performs condition transitions with minimal cost under specified modes. In summary, full-range intelligent decision-planning of the SMR control path is realized, providing a theoretical basis for the future design and construction of unmanned SMRs.
Keywords: Small modular reactor; Operating condition recognition; Path planning; reinforcement learning
Performance comparison of deep reinforcement robot-arm learning in sequential fabrication of rule-based building design form
20
Authors: Abhishek Mehrotra, Hwang Yi. Frontiers of Architectural Research, 2025, Issue 6, pp. 1654-1680 (27 pages)
Deep reinforcement learning (DRL) remains underexplored within architectural robotics, particularly in relation to self-learning of architectural design principles and design-aware robotic fabrication. To address this gap, we applied established DRL methods to enable robot arms to autonomously learn design rules in a pilot block wall assembly-design scenario. Recognizing the complexity inherent in such learning tasks, the problem was strategically decomposed into two sub-tasks: (i) target reaching (T1), modeled within a continuous action space, and (ii) sequential planning (T2), formulated within a discrete action space. For T1, we evaluated major DRL algorithms: Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient, Twin Delayed Deep Deterministic Policy Gradient, and Soft Actor-Critic (SAC). For T2, PPO, A2C, and Double Deep Q-Network (DDQN) were tested. Performance was assessed based on training efficacy, reliability, and two novel metrics: degree index and variation index. Our results revealed that SAC was the best for T1, whereas DDQN excelled in T2. Notably, DDQN exhibited strong learning adaptability, yielding diverse final layouts in response to varying initial conditions.
Keywords: Robotic architecture; Robot learning; reinforcement learning; Robotic construction; Robot arm