In disaster relief operations, multiple UAVs can be used to search for trapped people. In recent years, many researchers have proposed machine learning-based algorithms, sampling-based algorithms, and heuristic algorithms to solve the multi-UAV path planning problem. Among these, the Dung Beetle Optimization (DBO) algorithm has been widely applied because of its diverse search patterns. However, the update strategies for the rolling and thieving dung beetles in DBO are overly simplistic, so the algorithm may fail to explore the search space fully and tends to converge to local optima, which means that finding the optimal path is not guaranteed. To address these issues, we propose an improved DBO algorithm guided by the Landmark Operator (LODBO). Specifically, we first use tent mapping to update the population strategy, which enables the algorithm to generate initial solutions with enhanced diversity within the search space. Second, we expand the search range of the rolling dung beetle by using the landmark factor. Finally, by using an adaptive factor that changes with the number of iterations, we improve the global search ability of the thieving dung beetle, making it more likely to escape from local optima. To verify the effectiveness of the proposed method, extensive simulation experiments are conducted. The results show that, in the disaster search and rescue task set, the LODBO algorithm obtains the optimal path in the shortest time compared with the Genetic Algorithm (GA), the Gray Wolf Optimizer (GWO), the Whale Optimization Algorithm (WOA), and the original DBO algorithm.
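The tent-mapping initialization mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the map's exact form or parameters, so the skewed tent map below, the breakpoint `a`, the seed `x0`, and the function name `tent_map_population` are all assumptions made for illustration only.

```python
def tent_map_population(pop_size, dim, lower, upper, a=0.7, x0=0.37):
    """Build an initial population from a skewed tent-map sequence.

    Assumed form (not taken from the paper):
        x_{k+1} = x_k / a              if x_k < a
        x_{k+1} = (1 - x_k) / (1 - a)  otherwise
    Each chaotic value in [0, 1] is scaled into [lower, upper].
    """
    population = []
    x = x0
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            # One tent-map step; the value stays inside [0, 1].
            x = x / a if x < a else (1.0 - x) / (1.0 - a)
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population


if __name__ == "__main__":
    for candidate in tent_map_population(pop_size=5, dim=2, lower=-5.0, upper=5.0):
        print(candidate)
```

A breakpoint other than 0.5 is used here because the symmetric map (slope 2) tends to collapse to zero in binary floating-point arithmetic after a few dozen iterations.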
Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFB2503203) and the National Natural Science Foundation of China (Grant No. U1964206).
Abstract: Decision-making of connected and automated vehicles (CAVs) involves a sequence of driving maneuvers that improve safety and efficiency, and it is characterized by complex scenarios, strong uncertainty, and high real-time requirements. Deep reinforcement learning (DRL) offers strong real-time decision-making capability, adaptability to complex scenarios, and generalization ability. However, it is difficult to guarantee complete driving safety and efficiency under the constraints of training samples and costs. This paper proposes a Mixture of Experts (MoE) method based on Soft Actor-Critic (SAC), in which an upper-level discriminator dynamically decides, based on the features of the input state, whether to activate the lower-level DRL expert or the heuristic expert. To further enhance the performance of the DRL expert, a buffer zone is introduced in the reward function, applying penalties preemptively before unsafe situations occur. To minimize collision and off-road rates, the heuristic expert is built on the Intelligent Driver Model (IDM) and the Minimizing Overall Braking Induced by Lane changes (MOBIL) strategy. Finally, tested in typical simulation scenarios, MoE shows a 13.75% improvement in driving efficiency compared with the traditional DRL method with a continuous action space. It ensures high safety, with zero collision and zero off-road rates, while maintaining high adaptability.
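For readers unfamiliar with the Intelligent Driver Model named in this abstract, the sketch below shows its standard car-following acceleration law. It is not the paper's implementation: the parameter values are illustrative defaults, the function name `idm_acceleration` is hypothetical, and the clamping of the dynamic gap term with `max(0, ...)` is a common variant rather than something stated in the abstract.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, s0=2.0,
                     a_max=1.0, b=2.0, delta=4.0):
    """Standard IDM acceleration in m/s^2.

    v     : ego speed (m/s)
    gap   : bumper-to-bumper distance to the leader (m), must be > 0
    dv    : approach rate, ego speed minus leader speed (m/s)
    v0    : desired speed, T: desired time headway, s0: minimum gap,
    a_max : maximum acceleration, b: comfortable deceleration
    (all parameter values here are illustrative assumptions)
    """
    # Desired dynamic gap s*; the max(0, ...) clamp is a common variant.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    print(idm_acceleration(v=20.0, gap=50.0, dv=0.0))  # following at equal speed
    print(idm_acceleration(v=20.0, gap=5.0, dv=5.0))   # closing in on a near leader
```

With a very large gap and zero speed the result approaches `a_max`, and with a short gap and a positive approach rate the result is strongly negative, which is the braking behavior a heuristic safety expert would rely on.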
Funding: Supported by the National Key Research and Development Program of China (No. 2018AAA0103003), the National Natural Science Foundation of China (No. 61773378), the Basic Research Program (No. JCKY*******B029), and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB32050100).
Abstract: Reinforcement learning (RL) algorithms have been shown to solve a variety of continuous control tasks. However, the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm, which contains a soft Q-function and an ordinary Q-function. The soft Q-function encourages the exploration of a Gaussian policy, and the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost policy training. In addition, we integrate HAC with hindsight experience replay (HER) to deal with sparse-reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, which validates its effectiveness.
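The idea of a replay sampler that favors recent transitions can be sketched as follows. The abstract does not describe the actual sampling rule, so the geometric weighting, the `decay` parameter, and the name `sample_recent_biased` are hypothetical choices used only to illustrate the concept.

```python
import random


def sample_recent_biased(buffer, batch_size, decay=0.99, rng=random):
    """Sample a batch (with replacement), favoring newer transitions.

    buffer is ordered oldest to newest. The newest item gets weight 1 and
    each step back in time multiplies the weight by `decay`. This weighting
    is an illustrative assumption, not the scheme used in the paper.
    """
    n = len(buffer)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return rng.choices(buffer, weights=weights, k=batch_size)


if __name__ == "__main__":
    # Integers stand in for transitions so the recency bias is easy to see.
    replay = list(range(1000))
    batch = sample_recent_biased(replay, batch_size=8, rng=random.Random(0))
    print(batch)
```

Setting `decay` to 1.0 recovers uniform sampling, so the same function can be used to compare the two regimes.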
Funding: Supported by the National Natural Science Foundation of China (No. 52177085).
Abstract: Peer-to-peer (P2P) energy trading in active distribution networks (ADNs) plays a pivotal role in promoting the efficient consumption of renewable energy sources. However, it is challenging to effectively coordinate the power dispatch of ADNs and P2P energy trading while preserving the privacy of different physical interests. Hence, this paper proposes a soft actor-critic algorithm incorporating distributed trading control (SAC-DTC) to tackle the optimal power dispatch of ADNs and P2P energy trading while preserving privacy among prosumers. First, the soft actor-critic (SAC) algorithm is used to optimize the control strategy of devices in the ADN to minimize the operation cost, and the primary environmental information of the ADN at this point is published to prosumers. Then, a distributed generalized fast dual ascent method is used to iterate the trading process of prosumers and maximize their revenues. Subsequently, the trading results are encrypted using the differential privacy technique and returned to the ADN. Finally, the social welfare value, consisting of the ADN operation cost and the P2P market revenue, is used as the reward value to update the network parameters and control strategies of the deep reinforcement learning agent. Simulation results show that the proposed SAC-DTC algorithm reduces the ADN operation cost, boosts the P2P market revenue, maximizes the social welfare, and exhibits high computational accuracy, demonstrating its practical applicability to the operation of power systems and power markets.
Abstract: Building integrated energy systems (BIESs) are pivotal for enhancing energy efficiency because they account for a significant proportion of global energy consumption. Two key barriers that reduce BIES operational efficiency are the uncertainty of renewable generation and the operational non-convexity of combined heat and power (CHP) units. To this end, this paper proposes a soft actor-critic (SAC) algorithm to solve the BIES scheduling problem, which overcomes the model non-convexity and shows advantages in robustness and generalization. This paper also adopts a temporal fusion transformer (TFT) to enhance the optimal solution of the SAC algorithm by forecasting renewable generation and energy demand. The TFT can effectively capture complex temporal patterns and dependencies that span multiple steps. Furthermore, its forecasting results are interpretable owing to its self-attention layer, which supports more trustworthy decision-making in the SAC algorithm. The proposed hybrid data-driven approach integrating the TFT and SAC algorithms, i.e., the TFT-SAC approach, is trained and tested on a real-world dataset to validate its superior performance in reducing energy cost and computational time compared with the benchmark approaches. The generalization performance of the scheduling policy, as well as a sensitivity analysis, is examined in the case studies.
Funding: Supported by the National Natural Science Foundation of China (52222215, 52272420, 52072051).
Abstract: Parking in a small parking lot within limited space is a difficult task. It often leads to deviations between the final parking posture and the target posture. These deviations can lead to partial occupancy of adjacent parking lots, which poses a safety threat to vehicles parked in these parking lots. However, previous studies have not addressed this issue. In this paper, we aim to evaluate the impact of parking deviation of existing vehicles next to the target parking lot (PDEVNTPL) on automatic ego vehicle (AEV) parking, in terms of the safety, comfort, accuracy, and efficiency of parking. A segmented parking training framework (SPTF) based on soft actor-critic (SAC) is proposed to improve parking performance. In the proposed method, the SAC algorithm incorporates strategy entropy into the objective function to enable the AEV to learn parking strategies based on a more comprehensive understanding of the environment. Additionally, the SPTF simplifies complex parking tasks to maintain the high performance of deep reinforcement learning (DRL). The experimental results reveal that the PDEVNTPL has a detrimental influence on AEV parking in terms of safety, accuracy, and comfort, leading to reductions of more than 27%, 54%, and 26%, respectively. However, the SAC-based SPTF effectively mitigates this impact, increasing the parking success rate from 71% to 93%. Furthermore, the heading angle deviation is reduced from 2.25 degrees to 0.43 degrees.
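The statement that SAC "incorporates strategy entropy into the objective function" corresponds to the standard maximum-entropy reinforcement learning objective. The form below is the general one used by SAC, not the specific parking reward of this paper; the temperature symbol \alpha and the state-action distribution \rho_\pi follow common notation and are not defined in the abstract.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big)
  = - \mathbb{E}_{a \sim \pi(\cdot \mid s)} \left[ \log \pi(a \mid s) \right]
```

A larger \alpha weights the entropy term more heavily and so rewards more exploratory policies, while \alpha = 0 recovers the usual expected-return objective.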
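Abstract: Actor-critic is a reinforcement learning method that learns a policy from samples collected through online trial-and-error interaction with the environment, and it is an effective means of solving sequential perception and decision-making problems. However, this active learning paradigm of online interaction raises cost and safety concerns when samples are collected in some complex real-world environments. Offline reinforcement learning, a data-driven reinforcement learning paradigm, emphasizes learning a policy from a static dataset of samples without exploratory interaction with the environment. It offers a feasible solution for real-world deployment in robotics, autonomous driving, health care, and other areas, and has become a research hotspot in recent years. Offline reinforcement learning methods currently face the challenge of distribution shift between the learned policy and the behavior policy. To address this challenge, policy constraints or value-function regularization are typically used to restrict access to out-of-distribution (OOD) actions, which makes learning overly conservative and hinders both the generalization of the value-function network and the improvement of the learned policy. To this end, this paper uses uncertainty estimation and OOD sampling to balance the generalization and conservatism of value-function learning, and proposes an Offline Deterministic Actor-Critic method based on Uncertainty Estimation (ODACUE). First, for deterministic policies, an uncertainty estimation operator for the Q-value function is defined, and it is proved theoretically that the Q-value function learned by this operator is a pessimistic estimate of the optimal Q-value function. Then, the uncertainty estimation operator is applied within the deterministic actor-critic framework, and the critic's learning objective is constructed as a convex combination of uncertainty estimation operators. Finally, experimental results on D4RL benchmark dataset tasks show that, compared with the baseline algorithms, ODACUE improves overall performance across 11 dataset tasks of different quality levels by at least 9.56% and at most 64.92%. In addition, parameter analysis and ablation experiments further verify the stability and generalization ability of ODACUE.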
Funding: Fully supported by the GUET Excellent Graduate Thesis Program (Grant No. 19YJPYBS03), the Innovation Project of Guangxi Graduate Education (Grant No. YCBZ2022109), and the New Technology Research University Cooperation Project of the 34th Research Institute of China Electronics Technology Group Corporation, 2021 (Grant No. SF2126007).
Abstract: In Software-Defined Networks (SDNs), determining how to efficiently achieve Quality of Service (QoS)-aware routing is challenging but critical for significantly improving the performance of a network, where the QoS metrics can be defined as, for example, average latency, packet loss ratio, and throughput. The SDN controller can use network statistics and a Deep Reinforcement Learning (DRL) method to resolve this challenge. In this paper, we formulate dynamic routing in an SDN as a Markov decision process and propose a DRL algorithm called the Asynchronous Advantage Actor-Critic QoS-aware Routing Optimization Mechanism (AQROM) to determine routing strategies that balance the traffic loads in the network. AQROM can improve the QoS of the network and reduce the training time via dynamic routing strategy updates; that is, the reward function can be dynamically and promptly altered based on the optimization objective regardless of the network topology and traffic pattern. AQROM can be considered a one-step optimization and a black-box routing mechanism over high-dimensional input and output sets for both discrete and continuous states and actions with respect to the operations in the SDN. Extensive simulations were conducted using OMNeT++, and the results demonstrated that AQROM 1) achieved much faster and more stable convergence than the Deep Deterministic Policy Gradient (DDPG) and Advantage Actor-Critic (A2C), 2) incurred a lower packet loss ratio and latency than Open Shortest Path First (OSPF), DDPG, and A2C, and 3) resulted in higher and more stable throughput than OSPF, DDPG, and A2C.
Funding: Supported by the Sichuan Science and Technology Program (Grant No. 2022YFG0123).
Abstract: In this study, a novel residential virtual power plant (RVPP) scheduling method that leverages a gated recurrent unit (GRU)-integrated deep reinforcement learning (DRL) algorithm is proposed. In the proposed scheme, the GRU-integrated DRL algorithm guides the RVPP to participate effectively in both the day-ahead and real-time markets, lowering the electricity purchase costs and consumption risks for end-users. The Lagrangian relaxation technique is introduced to transform the constrained Markov decision process (CMDP) into an unconstrained optimization problem, which guarantees that the constraints are strictly satisfied without the need to determine penalty coefficients. Furthermore, to enhance the scalability of the constrained soft actor-critic (CSAC)-based RVPP scheduling approach, a fully distributed scheduling architecture is designed to enable plug-and-play operation of the residential distributed energy resources (RDER). Case studies performed on the constructed RVPP scenario validate the performance of the proposed methodology in enhancing the responsiveness of the RDER to power tariffs, balancing the supply and demand of the power grid, and ensuring customer comfort.
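The Lagrangian relaxation step described in this abstract can be written out in its generic textbook form. The symbols below (reward return J_R, cost returns J_{C_i}, limits d_i, multipliers \lambda_i, step size \eta) are standard notation introduced here for illustration; the abstract does not specify the paper's exact formulation.

```latex
% Constrained problem (CMDP)
\max_{\pi} \; J_R(\pi)
\quad \text{s.t.} \quad J_{C_i}(\pi) \le d_i, \quad i = 1, \dots, m

% Lagrangian relaxation: an unconstrained saddle-point problem
\min_{\lambda \ge 0} \; \max_{\pi} \;
  L(\pi, \lambda) = J_R(\pi) - \sum_{i=1}^{m} \lambda_i \big( J_{C_i}(\pi) - d_i \big)

% Typical projected dual update for each multiplier
\lambda_i \leftarrow \Big[ \lambda_i + \eta \big( J_{C_i}(\pi) - d_i \big) \Big]_{+}
```

Because each \lambda_i is adjusted automatically according to how much its constraint is violated, the multipliers play the role that hand-tuned penalty coefficients would otherwise have.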
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant No. 51677058.
Abstract: Precisely estimating the state of health (SOH) of lithium-ion batteries is essential for battery management systems (BMS), as it plays a key role in ensuring the safe and reliable operation of battery systems. However, current SOH estimation methods often overlook the valuable temperature information that can effectively characterize battery aging during capacity degradation. Additionally, the Elman neural network, which is commonly employed for SOH estimation, has several drawbacks, including slow training, a tendency to become trapped in local minima, and the initialization of weights and thresholds using pseudo-random numbers, which leads to unstable model performance. To address these issues, this study proposes a method for estimating the SOH of lithium-ion batteries based on differential thermal voltammetry (DTV) and an SSA-Elman neural network. First, two health features (HFs) that account for temperature factors and battery voltage are extracted from the differential thermal voltammetry curves and incremental capacity curves. Next, the Sparrow Search Algorithm (SSA) is employed to optimize the initial weights and thresholds of the Elman neural network, forming the SSA-Elman neural network model. To validate the performance, various neural networks, including the proposed SSA-Elman network, are tested using the Oxford battery aging dataset. The experimental results demonstrate that the method developed in this study achieves superior accuracy and robustness, with a mean absolute error (MAE) of less than 0.9% and a root mean square error (RMSE) below 1.4%.
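The two error metrics reported in this abstract, MAE and RMSE, can be computed as in the short sketch below. The SOH values in the usage example are made up for illustration and are not taken from the Oxford dataset or from the paper's results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


if __name__ == "__main__":
    # Hypothetical SOH fractions, purely for illustration.
    soh_true = [1.00, 0.97, 0.94, 0.90]
    soh_pred = [0.99, 0.975, 0.93, 0.905]
    print(f"MAE  = {mae(soh_true, soh_pred):.4f}")
    print(f"RMSE = {rmse(soh_true, soh_pred):.4f}")
```

RMSE is never smaller than MAE on the same data because squaring gives larger errors more weight, which is consistent with the abstract reporting a higher bound for RMSE than for MAE.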
Funding: National Natural Science Foundation of China (11971211, 12171388).
Abstract: Complex network models are frequently employed for simulating and studying diverse real-world complex systems. Among these models, scale-free networks typically exhibit greater fragility to malicious attacks. Consequently, enhancing the robustness of scale-free networks has become a pressing issue. To address this problem, this paper proposes a Multi-Granularity Integration Algorithm (MGIA), which aims to improve the robustness of scale-free networks while keeping the initial degree of each node unchanged, ensuring network connectivity, and avoiding the generation of multiple edges. The algorithm generates a multi-granularity structure from the initial network to be optimized, then uses different optimization strategies to optimize the networks at the various granular layers in this structure, and finally exchanges information between different granular layers, thereby further enhancing the optimization effect. We propose new network refresh, crossover, and mutation operators to ensure that the optimized network satisfies the given constraints. Meanwhile, we propose new network similarity and network dissimilarity evaluation metrics to improve the effectiveness of the optimization operators in the algorithm. In the experiments, the MGIA enhances the robustness of the scale-free network by 67.6%. This improvement is approximately 17.2% higher than the optimization effects achieved by eight existing complex network robustness optimization algorithms.
Funding: Supported by Yunnan Provincial Basic Research Project (202401AT070344, 202301AT070443), National Natural Science Foundation of China (62263014, 52207105), Yunnan Lancang-Mekong International Electric Power Technology Joint Laboratory (202203AP140001), and Major Science and Technology Projects in Yunnan Province (202402AG050006).
Abstract: Accurate short-term wind power forecasting plays a crucial role in maintaining the safety and economic efficiency of smart grids. Although numerous studies have employed various methods to forecast wind power, there remains a research gap in leveraging swarm intelligence algorithms to optimize the hyperparameters of the Transformer model for wind power prediction. To improve the accuracy of short-term wind power forecasts, this paper proposes a hybrid short-term wind power forecast approach named STL-IAOA-iTransformer, which is based on seasonal and trend decomposition using LOESS (STL) and an iTransformer model optimized by an improved arithmetic optimization algorithm (IAOA). First, to fully extract the power data features, STL is used to decompose the original data into components with less redundant information. The extracted components as well as the weather data are then input into iTransformer for short-term wind power forecasting. The final predicted short-term wind power curve is obtained by combining the predicted components. To improve the model accuracy, IAOA is employed to optimize the hyperparameters of iTransformer. The proposed approach is validated using real generation data from different seasons and different power stations in Northwest China, and ablation experiments have been conducted. Furthermore, to validate the superiority of the proposed approach under different wind characteristics, real power generation data from Southwest China are utilized for experiments. The comparative results with six other state-of-the-art prediction models show that the proposed model fits the true generation series well and achieves high prediction accuracy.
Funding: Supported by the National Natural Science Foundation of China (No. 62373027).
Abstract: In disaster relief operations, multiple UAVs can be used to search for trapped people. In recent years, many researchers have proposed machine learning-based algorithms, sampling-based algorithms, and heuristic algorithms to solve the problem of multi-UAV path planning. Among these algorithms, the Dung Beetle Optimization (DBO) algorithm has been widely applied due to its diverse search patterns. However, the update strategies for the rolling and thieving dung beetles of the DBO algorithm are overly simplistic, potentially leading to an inability to fully explore the search space and a tendency to converge to local optima, thereby not guaranteeing the discovery of the optimal path. To address these issues, we propose an improved DBO algorithm guided by the Landmark Operator (LODBO). Specifically, we first use tent mapping to update the population initialization strategy, which enables the algorithm to generate initial solutions with enhanced diversity within the search space. Second, we expand the search range of the rolling dung beetle by using the landmark factor. Finally, by using an adaptive factor that changes with the number of iterations, we improve the global search ability of the thieving dung beetle, making it more likely to escape from local optima. To verify the effectiveness of the proposed method, extensive simulation experiments are conducted, and the results show that the LODBO algorithm can obtain the optimal path in the shortest time compared with the Genetic Algorithm (GA), the Gray Wolf Optimizer (GWO), the Whale Optimization Algorithm (WOA), and the original DBO algorithm in the disaster search and rescue task set.
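The tent-mapping initialization mentioned in the LODBO abstract above can be illustrated with a minimal sketch. The abstract does not give the exact map or its parameters, so everything here is an assumption: the function name `tent_map_population`, the skewed tent map with breakpoint `a = 0.7` (chosen because the symmetric map with `a = 0.5` collapses to zero in floating-point arithmetic), and the scalar search bounds are all hypothetical.

```python
import random


def tent_map_population(pop_size, dim, lower, upper, a=0.7, seed=0):
    """Generate an initial population with a skewed tent map (illustrative only).

    Hypothetical helper, not the paper's implementation. Each individual
    starts from a random value in (0, 1) and iterates the chaotic map
        x <- x / a              if x < a
        x <- (1 - x) / (1 - a)  otherwise
    once per dimension, then scales the result into [lower, upper].
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        x = rng.uniform(0.01, 0.99)  # starting point of the chaotic sequence
        individual = []
        for _ in range(dim):
            # One tent-map step; the value stays inside [0, 1]
            x = x / a if x < a else (1.0 - x) / (1.0 - a)
            # Map the chaotic value onto the search range
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population


if __name__ == "__main__":
    # Example: 5 candidate waypoints in a 3-dimensional search space
    for candidate in tent_map_population(5, 3, 0.0, 100.0):
        print([round(v, 2) for v in candidate])
```

Compared with drawing every coordinate independently from a uniform distribution, a chaotic sequence of this kind is often used in swarm optimizers to spread initial candidates over the search range; whether LODBO applies the map per dimension or per individual is not stated in the abstract.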