Decision-making for connected and automated vehicles (CAVs) involves a sequence of driving maneuvers that improve safety and efficiency, and is characterized by complex scenarios, strong uncertainty, and high real-time requirements. Deep reinforcement learning (DRL) offers real-time decision-making, adaptability to complex scenarios, and generalization ability. However, it is difficult to guarantee complete driving safety and efficiency under the constraints of training samples and costs. This paper proposes a Mixture of Experts (MoE) method based on Soft Actor-Critic (SAC), in which an upper-level discriminator dynamically decides whether to activate the lower-level DRL expert or the heuristic expert based on the features of the input state. To further enhance the performance of the DRL expert, a buffer zone is introduced in the reward function, applying penalties preemptively before unsafe situations occur. To minimize collision and off-road rates, the heuristic expert is designed around the Intelligent Driver Model (IDM) and the Minimizing Overall Braking Induced by Lane changes (MOBIL) strategy. Finally, tested in typical simulation scenarios, MoE shows a 13.75% improvement in driving efficiency compared with the traditional DRL method with continuous action space. It ensures high safety, with zero collision and zero off-road rates, while maintaining high adaptability.
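The Intelligent Driver Model named in this abstract has a widely used closed form, which can be sketched in a few lines of Python. This is only an illustration of the standard IDM car-following law; the function name and all default parameter values are assumptions chosen for the example, not values reported by the paper, and the MOBIL lane-change rule and the MoE discriminator are not shown.

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, s0=2.0, a_max=1.0, b=1.5, delta=4.0):
    """Standard IDM longitudinal acceleration (illustrative defaults).

    v   : ego speed (m/s)
    gap : bumper-to-bumper distance to the leading vehicle (m), must be > 0
    dv  : approach rate, ego speed minus leader speed (m/s)
    v0  : desired speed, T: desired time headway, s0: minimum gap,
    a_max: maximum acceleration, b: comfortable deceleration.
    """
    # Desired dynamic gap; the interaction term is clipped at zero.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

With a very large gap and zero speed the function returns roughly `a_max`, and with a short gap and a positive approach rate it returns a strong deceleration, which is the behavior a heuristic safety expert would rely on.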
Reinforcement learning (RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However, the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm, which contains a soft Q-function and an ordinary Q-function. The soft Q-function encourages the exploration of a Gaussian policy, and the ordinary Q-function optimizes the mean of the Gaussian policy to improve the training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost the policy training. In addition, we integrate HAC with hindsight experience replay (HER) to deal with sparse reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, which validates the effectiveness of our method.
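The abstract does not specify how recent transitions are emphasized, so the following Python sketch shows only one plausible way to bias replay sampling toward recent experience: geometric weighting by age. The class name `RecencyBuffer`, the `decay` parameter, and the weighting rule are all assumptions made for illustration and should not be read as the HAC sampling scheme.

```python
import random
from collections import deque

class RecencyBuffer:
    """Replay buffer that samples newer transitions more often (hypothetical scheme)."""

    def __init__(self, capacity, decay=0.999, seed=None):
        self.data = deque(maxlen=capacity)  # oldest entries are evicted first
        self.decay = decay
        self.rng = random.Random(seed)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        n = len(self.data)
        # The newest transition gets weight 1; each older one is scaled by `decay`.
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        return self.rng.choices(list(self.data), weights=weights, k=batch_size)
```

Setting `decay=1.0` recovers uniform sampling, which makes the recency bias easy to ablate.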
Peer-to-peer (P2P) energy trading in active distribution networks (ADNs) plays a pivotal role in promoting the efficient consumption of renewable energy sources. However, it is challenging to effectively coordinate the power dispatch of ADNs and P2P energy trading while preserving the privacy of different physical interests. Hence, this paper proposes a soft actor-critic algorithm incorporating distributed trading control (SAC-DTC) to tackle the optimal power dispatch of ADNs and P2P energy trading, considering privacy preservation among prosumers. First, the soft actor-critic (SAC) algorithm is used to optimize the control strategy of devices in ADNs to minimize the operation cost, and the primary environmental information of the ADN at this point is published to prosumers. Then, a distributed generalized fast dual ascent method is used to iterate the trading process of prosumers and maximize their revenues. Subsequently, the results of trading are encrypted based on the differential privacy technique and returned to the ADN. Finally, the social welfare value, consisting of ADN operation cost and P2P market revenue, is used as a reward value to update the network parameters and control strategies of the deep reinforcement learning. Simulation results show that the proposed SAC-DTC algorithm reduces the ADN operation cost, boosts the P2P market revenue, maximizes the social welfare, and exhibits high computational accuracy, demonstrating its practical application to the operation of power systems and power markets.
Wireless Sensor Networks (WSNs) play a crucial role in numerous Internet of Things (IoT) applications and next-generation communication systems, yet they continue to face challenges in balancing energy efficiency and reliable connectivity. This study proposes SAC-HTC (Soft Actor-Critic-based High-performance Topology Control), a deep reinforcement learning (DRL) method based on the Actor-Critic framework, implemented within a Software Defined Wireless Sensor Network (SDWSN) architecture. In this approach, sensor nodes periodically transmit state information, including coordinates, node degree, transmission power, and neighbor lists, to a centralized controller. The controller acts as the reinforcement learning (RL) agent, with the Actor generating decisions to adjust transmission ranges, while the Critic evaluates action values to reflect the overall network performance. The bidirectional Node-Controller feedback mechanism enables the controller to issue appropriate control commands to each node, ensuring the maintenance of the desired node degree, reducing energy consumption, and preserving network connectivity. The algorithm further incorporates soft entropy adjustment to balance exploration and exploitation, along with an off-policy mechanism for efficient data reuse, making it well-suited to the resource-constrained conditions of WSNs. Simulation results demonstrate that SAC-HTC not only outperforms traditional methods and several existing RL algorithms but also achieves faster convergence, optimized communication range control, global connectivity maintenance, and extended network lifetime. The key novelty of this research lies in the integration of the SAC method with the SDWSN architecture for WSN topology control, providing an adaptive, efficient, and highly promising mechanism for large-scale, dynamic, and high-performance sensor networks.
To address the issue of overly conservative offline reinforcement learning (RL) methods that limit the generalization of the policy in the out-of-distribution (OOD) region, this article designs a surrogate target for the OOD value function based on dataset distance and proposes a novel generalized Q-learning mechanism with distance regularization (GQDR). In theory, we not only prove the convergence of GQDR, but also ensure that the difference between the Q-value learned by GQDR and its true value is bounded. Furthermore, an offline generalized actor-critic method with distance regularization (OGACDR) is proposed by combining GQDR with the actor-critic learning framework. Two implementations of OGACDR, OGACDR-EXP and OGACDR-SQR, are introduced according to exponential (EXP) and open square (SQR) distance weight functions, and it is theoretically proved that OGACDR provides a safe policy improvement. Experimental results on Gym-MuJoCo continuous control tasks show that OGACDR can not only alleviate the overestimation and overconservatism of the Q-value function, but also outperform conservative offline RL baselines.
With the rapid development of artificial intelligence, intelligent air combat maneuver decision-making (ACMD) has garnered global attention. Although deep reinforcement learning provides a promising approach to ACMD, existing methods often suffer from rigid reward functions and limited adaptability to evolving adversarial strategies. Moreover, most research assumes open airspace, overlooking the influence of potential obstacles. In this paper, we address one-on-one within-visual-range ACMD in obstructed environments, and propose an improved Soft Actor-Critic (SAC) algorithm trained under a curriculum self-play framework. A maneuver strategy mirroring inference module is integrated to estimate each other's likely positions when visual obstruction occurs. By leveraging curriculum learning to guide progressive experience accumulation and self-play for adversarial evolution, our method enhances both training efficiency and tactical diversity. We further integrate an attention mechanism that dynamically adjusts the weights of sub-rewards, enabling the learned policy to adapt to rapidly changing air combat situations. Numerical simulations demonstrate that our enhanced SAC converges more quickly and achieves higher win rates than other baseline methods. An animation is available at bilibili.com/video/BV1BHVszHE98 for better illustration.
Building integrated energy systems (BIESs) are pivotal for enhancing energy efficiency, as they account for a significant proportion of global energy consumption. Two key barriers to BIES operational efficiency are the uncertainty of renewable generation and the operational non-convexity of combined heat and power (CHP) units. To this end, this paper proposes a soft actor-critic (SAC) algorithm to solve the scheduling problem of BIES, which overcomes the model non-convexity and shows advantages in robustness and generalization. This paper also adopts a temporal fusion transformer (TFT) to enhance the optimal solution of the SAC algorithm by forecasting the renewable generation and energy demand. The TFT can effectively capture complex temporal patterns and dependencies that span multiple steps. Furthermore, its forecasting results are interpretable owing to the use of a self-attention layer, which supports more trustworthy decision-making in the SAC algorithm. The proposed hybrid data-driven approach integrating the TFT and SAC algorithms, i.e., the TFT-SAC approach, is trained and tested on a real-world dataset to validate its superior performance in reducing the energy cost and computational time compared with the benchmark approaches. The generalization performance of the scheduling policy, as well as the sensitivity analysis, is examined in the case studies.
This paper presents an Eulerian-Lagrangian algorithm for direct numerical simulation (DNS) of particle-laden flows. The algorithm is applicable to simulations of dilute suspensions of small inertial particles in turbulent carrier flow. The Eulerian framework numerically resolves turbulent carrier flow using a parallelized, finite-volume DNS solver on a staggered Cartesian grid. Particles are tracked using a point-particle method utilizing a Lagrangian particle tracking (LPT) algorithm. The proposed Eulerian-Lagrangian algorithm is validated using an inertial particle-laden turbulent channel flow for different Stokes number cases. The particle concentration profiles and higher-order statistics of the carrier and dispersed phases agree well with the benchmark results. We investigated the effect of fluid velocity interpolation and numerical integration schemes of particle tracking algorithms on particle dispersion statistics. The suitability of fluid velocity interpolation schemes for predicting the particle dispersion statistics is discussed in the framework of the particle tracking algorithm coupled to the finite-volume solver. In addition, we present parallelization strategies implemented in the algorithm and evaluate their parallel performance.
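The two ingredients this abstract studies, fluid velocity interpolation and time integration of the particle equations, can be illustrated with a deliberately simplified Python sketch. It assumes a one-dimensional uniform grid, linear interpolation, Stokes drag as the only force, and forward-Euler integration; the paper itself uses a three-dimensional staggered grid and compares several schemes, none of which are reproduced here. The function names are hypothetical.

```python
def interp_linear(x, grid_x, grid_u):
    """Linearly interpolate grid values to position x (uniform 1D grid, clamped at the ends)."""
    dx = grid_x[1] - grid_x[0]
    i = int((x - grid_x[0]) // dx)
    i = max(0, min(i, len(grid_x) - 2))  # index of the cell containing x
    w = (x - grid_x[i]) / dx
    w = max(0.0, min(w, 1.0))            # clamp outside the grid
    return (1.0 - w) * grid_u[i] + w * grid_u[i + 1]

def advance_particle(x, v, grid_x, grid_u, tau_p, dt):
    """One forward-Euler step of dx/dt = v, dv/dt = (u_f(x) - v) / tau_p."""
    u_f = interp_linear(x, grid_x, grid_u)  # fluid velocity seen by the particle
    v_new = v + dt * (u_f - v) / tau_p      # Stokes drag relaxation
    x_new = x + dt * v
    return x_new, v_new
```

Here `tau_p` is the particle response time; forward Euler is stable only for `dt` well below `tau_p`, which is one reason the choice of integration scheme matters for small Stokes numbers.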
To address the steering instability and hysteresis of agricultural robots during movement, a fusion PID control method combining particle swarm optimization (PSO) and the genetic algorithm (GA) was proposed. The fusion algorithm took advantage of the fast optimization ability of PSO to optimize the population screening step of GA. The Simulink simulation results showed that the convergence of the fitness function of the fusion algorithm was accelerated, the system response adjustment time was reduced, and the overshoot was almost zero. The algorithm was then applied to steering tests of an agricultural robot in various scenes. After modeling the steering system of the agricultural robot, the steering test results in the unloaded suspended state showed that the PID control based on the fusion algorithm reduced the rise time, response adjustment time, and overshoot of the system, and improved the response speed and stability of the system, compared with manual trial-and-error PID control and PID control based on GA. The actual road steering test results showed that the response rise time of the PID control based on the fusion algorithm was the shortest, about 4.43 s. When the target pulse number was set to 100, the actual mean value in the steady-state regulation stage was about 102.9, which was the closest to the target value among the three control methods, and the overshoot was reduced at the same time. The steering test results under various scene states showed that the PID control based on the proposed fusion algorithm had good anti-interference ability, adapted to changes in environment and load, and improved the performance of the control system. It was effective in the steering control of the agricultural robot. This method can provide a reference for the precise steering control of other robots.
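For readers unfamiliar with the controller being tuned, a discrete PID loop can be sketched in Python as follows. The gains, the time step, and the first-order toy plant are assumptions chosen only to make the example run; they are not the robot steering model or the PSO-GA-tuned gains from the paper, and the optimizer itself is not shown.

```python
class PID:
    """Textbook discrete PID controller (gains are supplied by the caller)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        # No derivative kick on the very first sample.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def step_response(pid, target=100.0, steps=2000, tau=0.5):
    """Drive a toy first-order plant dy/dt = (u - y) / tau and return the trajectory."""
    y, history = 0.0, []
    for _ in range(steps):
        u = pid.update(target, y)
        y += pid.dt * (u - y) / tau
        history.append(y)
    return history
```

A tuning algorithm such as the PSO-GA fusion described above would search over `(kp, ki, kd)` and score each candidate by a fitness computed from a trajectory like the one `step_response` returns, for example from rise time, settling time, and overshoot.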
The Steiner k-eccentricity of a vertex is the maximum Steiner distance over all k-sets that contain the given vertex, where the Steiner distance of a vertex set is the size of a minimum Steiner tree on this set. Since the minimum Steiner tree problem is well known to be NP-hard, the Steiner k-eccentricity is not easy to compute. This paper attempts to efficiently solve this problem on block graphs and on general graphs with limited cycles. A block graph is a graph in which each block is a clique, and is also called a clique-tree. On block graphs, we propose an O(k(n+m))-time algorithm to compute the Steiner k-eccentricity of a vertex, where n and m are respectively the order and size of a block graph. On general graphs with limited cycles, we take the cyclomatic number ν(G) as a parameter, which is the minimum number of edges of G whose removal makes G acyclic, and devise an O(n^(ν(G)+1)(n(G)+m(G)+k))-time algorithm.
Optimization is the key to efficient utilization of resources in structural design. Given the complex nature of truss systems, this study presents a method based on metaheuristic modelling that minimises structural weight under stress and frequency constraints. Two new algorithms, the Red Kite Optimization Algorithm (ROA) and the Secretary Bird Optimization Algorithm (SBOA), are applied to five benchmark trusses with 10, 18, 37, 72, and 200 bars. Both algorithms are evaluated against benchmarks in the literature. The results indicate that SBOA always reaches a lighter optimal design, with structural weight reductions ranging from 0.02% to 0.15% compared to ROA, and up to 6%–8% compared to conventional algorithms. In addition, SBOA achieves 15%–20% faster convergence and a 10%–18% reduction in computational time, with a smaller standard deviation over independent runs, which demonstrates its robustness and reliability. The results indicate that the adaptive exploration mechanism of SBOA, especially its Levy flight–based search strategy, clearly improves optimization performance for both low- and high-dimensional trusses. The research promotes bio-inspired optimization techniques by demonstrating the viability of SBOA as a reliable model for large-scale structural design that provides significant enhancements in performance and convergence behavior.
Data serves as the foundation for training and testing machine learning and artificial intelligence models. The most fundamental part of data is its attributes or features. The feature set size changes from one dataset to another. Only the relevant features contribute meaningfully to classification accuracy. The presence of irrelevant features reduces the system's effectiveness. Classification performance often deteriorates on high-dimensional datasets due to the large search space. Thus, one of the significant obstacles affecting the performance of the learning process in the majority of machine learning and data mining techniques is the dimensionality of the datasets. Feature selection (FS) is an effective preprocessing step in classification tasks. The aim of applying FS is to exclude redundant and unrelated features while retaining the most informative ones, in order to improve classification capability and reduce computational complexity. In this paper, a novel hybrid binary metaheuristic algorithm, termed hSC-FPA, is proposed by hybridizing the Flower Pollination Algorithm (FPA) and the Sine Cosine Algorithm (SCA). The hybridization combines the exploration capacity of SCA with the exploitation behavior of FPA to maintain a balanced search process. SCA guides the global search in the early iterations, while FPA's local pollination refines promising solutions in later stages. A binary conversion mechanism using a threshold function is implemented to handle the discrete nature of the feature selection problem. The performance of the proposed hSC-FPA is validated on fourteen standard datasets from the UCI repository using the K-Nearest Neighbors (K-NN) classifier. Experimental results are benchmarked against the standalone SCA and FPA algorithms. The hSC-FPA consistently achieves higher classification accuracy, selects a more compact feature subset, and demonstrates superior convergence behavior. These findings support the stability and superior performance of the hybrid feature selection method presented.
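The binary conversion step mentioned in this abstract can be sketched in Python as below. The abstract says only that a threshold function is used, so the sigmoid transfer function, the 0.5 threshold, the fallback that keeps at least one feature, and both function names are assumptions made for illustration rather than the hSC-FPA definition.

```python
import math

def to_binary_mask(position, threshold=0.5):
    """Map a continuous search-agent position to a 0/1 feature mask."""
    # Squash each coordinate into (0, 1), then threshold it.
    mask = [1 if 1.0 / (1.0 + math.exp(-x)) > threshold else 0 for x in position]
    if not any(mask):
        # Keep at least one feature so a classifier can still be trained.
        best = max(range(len(position)), key=lambda i: position[i])
        mask[best] = 1
    return mask

def select_columns(rows, mask):
    """Keep only the columns whose mask entry is 1."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]
```

In a wrapper-style feature selector, the reduced rows from `select_columns` would be passed to the classifier (K-NN in the abstract), and the resulting accuracy together with the mask size would form the fitness of the candidate solution.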
Traditional sampling-based path planning algorithms, such as the rapidly-exploring random tree star (RRT*), encounter critical limitations in unstructured orchard environments, including low sampling efficiency in narrow passages, slow convergence, and high computational costs. To address these challenges, this paper proposes a novel hybrid global path planning algorithm integrating Gaussian sampling and quadtree optimization (RRT*-GSQ). The method combines a Gaussian mixture sampling strategy to improve node generation in critical regions, an adaptive step-size and direction optimization mechanism for enhanced obstacle avoidance, a Quadtree-AABB collision detection framework to lower computational complexity, and a dynamic iteration control strategy for more efficient convergence. In obstacle-free and obstructed scenarios, compared with the conventional RRT*, the proposed algorithm reduced the number of node evaluations by 67.57% and 62.72%, and decreased the search time by 79.72% and 78.52%, respectively. In path tracking tests, the proposed algorithm achieved substantial reductions in the RMSE of the final path compared to the conventional RRT*. Specifically, the lateral RMSE was reduced by 41.5% in obstacle-free environments and 59.3% in obstructed environments, while the longitudinal RMSE was reduced by 57.2% and 58.5%, respectively. Furthermore, the maximum absolute errors in both lateral and longitudinal directions were constrained within 0.75 m. Field validation experiments in an operational orchard confirmed the algorithm's practical effectiveness, showing reductions in the mean tracking error of 47.6% (obstacle-free) and 58.3% (obstructed), alongside a 5.1% and 7.2% shortening of the path length compared to the baseline method. The proposed algorithm effectively enhances path planning efficiency and navigation accuracy for robots, presenting a superior solution for high-precision autonomous navigation of agricultural robots in orchard environments and holding significant value for engineering applications.
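The AABB half of the Quadtree-AABB collision check named in this abstract reduces to a cheap interval-overlap test, sketched here in Python. The class and function names, the `margin` parameter, and the use of a plain list of obstacles are assumptions for illustration; the quadtree that would narrow the obstacle list beforehand and the exact narrow-phase test are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box in the plane."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other):
        # Two boxes overlap exactly when their intervals overlap on both axes.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

def segment_box(p, q, margin=0.0):
    """Bounding box of the tree edge p -> q, inflated by a safety margin."""
    return AABB(min(p[0], q[0]) - margin, min(p[1], q[1]) - margin,
                max(p[0], q[0]) + margin, max(p[1], q[1]) + margin)

def candidate_obstacles(p, q, obstacles, margin=0.0):
    """Broad phase: keep only obstacles whose box overlaps the edge's box."""
    box = segment_box(p, q, margin)
    return [o for o in obstacles if box.overlaps(o)]
```

Only the obstacles returned by `candidate_obstacles` need an exact segment-versus-obstacle test, which is where the reduction in computational cost comes from.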
We study the split common solution problem with multiple output sets for monotone operator equations in Hilbert spaces. To solve this problem, we propose two new parallel algorithms. We establish a weak convergence theorem for the first and a strong convergence theorem for the second.
Metaheuristic optimization algorithms continue to be essential for solving complex real-world problems, yet existing methods often struggle with balancing exploration and exploitation across diverse problem landscapes. This paper proposes a novel nature-inspired metaheuristic optimization algorithm named the Painted Wolf Optimization (PWO) algorithm. The main inspiration for the PWO algorithm is the group behavior and hunting strategy of painted wolves, also known as African wild dogs, particularly their unique consensus-based voting rally mechanism, a behavior fundamentally distinct from the social dynamics of grey wolves. In this process, pack members explore different areas to find prey; then, they hold a pre-hunting voting rally based on the alpha member to determine who will begin the hunt and attack the prey. The efficiency of the proposed PWO algorithm is evaluated by a comparison study with other well-known optimization algorithms on 33 test functions, including the Congress on Evolutionary Computation (CEC) 2017 suite, and on different real-world engineering design cases. The algorithm's performance is also tested across a spectrum of optimization problems with extensive unknown search spaces. This includes its application within the field of cybersecurity, specifically in training a machine learning-based intrusion detection system (ML-IDS), achieving an accuracy of 0.90 and an F-measure of 0.9290. Statistical analyses using the Wilcoxon signed-rank test (all p<0.05) indicate that the PWO algorithm outperforms existing state-of-the-art algorithms, providing superior solutions in diverse and unpredictable optimization landscapes. This demonstrates its potential as a robust method for tackling complex optimization problems in various fields. The source code for the PWO algorithm is publicly available at https://github.com/saeidsheikhi/Painted-Wolf-Optimization.
Funding (CAV mixture-of-experts decision-making study): Supported by the National Key R&D Program of China (Grant No. 2022YFB2503203) and the National Natural Science Foundation of China (Grant No. U1964206).
Funding (heterogeneous actor-critic study): Supported by the National Key Research and Development Program of China (No. 2018AAA0103003), the National Natural Science Foundation of China (No. 61773378), the Basic Research Program (No. JCKY*******B029), and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB32050100).
Funding (SAC-DTC P2P energy trading study): Supported by the National Natural Science Foundation of China (No. 52177085).
Funding (offline RL with distance regularization study): Supported by the National Natural Science Foundation of China (62373364, 62176259) and the Key Research and Development Program of Jiangsu Province (BE2022095).
Funding: Support of the National Key Research and Development Plan (No. 2021YFB3302501); financial support of the National Science Foundation of China (No. 12161076); financial support of the Fundamental Research Funds for the Central Universities (No. DUT25GF207).
Abstract: With the rapid development of artificial intelligence, intelligent air combat maneuver decision-making (ACMD) has garnered global attention. Although deep reinforcement learning provides a promising approach to ACMD, existing methods often suffer from rigid reward functions and limited adaptability to evolving adversarial strategies. Moreover, most research assumes open airspace, overlooking the influence of potential obstacles. In this paper, we address one-on-one within-visual-range ACMD in obstructed environments, and propose an improved Soft Actor-Critic (SAC) algorithm trained under a curriculum self-play framework. A maneuver strategy mirroring inference module is integrated to estimate each other's likely positions when visual obstruction occurs. By leveraging curriculum learning to guide progressive experience accumulation and self-play for adversarial evolution, our method enhances both training efficiency and tactical diversity. We further integrate an attention mechanism that dynamically adjusts the weights of sub-rewards, enabling the learned policy to adapt to rapidly changing air combat situations. Numerical simulations demonstrate that our enhanced SAC converges more quickly and achieves higher win rates than other baseline methods. An animation is available at bilibili.com/video/BV1BHVszHE98 for better illustration.
Abstract: Building integrated energy systems (BIESs) are pivotal for enhancing energy efficiency because they account for a significant proportion of global energy consumption. Two key barriers to BIES operational efficiency are the uncertainty of renewable generation and the operational non-convexity of combined heat and power (CHP) units. To this end, this paper proposes a soft actor-critic (SAC) algorithm to solve the scheduling problem of BIES, which overcomes the model non-convexity and shows advantages in robustness and generalization. This paper also adopts a temporal fusion transformer (TFT) to enhance the optimal solution for the SAC algorithm by forecasting the renewable generation and energy demand. The TFT can effectively capture the complex temporal patterns and dependencies that span multiple steps. Furthermore, its forecasting results are interpretable owing to a self-attention layer, which supports more trustworthy decision-making in the SAC algorithm. The proposed hybrid data-driven approach integrating the TFT and the SAC algorithm, i.e., the TFT-SAC approach, is trained and tested on a real-world dataset to validate its superior performance in reducing the energy cost and computational time compared with the benchmark approaches. The generalization performance of the scheduling policy, as well as the sensitivity analysis, is examined in the case studies.
Funding: Supported by the P.G. Senapathy Center for Computing Resources at IIT Madras; funding provided by the Ministry of Education, Government of India; supported by the National Natural Science Foundation of China (Grant Nos. 12388101, 12472224 and 92252104).
Abstract: This paper presents an Eulerian-Lagrangian algorithm for direct numerical simulation (DNS) of particle-laden flows. The algorithm is applicable to simulations of dilute suspensions of small inertial particles in a turbulent carrier flow. The Eulerian framework numerically resolves the turbulent carrier flow using a parallelized, finite-volume DNS solver on a staggered Cartesian grid. Particles are tracked using a point-particle method with a Lagrangian particle tracking (LPT) algorithm. The proposed Eulerian-Lagrangian algorithm is validated using an inertial particle-laden turbulent channel flow for different Stokes number cases. The particle concentration profiles and higher-order statistics of the carrier and dispersed phases agree well with the benchmark results. We investigated the effect of fluid velocity interpolation and numerical integration schemes of particle tracking algorithms on particle dispersion statistics. The suitability of fluid velocity interpolation schemes for predicting the particle dispersion statistics is discussed in the framework of the particle tracking algorithm coupled to the finite-volume solver. In addition, we present parallelization strategies implemented in the algorithm and evaluate their parallel performance.
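The two ingredients named in the abstract above, fluid velocity interpolation at the particle position and numerical integration of the particle equations, can be illustrated with a minimal one-dimensional sketch. It is not the paper's solver: the periodic 1D grid, linear interpolation, Stokes-drag-only particle equation dv/dt = (u_f - v)/tau_p, forward Euler time stepping, and all function and parameter names are assumptions chosen for illustration.

```python
def interp_linear(u_grid, dx, x):
    """Linearly interpolate a periodic 1D grid field at position x.

    Assumption: node i of u_grid sits at position i * dx and the
    domain is periodic with length len(u_grid) * dx.
    """
    n = len(u_grid)
    s = (x / dx) % n          # position measured in grid cells
    i = int(s) % n            # index of the node to the left
    frac = s - int(s)         # fractional distance to the next node
    return (1.0 - frac) * u_grid[i] + frac * u_grid[(i + 1) % n]


def advance_particle(x, v, u_grid, dx, tau_p, dt):
    """One forward-Euler step of a point particle under Stokes drag only.

    tau_p is the particle response time (hypothetical parameter name).
    """
    u_f = interp_linear(u_grid, dx, x)   # fluid velocity seen by the particle
    accel = (u_f - v) / tau_p            # drag relaxes v toward u_f
    return x + dt * v, v + dt * accel


# Usage: a particle released at rest in a uniform flow relaxes to the
# fluid velocity over a few response times.
x, v = 0.0, 0.0
for _ in range(1000):
    x, v = advance_particle(x, v, [1.0] * 8, dx=0.1, tau_p=0.1, dt=0.001)
```

Swapping `interp_linear` for a higher-order scheme, or forward Euler for a Runge-Kutta step, is the kind of variation whose effect on dispersion statistics the abstract says was investigated.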
Abstract: To address the steering instability and hysteresis of agricultural robots during movement, a fusion PID control method combining particle swarm optimization (PSO) and a genetic algorithm (GA) was proposed. The fusion algorithm took advantage of the fast optimization ability of PSO to optimize the population screening step of GA. The Simulink simulation results showed that the convergence of the fitness function of the fusion algorithm was accelerated, the system response adjustment time was reduced, and the overshoot was almost zero. The algorithm was then applied to the steering test of an agricultural robot in various scenes. After modeling the steering system of the agricultural robot, the steering test results in the unloaded suspended state showed that the PID control based on the fusion algorithm reduced the rise time, response adjustment time, and overshoot of the system, and improved the response speed and stability of the system, compared with the manual trial-and-error PID control and the PID control based on GA. The actual road steering test results showed that the response rise time of the PID control based on the fusion algorithm was the shortest, about 4.43 s. When the target pulse number was set to 100, the actual mean value in the steady-state regulation stage was about 102.9, which was the closest to the target value among the three control methods, and the overshoot was reduced at the same time. The steering test results under various scene states showed that the PID control based on the proposed fusion algorithm had good anti-interference ability; it could adapt to changes in environment and load and improve the performance of the control system. It was effective in the steering control of the agricultural robot. This method can provide a reference for the precise steering control of other robots.
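To make the role of the fitness function concrete, the following minimal sketch shows a discrete PID controller and one score that an optimizer such as PSO or GA could minimize over the gains. Everything here is an assumption for illustration: the first-order plant, the ITAE (integral of time-weighted absolute error) criterion, and all names and default values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def itae_fitness(kp, ki, kd, target=1.0, dt=0.01, steps=1000, tau=0.5):
    """ITAE of a step response; lower is better.

    Assumed plant (hypothetical): tau * dy/dt = -y + u, integrated
    with forward Euler.
    """
    pid = PID(kp, ki, kd)
    y, cost = 0.0, 0.0
    for k in range(steps):
        error = target - y
        u = pid.step(error, dt)
        y += dt * (-y + u) / tau
        cost += (k * dt) * abs(error) * dt   # time-weighted absolute error
    return cost


# Usage: a gain set with integral action scores better than a weak
# proportional-only controller, which leaves a steady-state error.
good = itae_fitness(2.0, 1.0, 0.0)
poor = itae_fitness(0.1, 0.0, 0.0)
```

A PSO or GA search would call `itae_fitness` once per candidate gain triple `(kp, ki, kd)` and keep the candidates with the lowest score.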
Funding: Supported by the Guizhou Provincial Basic Research Program (Natural Science) (No. ZK[2022]020).
Abstract: The Steiner k-eccentricity of a vertex is the maximum Steiner distance over all k-sets that contain the given vertex, where the Steiner distance of a vertex set is the size of a minimum Steiner tree on this set. Since the minimum Steiner tree problem is well known to be NP-hard, the Steiner k-eccentricity is not easy to compute. This paper attempts to solve this problem efficiently on block graphs and on general graphs with limited cycles. A block graph is a graph in which each block is a clique, and is also called a clique-tree. On block graphs, we propose an O(k(n+m))-time algorithm to compute the Steiner k-eccentricity of a vertex, where n and m are respectively the order and size of a block graph. On general graphs with limited cycles, we take the cyclomatic number ν(G) as a parameter, which is the minimum number of edges of G whose removal makes G acyclic, and devise an O(n^(ν(G)+1)(n(G)+m(G)+k))-time algorithm.
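The definition above can be checked on small inputs with a brute-force sketch. This is not the paper's O(k(n+m)) algorithm: it is restricted to trees (the special case of block graphs in which every block is a single edge), it enumerates every k-set containing the vertex, and the adjacency-dictionary representation and function names are assumptions for illustration.

```python
from itertools import combinations


def steiner_distance_tree(adj, terminals):
    """Edges in the minimal subtree of a tree that spans `terminals`.

    adj maps each vertex to a list of its neighbours; the graph is
    assumed to be a tree.
    """
    terminals = set(terminals)
    root = next(iter(terminals))
    parent = {root: None}
    order = []                 # DFS preorder: parents before children
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    count = {u: (1 if u in terminals else 0) for u in order}
    edges = 0
    for u in reversed(order):  # children are processed before parents
        if parent[u] is not None:
            # The root is a terminal, so the edge to the parent is needed
            # exactly when the subtree below u contains a terminal.
            if count[u] > 0:
                edges += 1
            count[parent[u]] += count[u]
    return edges


def steiner_k_eccentricity(adj, v, k):
    """Maximum Steiner distance over all k-sets containing v (brute force)."""
    others = [u for u in adj if u != v]
    return max(
        steiner_distance_tree(adj, (v,) + rest)
        for rest in combinations(others, k - 1)
    )


# Usage on the path 0-1-2-3 and on a star with centre 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

The enumeration grows as the number of (k-1)-subsets of the remaining vertices, which is why a dedicated algorithm is needed for larger graphs.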
Abstract: Optimization is the key to efficient utilization of resources in structural design. Given the complex nature of truss systems, this study presents a metaheuristic method that minimizes structural weight under stress and frequency constraints. Two new algorithms, the Red Kite Optimization Algorithm (ROA) and the Secretary Bird Optimization Algorithm (SBOA), are applied to five benchmark trusses with 10, 18, 37, 72, and 200 bars. Both algorithms are evaluated against benchmarks in the literature. The results indicate that SBOA always reaches lighter optimal designs, with structural weight reductions ranging from 0.02% to 0.15% compared to ROA, and up to 6%–8% compared to conventional algorithms. In addition, SBOA achieves 15%–20% faster convergence and a 10%–18% reduction in computational time with a smaller standard deviation over independent runs, which demonstrates its robustness and reliability. The results indicate that the adaptive exploration mechanism of SBOA, especially its Levy flight-based search strategy, clearly improves optimization performance for both low- and high-dimensional trusses. The research promotes bio-inspired optimization techniques by demonstrating the viability of SBOA as a reliable model for large-scale structural design that provides significant enhancements in performance and convergence behavior.
Funding: Supported by a research grant from Lahore College for Women University (LCWU), Lahore, Pakistan.
Abstract: Data serves as the foundation for training and testing machine learning and artificial intelligence models. The most fundamental part of data is its attributes or features. The feature set size changes from one dataset to another. Only the relevant features contribute meaningfully to classification accuracy. The presence of irrelevant features reduces the system's effectiveness. Classification performance often deteriorates on high-dimensional datasets due to the large search space. Thus, one of the significant obstacles affecting the performance of the learning process in the majority of machine learning and data mining techniques is the dimensionality of the datasets. Feature selection (FS) is an effective preprocessing step in classification tasks. The aim of applying FS is to exclude redundant and unrelated features while retaining the most informative ones to optimize classification capability and reduce computational complexity. In this paper, a novel hybrid binary metaheuristic algorithm, termed hSC-FPA, is proposed by hybridizing the Flower Pollination Algorithm (FPA) and the Sine Cosine Algorithm (SCA). Hybridization controls the exploration capacity of SCA and the exploitation behavior of FPA to maintain a balanced search process. SCA guides the global search in the early iterations, while FPA's local pollination refines promising solutions in later stages. A binary conversion mechanism using a threshold function is implemented to handle the discrete nature of the feature selection problem. The functionality of the proposed hSC-FPA is validated on fourteen standard datasets from the UCI repository using the K-Nearest Neighbors (K-NN) classifier. Experimental results are benchmarked against the standalone SCA and FPA algorithms. The hSC-FPA consistently achieves higher classification accuracy, selects a more compact feature subset, and demonstrates superior convergence behavior. These findings support the stability and superior performance of the hybrid feature selection method presented.
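The binary conversion step mentioned above can be sketched as follows. The abstract does not give the threshold function or the objective, so the fixed 0.5 threshold, the weighted-sum fitness (classification error plus the fraction of selected features, a common choice in wrapper feature selection), the weight `alpha`, and all names are assumptions for illustration only.

```python
def to_binary_mask(position, threshold=0.5):
    """Map a continuous search position to a 0/1 feature mask.

    Assumption: each coordinate lies in [0, 1] and a feature is
    selected when its coordinate exceeds the threshold.
    """
    return [1 if x > threshold else 0 for x in position]


def fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classifier error and selected-feature ratio.

    error_rate would come from a classifier such as K-NN trained on
    the selected columns; it is passed in here to keep the sketch
    self-contained. Lower is better.
    """
    selected = sum(mask)
    if selected == 0:
        return float("inf")    # an empty feature subset is invalid
    return alpha * error_rate + (1.0 - alpha) * selected / len(mask)


# Usage: convert one candidate solution and score it.
mask = to_binary_mask([0.1, 0.7, 0.5, 0.9])
score = fitness(0.1, mask)
```

With a fitness of this shape, two subsets with equal error are ranked by size, which matches the abstract's goal of a compact feature subset.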
Funding: National Natural Science Foundation of China (32301712); Natural Science Foundation of Jiangsu Province (BK20230548, BK20250876); Project of Faculty of Agricultural Equipment of Jiangsu University (NGXB20240203); A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD-2023-87); Open Funding Project of the Key Laboratory of Modern Agricultural Equipment and Technology (Jiangsu University), Ministry of Education (MAET202101).
Abstract: Traditional sampling-based path planning algorithms, such as the rapidly-exploring random tree star (RRT*), encounter critical limitations in unstructured orchard environments, including low sampling efficiency in narrow passages, slow convergence, and high computational costs. To address these challenges, this paper proposes a novel hybrid global path planning algorithm integrating Gaussian sampling and quadtree optimization (RRT*-GSQ). This methodology aims to enhance path planning by synergistically combining a Gaussian mixture sampling strategy to improve node generation in critical regions, an adaptive step-size and direction optimization mechanism for enhanced obstacle avoidance, a Quadtree-AABB collision detection framework to lower computational complexity, and a dynamic iteration control strategy for more efficient convergence. In obstacle-free and obstructed scenarios, compared with the conventional RRT*, the proposed algorithm reduced the number of node evaluations by 67.57% and 62.72%, and decreased the search time by 79.72% and 78.52%, respectively. In path tracking tests, the proposed algorithm achieved substantial reductions in the RMSE of the final path compared to the conventional RRT*. Specifically, the lateral RMSE was reduced by 41.5% in obstacle-free environments and 59.3% in obstructed environments, while the longitudinal RMSE was reduced by 57.2% and 58.5%, respectively. Furthermore, the maximum absolute errors in both lateral and longitudinal directions were constrained within 0.75 m. Field validation experiments in an operational orchard confirmed the algorithm's practical effectiveness, showing reductions in the mean tracking error of 47.6% (obstacle-free) and 58.3% (obstructed), alongside a 5.1% and 7.2% shortening of the path length compared to the baseline method. The proposed algorithm effectively enhances path planning efficiency and navigation accuracy for robots, presenting a superior solution for high-precision autonomous navigation of agricultural robots in orchard environments and holding significant value for engineering applications.
Funding: Supported by the Science and Technology Fund of TNU-Thai Nguyen University of Science.
Abstract: We study the split common solution problem with multiple output sets for monotone operator equations in Hilbert spaces. To solve this problem, we propose two new parallel algorithms. We establish a weak convergence theorem for the first and a strong convergence theorem for the second.
Abstract: Metaheuristic optimization algorithms continue to be essential for solving complex real-world problems, yet existing methods often struggle with balancing exploration and exploitation across diverse problem landscapes. This paper proposes a novel nature-inspired metaheuristic optimization algorithm named the Painted Wolf Optimization (PWO) algorithm. The main inspiration for the PWO algorithm is the group behavior and hunting strategy of painted wolves, also known as African wild dogs, particularly their unique consensus-based voting rally mechanism, a behavior fundamentally distinct from the social dynamics of grey wolves. In this process, pack members explore different areas to find prey; then, they hold a pre-hunting voting rally based on the alpha member to determine who will begin the hunt and attack the prey. The efficiency of the proposed PWO algorithm is evaluated by a comparison study with other well-known optimization algorithms on 33 test functions, including the Congress on Evolutionary Computation (CEC) 2017 suite and different real-world engineering design cases. The algorithm's performance is further tested across a spectrum of optimization problems with extensive unknown search spaces. This includes its application within the field of cybersecurity, specifically in the context of training a machine learning-based intrusion detection system (ML-IDS), achieving an accuracy of 0.90 and an F-measure of 0.9290. Statistical analyses using the Wilcoxon signed-rank test (all p<0.05) indicate that the PWO algorithm outperforms existing state-of-the-art algorithms, providing superior solutions in diverse and unpredictable optimization landscapes. This demonstrates its potential as a robust method for tackling complex optimization problems in various fields. The source code for the PWO algorithm is publicly available at https://github.com/saeidsheikhi/Painted-Wolf-Optimization.