Funding: supported by the National Natural Science Foundation of China (62076225, 62073300) and the Natural Science Foundation for Distinguished Young Scholars of Hubei (2019CFA081).
Abstract: Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed using different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may depend heavily on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by deep reinforcement learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy that estimates the Q-values of all actions, the proposed approach can adaptively select the operator that maximizes the improvement of the population according to the current state, thereby improving algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed deep-reinforcement-learning-assisted operator selection significantly improves the performance of these CMOEAs, and the resulting algorithm achieves better versatility than nine state-of-the-art CMOEAs.
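The state/action/reward mapping described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the 3-dimensional state, the number of operators, the tiny one-hidden-layer Q-network, and the simulated environment loop are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_OPERATORS = 4   # candidate evolutionary operators (the actions); count assumed
STATE_DIM = 3     # population state: (convergence, diversity, feasibility)
HIDDEN = 16       # hidden width of the toy Q-network (assumption)

# Tiny one-hidden-layer Q-network: state -> Q-value per operator.
W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_OPERATORS))

def q_values(state):
    """Estimate Q(state, a) for every candidate operator a."""
    h = np.tanh(state @ W1)
    return h @ W2

def select_operator(state, epsilon=0.1):
    """Epsilon-greedy: mostly pick the operator with the highest Q-value."""
    if rng.random() < epsilon:
        return int(rng.integers(N_OPERATORS))
    return int(np.argmax(q_values(state)))

def td_update(state, action, reward, next_state, gamma=0.9, lr=0.05):
    """One gradient step on the squared TD error for the taken action."""
    global W1, W2
    h = np.tanh(state @ W1)
    q = h @ W2
    target = reward + gamma * np.max(q_values(next_state))
    err = q[action] - target
    # Backpropagate through both layers, for the selected action only.
    grad_W2 = np.outer(h, np.eye(N_OPERATORS)[action]) * err
    grad_h = W2[:, action] * err
    grad_W1 = np.outer(state, (1.0 - h ** 2) * grad_h)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

# Toy loop standing in for generations of a CMOEA: the reward is the
# (here simulated) improvement of the population state.
state = np.array([0.5, 0.5, 0.2])
for _ in range(50):
    a = select_operator(state)
    next_state = np.clip(state + rng.normal(scale=0.05, size=STATE_DIM), 0, 1)
    reward = float(np.sum(next_state - state))  # improvement as reward
    td_update(state, a, reward, next_state)
    state = next_state
```

In a real CMOEA the environment step would apply the chosen operator to the population and recompute the convergence, diversity, and feasibility indicators, rather than perturbing the state randomly as above.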
Funding: supported by the National Natural Science Foundation of China (62373364, 62176259) and the Key Research and Development Program of Jiangsu Province (BE2022095).
Abstract: To address the issue that overly conservative offline reinforcement learning (RL) methods limit the generalization of the policy in the out-of-distribution (OOD) region, this article designs a surrogate target for the OOD value function based on dataset distance and proposes a novel generalized Q-learning mechanism with distance regularization (GQDR). In theory, we not only prove the convergence of GQDR but also ensure that the difference between the Q-value learned by GQDR and its true value is bounded. Furthermore, an offline generalized actor-critic method with distance regularization (OGACDR) is proposed by combining GQDR with the actor-critic learning framework. Two implementations of OGACDR, OGACDR-EXP and OGACDR-SQR, are introduced according to exponential (EXP) and open-square (SQR) distance weight functions, and it is theoretically proved that OGACDR provides a safe policy improvement. Experimental results on Gym-MuJoCo continuous control tasks show that OGACDR can not only alleviate the overestimation and over-conservatism of the Q-value function but also outperform conservative offline RL baselines.
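The general idea of a distance-weighted OOD target can be sketched as follows. The paper's actual surrogate target and weight functions are not reproduced here; this hypothetical sketch only illustrates the mechanism the abstract names: an exponential (EXP-style) weight that decays with the distance from a candidate action to the dataset, pulling OOD value estimates toward a conservative baseline. All function names and the zero-valued pessimistic fallback are assumptions.

```python
import numpy as np

def dataset_distance(action, dataset_actions):
    """Distance from a candidate action to the nearest action in the dataset.

    `dataset_actions` is an (n, d) array of actions observed in the offline
    dataset; `action` is a length-d vector.
    """
    return float(np.min(np.linalg.norm(dataset_actions - action, axis=1)))

def exp_weight(dist, beta=1.0):
    """EXP-style distance weight: 1 on the data, decaying for OOD actions."""
    return float(np.exp(-beta * dist))

def surrogate_ood_target(q_in_dist, dist, beta=1.0):
    """Hypothetical surrogate target for an OOD Q-value: interpolate between
    the in-distribution estimate and a pessimistic baseline as the distance
    to the dataset grows, so far-OOD actions are not overestimated."""
    w = exp_weight(dist, beta)
    pessimistic = 0.0  # conservative fallback value (assumption)
    return w * q_in_dist + (1.0 - w) * pessimistic
```

Under this construction the target equals the in-distribution estimate at zero distance and shrinks toward the pessimistic baseline far from the data, which matches the abstract's stated goal of curbing both overestimation and over-conservatism.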
Funding: supported by the State Key Program of the National Natural Science Foundation of China under grant no. U19B2016.
Abstract: Frequency hopping (FH) communication has good anti-fading, anti-jamming, and anti-eavesdropping capabilities, so it is one of the main ways to combat electronic jamming. To further improve the anti-jamming capability of FH communication, parameters that are fixed in conventional FH, such as the frequency interval, hopping rate, and hopping frequency, can be given time-varying characteristics. To set appropriate hopping parameters and improve system performance in an electromagnetic environment with various types of jamming, a heuristically accelerated Q-learning (HAQL) method is proposed in this paper. Firstly, a theoretical model for the parameter decision-making of the FH system is established, and the key parameters affecting the energy efficiency of the system are analyzed. Secondly, a Q-learning model for the complex electromagnetic environment is proposed, including the design of states, actions, and rewards, and a HAQL-based decision-making algorithm is put forward. Lastly, simulations are carried out under different jamming environments, and the results show that the average energy efficiency of the HAQL algorithm is higher than that of the SARSA algorithm, the ε-greedy Q-learning algorithm, and the HQL-OSGM algorithm.
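The core of heuristically accelerated Q-learning is that a heuristic function biases action selection while the Q-update itself stays standard. A minimal tabular sketch of that rule is below; the state and action counts are illustrative indices, not the paper's actual FH parameter encodings, and the heuristic values would in practice come from domain knowledge about the jamming environment.

```python
import random

# Illustrative sizes, not the paper's parameter space.
N_STATES, N_ACTIONS = 5, 3
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # learned action values
H = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # heuristic bonus per (s, a)

def select_action(s, epsilon=0.1, xi=1.0):
    """HAQL selection: epsilon-greedy over Q(s, a) + xi * H(s, a).

    The heuristic H steers exploration toward actions believed to be good
    (e.g. hopping parameters expected to dodge the current jamming), while
    xi scales how strongly it influences the choice.
    """
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    scores = [Q[s][a] + xi * H[s][a] for a in range(N_ACTIONS)]
    return scores.index(max(scores))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; the heuristic only biases selection,
    so the learned Q-values converge as in ordinary Q-learning."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
```

In the FH setting described by the abstract, the reward would be the measured energy efficiency of the system under the chosen hopping parameters, which is the quantity the simulations compare across algorithms.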