Abstract: How to find an effective trading policy is still an open question, mainly due to the nonlinear and non-stationary dynamics of financial markets. Deep reinforcement learning, which has recently been used to develop trading strategies by automatically extracting complex features from large amounts of data, struggles to deal with fast-changing markets because of its sample inefficiency. This paper applies meta-reinforcement learning, for the first time, to tackle the trading challenges faced by conventional reinforcement learning (RL) approaches in non-stationary markets. In our work, the historical trading data is divided into multiple task datasets, within each of which the market condition is relatively stationary. A model-agnostic meta-learning (MAML)-based trading method involving a meta-learner and a normal learner is then proposed. A trading policy is learned by the meta-learner across the task datasets and is fine-tuned by the normal learner on a small amount of data from a new market task before trading in it. To improve the adaptability of the MAML-based method, an ordered multiple-step updating mechanism is also proposed to track the changing dynamics within a task market. Simulation results demonstrate that, compared to a traditional RL approach in three stock index futures markets, the proposed MAML-based trading methods increase the annualized return rate by approximately 180%, 200%, and 160%, increase the Sharpe ratio by 180%, 90%, and 170%, and decrease the maximum drawdown by 30%, 20%, and 40%, respectively.
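To make the meta-learning loop concrete, the following is a minimal first-order sketch of a MAML-style trading setup with an ordered multiple-step inner update. It is an illustrative reconstruction, not the paper's implementation: the policy network, the surrogate loss, the Reptile-style outer update, and the synthetic task windows are all placeholder assumptions.

```python
# Minimal first-order MAML-style sketch for the trading setting described above.
# Illustrative only: the policy, loss, and task data are placeholder assumptions.
import copy
import torch
import torch.nn as nn

def surrogate_loss(policy, states, rewards):
    # Placeholder objective: maximize reward-weighted position scores.
    return -(policy(states).squeeze(-1) * rewards).mean()

def inner_adapt(meta_policy, task, inner_lr=1e-2, inner_steps=3):
    # "Ordered multiple-step updating": several gradient steps, each on a
    # chronologically later slice of the task window, to follow within-task drift.
    learner = copy.deepcopy(meta_policy)
    opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
    states, rewards = task
    for idx in torch.chunk(torch.arange(len(states)), inner_steps):
        opt.zero_grad()
        surrogate_loss(learner, states[idx], rewards[idx]).backward()
        opt.step()
    return learner

def meta_train(meta_policy, tasks, meta_lr=1e-3, epochs=50):
    # First-order (Reptile-style) outer update: nudge the meta-parameters
    # toward each task-adapted parameter vector.
    for _ in range(epochs):
        for task in tasks:
            learner = inner_adapt(meta_policy, task)
            with torch.no_grad():
                for p_meta, p_new in zip(meta_policy.parameters(),
                                         learner.parameters()):
                    p_meta += meta_lr * (p_new - p_meta)
    return meta_policy

if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic near-stationary task windows: (state features, per-step rewards).
    tasks = [(torch.randn(120, 8), torch.randn(120)) for _ in range(10)]
    policy = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 1))
    policy = meta_train(policy, tasks)
    # Before trading in a new market, fine-tune on a small amount of its data.
    new_market = (torch.randn(120, 8), torch.randn(120))
    adapted = inner_adapt(policy, new_market)
```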
Funding: supported by the Key Research and Development Program of Inner Mongolia, China (No. 2021ZD0039).
Abstract: The increasing penetration of renewable energy resources and reduced system inertia pose risks to the frequency security of power systems, necessitating fast frequency regulation (FFR) methods that use flexible resources. Developing effective FFR policies is challenging, however, because different power system operating conditions call for distinct regulation logic, and traditional fixed-coefficient linear droop-based control methods are suboptimal across the diverse conditions encountered. This paper proposes a dynamic nonlinear P-f droop-based FFR method built on a newly established meta-reinforcement learning (meta-RL) approach to enhance control adaptability while ensuring grid stability. First, we model the optimal FFR problem under various operating conditions as a set of Markov decision processes and accordingly formulate the frequency stability-constrained meta-RL problem. To address it, we then construct a novel hierarchical neural network (HNN) structure that incorporates a theoretical frequency stability guarantee, thereby converting the constrained meta-RL problem into a more tractable form. Finally, we propose a two-stage algorithm that leverages the inherent characteristics of the problem, achieving enhanced optimality in solving the HNN-based meta-RL problem. Simulations validate that the proposed FFR method shows superior adaptability across different operating conditions and achieves better trade-offs between regulation performance and cost than benchmark methods.
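For intuition, the snippet below sketches what a dynamic nonlinear P-f droop law might look like. It is illustrative only: the linear-plus-cubic curve shape, the deadband, and the non-negative-coefficient condition standing in for the stability constraint are assumptions made here, not the paper's HNN or its formal guarantee; a meta-trained policy would re-select the coefficients per operating condition.

```python
# Illustrative nonlinear P-f droop law; not the paper's actual controller.
import numpy as np

def nonlinear_droop(delta_f, k1, k2, deadband=0.02, p_max=1.0):
    """Map frequency deviation (Hz) to regulation power (p.u.).

    With k1, k2 >= 0 the output always opposes the deviation and grows
    monotonically with it -- a crude stand-in for the kind of stability
    condition the learned policy must respect.
    """
    df = np.where(np.abs(delta_f) > deadband,
                  delta_f - np.sign(delta_f) * deadband, 0.0)
    p = -(k1 * df + k2 * df ** 3)       # stiffer response far from nominal
    return np.clip(p, -p_max, p_max)    # honor the resource's power limit

# A fixed-coefficient linear droop is the special case k2 = 0 with constant k1;
# here (k1, k2) would instead be re-selected for each operating condition.
for df in (-0.3, -0.1, 0.0, 0.1, 0.3):
    p = float(nonlinear_droop(df, k1=2.0, k2=20.0))
    print(f"delta_f = {df:+.2f} Hz -> P = {p:+.3f} p.u.")
```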
Funding: supported by the Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project under Grant No. 2021ZD0113303, the National Natural Science Foundation of China under Grant Nos. 62192783 and 62276128, and in part by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Abstract: At present, radar detection parameters rely heavily on manual adjustment and empirical knowledge, resulting in low automation. Traditional manual adjustment cannot meet the requirements of modern radars for high efficiency, high precision, and high automation, so a new intelligent radar control learning framework and supporting technology are needed to improve the capability and automation of radar detection. Reinforcement learning is popular for decision-making tasks, but the shortage of samples in radar control tasks makes it difficult to satisfy reinforcement learning's data requirements. To address these issues, we propose a practical radar operation reinforcement learning framework that integrates offline reinforcement learning and meta-reinforcement learning to alleviate the sample requirements of reinforcement learning. Experimental results show that our method can perform radar detection automatically, on a par with human operators in real-world settings, thereby promoting the practical application of reinforcement learning in radar operation.
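As a rough illustration of the two ingredients named here, the sketch below pretrains a policy offline on logged operator data and then adapts it to a new task with a few gradient steps. The dataset shapes, the behavior-cloning-style offline objective, and the few-shot adaptation rule are assumptions made for the example, not the paper's actual framework.

```python
# Illustrative two-stage pipeline: offline pretraining + fast task adaptation.
import copy
import torch
import torch.nn as nn

def offline_pretrain(policy, logged, lr=1e-3, epochs=20):
    # Offline stage: learn only from previously recorded (state, operator
    # action) pairs, with no online interaction with the radar.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    states, actions = logged
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(policy(states), actions).backward()
        opt.step()
    return policy

def meta_adapt(policy, support, lr=1e-2, steps=5):
    # Few-shot stage: a handful of gradient steps on the new task's small
    # support set stands in for the meta-RL adaptation rule.
    learner = copy.deepcopy(policy)
    opt = torch.optim.SGD(learner.parameters(), lr=lr)
    states, actions = support
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(learner(states), actions).backward()
        opt.step()
    return learner

if __name__ == "__main__":
    torch.manual_seed(0)
    policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
    logged = (torch.randn(512, 16), torch.randn(512, 4))   # large offline log
    support = (torch.randn(16, 16), torch.randn(16, 4))    # scarce new-task data
    policy = offline_pretrain(policy, logged)
    task_policy = meta_adapt(policy, support)
```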
Funding: Project supported by the National Natural Science Foundation of China (No. 62376280).
Abstract: Reinforcement learning (RL) has become a dominant decision-making paradigm and has achieved notable success in many real-world applications, with deep neural networks playing a crucial role in unlocking RL's potential in large-scale decision-making tasks. Inspired by the Transformer's recent major successes in natural language processing and computer vision, researchers have overcome numerous bottlenecks by combining the Transformer with RL for decision-making. This paper presents a multi-angle systematic survey of Transformer-based RL (TransRL) models applied to decision-making tasks, covering basic models, advanced algorithms, representative implementations, typical applications, and known challenges. Our work aims to provide insight into problems that inherently arise with current RL approaches and to examine how better TransRL models can address them. To our knowledge, this is the first comprehensive review of recent Transformer research developments in RL for decision-making. We hope this survey inspires the RL community in its pursuit of future directions. To keep track of the rapid TransRL developments in the decision-making domain, we summarize the latest papers and their open-source implementations at https://github.com/williamyuanv0/Transformer-in-Reinforcement-Learning-for-Decision-Making-A-Survey.
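As one representative instance of the TransRL family surveyed here, the following is a minimal Decision-Transformer-style sketch in which a trajectory of (return-to-go, state, previous action) tokens is fed through a causal Transformer to predict actions. The dimensions, single-layer architecture, and toy inputs are illustrative assumptions, not any particular surveyed model.

```python
# Minimal Decision-Transformer-style sketch of a TransRL policy.
import torch
import torch.nn as nn

class TinyTrajectoryTransformer(nn.Module):
    def __init__(self, state_dim=8, act_dim=2, d_model=64, max_len=20):
        super().__init__()
        # Embed each step's (return-to-go, state, previous action) as one token.
        self.embed = nn.Linear(1 + state_dim + act_dim, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, act_dim)  # predict the action per step

    def forward(self, rtg, states, prev_actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), prev_actions: (B, T, act_dim)
        x = self.embed(torch.cat([rtg, states, prev_actions], dim=-1))
        x = x + self.pos(torch.arange(x.size(1), device=x.device))
        # Causal mask: each position attends only to itself and earlier steps.
        t = x.size(1)
        mask = torch.triu(torch.full((t, t), float("-inf"), device=x.device),
                          diagonal=1)
        return self.head(self.encoder(x, mask=mask))

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyTrajectoryTransformer()
    rtg = torch.rand(4, 10, 1)          # desired return-to-go conditioning
    states = torch.randn(4, 10, 8)
    prev_actions = torch.randn(4, 10, 2)
    print(model(rtg, states, prev_actions).shape)  # torch.Size([4, 10, 2])
```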