Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (2024A1515011936) and the National Natural Science Foundation of China (62320106008).
Abstract: The concept of reward is fundamental in reinforcement learning, with a wide range of applications in the natural and social sciences. Seeking an interpretable reward for decision-making, which largely shapes the system's behavior, has always been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which represent many phenomena captured by physical laws. We find that the discrete-time reward leads to extraction of the unique continuous-time decision law and improves computational efficiency by dropping the integral operator that appears in classical results with integral rewards. We apply this finding to solve output-feedback design problems in power systems. The results reveal that our approach removes the intermediate stage of identifying dynamical models. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, providing a computational tool to understand and modify the behavior of large-scale engineering systems through the learned optimal decision.
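The core idea of replacing an integral running cost with a reward evaluated only at sampling instants can be illustrated with a minimal sketch. The scalar dynamics, gain, and quadratic stage reward below are illustrative assumptions, not the paper's power-system formulation:

```python
def rollout_discrete_reward(a=-1.0, b=1.0, k=0.5, dt=0.05, steps=100, x0=1.0):
    """Simulate the scalar system dx/dt = a*x + b*u under feedback u = -k*x,
    accumulating a discrete-time stage reward r_k = -(x_k**2 + u_k**2) at each
    sampling instant rather than numerically integrating a running cost."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = -k * x
        total += -(x**2 + u**2)      # discrete-time reward: no integral operator
        x = x + dt * (a * x + b * u)  # Euler step of the continuous dynamics
    return total, x

ret, x_final = rollout_discrete_reward()
print(ret, x_final)
```

With the closed loop stable (a - b*k < 0), the state decays toward zero and the sampled reward sum converges, so a learner can score a candidate decision law without ever evaluating an integral of the cost.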
Funding: Supported by the Chinese Special Research Project for Civil Aircraft (No. MJZ1-7N22) and the National Natural Science Foundation of China (No. U2133207).
Abstract: Adverse weather during aircraft operation generates more complex scenarios for tactical trajectory planning, which demands superior real-time performance and conflict-free reliability from solution methods. Multi-aircraft real-time 4D trajectory planning under adverse weather is an essential problem in Air Traffic Control (ATC), and existing methods are difficult to apply effectively. A framework of Double Deep Q-value Network under Critic guidance with heuristic Pairing (DDQNC-P) is proposed to solve this problem. An agent for two-aircraft synergetic trajectory planning is trained by the Deep Reinforcement Learning (DRL) model of DDQNC, which preliminarily completes two-aircraft 4D trajectory planning tasks under dynamic weather conditions. A heuristic pairing algorithm is then designed to convert multi-aircraft synergetic trajectory planning into multi-time pairwise synergetic trajectory planning, making the multi-aircraft trajectory planning problem processable for the trained agent. This framework compresses the input dimensions of the DRL model while significantly improving its generalization ability. Substantial simulations with various aircraft numbers, weather conditions, and airspace structures were conducted for performance verification and comparison. The success rate of conflict-free trajectory resolution reached 96.56%, with an average calculation time of 0.41 s for 350 4D trajectory points per aircraft, confirming its applicability for real-time decision-making support for controllers in real-world ATC systems.
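The decomposition step, reducing a multi-aircraft problem to a sequence of two-aircraft problems a pairwise-trained agent can handle, can be sketched with a simple greedy heuristic. Pairing the two closest aircraft first is an assumption for illustration; the paper's pairing criterion may differ:

```python
import math
from itertools import combinations

def heuristic_pairing(positions):
    """Greedily pair the two closest aircraft, remove them, and repeat,
    so each pair can be handed to an agent trained on two-aircraft
    scenarios. `positions` maps aircraft id -> (x, y). Returns the list
    of pairs plus any leftover aircraft when the count is odd."""
    remaining = set(positions)
    pairs = []
    while len(remaining) >= 2:
        a, b = min(
            combinations(sorted(remaining), 2),
            key=lambda ab: math.dist(positions[ab[0]], positions[ab[1]]),
        )
        pairs.append((a, b))
        remaining -= {a, b}
    return pairs, sorted(remaining)

pairs, leftover = heuristic_pairing(
    {"AC1": (0, 0), "AC2": (1, 0), "AC3": (10, 10), "AC4": (11, 10), "A5": (50, 50)}
)
print(pairs, leftover)
```

Because the trained agent only ever sees two aircraft at a time, the DRL model's input dimension stays fixed regardless of how many aircraft are in the airspace, which is the generalization benefit the abstract describes.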
Funding: Supported by the National Natural Science Foundation of China (No. 61105057) and the Ph.D. Foundation of Jiangsu University of Science and Technology (Nos. 35301002 and 35211104).
Abstract: Many skewed cancer gene expression datasets have accumulated in the post-genomic era. Extracting differentially expressed genes or constructing decision rules from these skewed datasets with traditional algorithms seriously underestimates performance on the minority class, leading to inaccurate diagnosis in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired as decision rules to distinguish the two classes, and these decision rules are then integrated into an ensemble learning framework by majority voting to recognize test examples, thus avoiding tedious data normalization and classifier construction. Mining and integrating a few reliable decision rules gave higher, or at least comparable, classification performance than many traditional class-imbalance learning algorithms on four benchmark imbalanced cancer gene expression datasets.
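The gene-pair voting scheme can be sketched in the style of a relative-expression rule: each pair votes for one class when the first gene's expression exceeds the second's within the same sample, and the votes are combined by majority. Because comparisons are made within a sample, no cross-sample normalization is needed. The gene names and expression values below are hypothetical, and this is an illustrative reading of the abstract, not the paper's exact rule form:

```python
def predict(sample, rules):
    """Majority vote over gene-pair decision rules. Each rule (g1, g2)
    votes class 1 when expression of g1 exceeds g2 within the sample,
    else class 0. Within-sample comparisons sidestep normalization."""
    votes = sum(1 if sample[g1] > sample[g2] else 0 for g1, g2 in rules)
    return 1 if votes * 2 > len(rules) else 0

# Hypothetical gene pairs selected by the weighted metric, and one sample.
rules = [("TP53", "BRCA1"), ("EGFR", "KRAS"), ("MYC", "PTEN")]
sample = {"TP53": 8.2, "BRCA1": 3.1, "EGFR": 5.0, "KRAS": 6.4, "MYC": 7.7, "PTEN": 2.2}
print(predict(sample, rules))
```

Here two of the three rules vote for class 1, so the ensemble outputs class 1; keeping only a few reliable pairs keeps the vote interpretable while still covering both classes.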