Inverse reinforcement learning optimal control operates under a learner-expert framework. The learner system can imitate the expert system's demonstrated behaviors and does not require a predefined cost function, so it can handle optimal control problems effectively. This paper proposes an inverse reinforcement learning optimal control method for Takagi-Sugeno (T-S) fuzzy systems. Based on the learner system, an expert system is constructed, where the learner knows only the expert's optimal control policy. To reconstruct the unknown cost function, we first develop a model-based inverse reinforcement learning algorithm for the case in which the system dynamics are known. The developed model-based learning algorithm consists of two learning stages: an inner reinforcement learning loop and an outer inverse optimal control loop. The inner loop obtains the optimal control policy for the learner's current cost function, and the outer loop updates the learner's state-penalty matrices using only the expert's optimal control policy. Then, to remove the requirement that the system dynamics be known, a data-driven integral learning algorithm is presented. It is proved that the two algorithms converge and that the developed inverse reinforcement learning optimal control scheme ensures the controlled fuzzy learner system is asymptotically stable. Finally, we apply the proposed fuzzy optimal control to a truck-trailer system, and computer simulation results verify the effectiveness of the presented approach. Funding: National Natural Science Foundation of China (62173172).
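To make the two-stage structure concrete, the sketch below replays it on a single linear subsystem (the per-rule building block of a T-S fuzzy model). Everything here is an illustrative assumption rather than the paper's algorithm: the matrices are arbitrary, the input penalty R is fixed, the state-penalty matrix Q is parameterized diagonally, the inner RL loop is plain Riccati value iteration, and the outer loop is an off-the-shelf derivative-free search over penalties instead of the paper's update rule.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1], [0.0, 0.9]])   # assumed learner dynamics
B = np.array([[0.0], [0.1]])
R = np.eye(1)                             # input penalty, held fixed

def lqr_gain(Q, iters=400):
    # Inner RL loop: fixed-point iteration on the discrete-time Riccati
    # equation for the cost sum of x'Qx + u'Ru; returns the optimal gain K.
    P = np.eye(2)
    for _ in range(iters):
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.inv(
            R + B.T @ P @ B) @ B.T @ P @ A
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K_expert = lqr_gain(np.diag([4.0, 1.0]))  # learner observes only this gain

def policy_mismatch(theta):
    # Distance between the learner's optimal policy under candidate state
    # penalties diag(|theta|) and the expert's observed policy.
    return float(np.sum((lqr_gain(np.diag(np.abs(theta))) - K_expert) ** 2))

# Outer inverse-optimal-control loop: adjust the state penalties until the
# learner's optimal policy reproduces the expert's.
res = minimize(policy_mismatch, x0=np.ones(2), method="Nelder-Mead")
print("recovered state penalties:", np.abs(res.x).round(2))
```

As with most inverse problems, the recovered penalties are only required to reproduce the expert's policy; they are not guaranteed to be unique.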
In real-time strategy (RTS) games, the ability to recognize other players' goals is important for creating artificial intelligence (AI) players. However, most current goal recognition methods do not take into account the player's deceptive behavior, which occurs often in RTS game scenarios, resulting in poor recognition results. To solve this problem, this paper proposes goal recognition for deceptive agents, an extended goal recognition method that applies deductive reasoning (from the general to the specific) to model the deceptive agent's behavioral strategy. First, a general deceptive behavior model is proposed to abstract the features of deception; these features are then used to construct the behavioral strategy that best matches the deceiver's historical behavior data via inverse reinforcement learning (IRL). Finally, to interfere with the implementation of deceptive behavior, we construct a game model that describes the confrontation scenario and derives the most effective interference measures.
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm by which a learner finds the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we first provide a basic model-based algorithm built upon RL and inverse optimal control. This serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
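The value-function-approximation step can be illustrated with a toy single-player stand-in: fit a quadratic feature model (playing the role of the paper's neural approximators) to Bellman targets built from sampled transitions under a fixed policy. The dynamics, policy, and cost below are assumptions for the sketch, not the N-player game considered in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])  # quadratic basis

def step(x, u):
    # Assumed linear dynamics with a scalar input.
    return np.array([0.9 * x[0] + 0.1 * x[1], 0.8 * x[1] + 0.2 * u])

policy = lambda x: -0.5 * x[1]            # fixed feedback policy to evaluate

w = np.zeros(3)                           # value-function weights
for _ in range(50):                       # fitted policy-evaluation sweeps
    X, y = [], []
    for _ in range(64):
        x = rng.uniform(-1.0, 1.0, 2)
        u = policy(x)
        cost = x @ x + u * u              # quadratic stage cost
        X.append(phi(x))
        y.append(cost + phi(step(x, u)) @ w)   # Bellman target
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

print("fitted value weights:", w.round(3))
```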
The prevalence of on-street parking search in urban downtown areas has led to significant externalities such as congestion, pollution, and collisions. Understanding the intricacies of parking search behavior is crucial for developing effective management strategies to mitigate these issues. Parking search is inherently a complex, sequential decision-making process, influenced by diverse driver preferences and dynamic urban environments. This study introduces a deep inverse reinforcement learning (DIRL) approach to model drivers' parking search behavior. First, we constructed a high-fidelity parking simulation platform using Unity3D to replicate an urban road network, enabling the collection of 987 valid trajectories. We modeled the parking search process as a Markov decision process (MDP), with carefully designed state-action pairs for accurate representation. Then, a maximum entropy-based DIRL model was developed to learn drivers' reward functions and search-for-parking policies. The experimental results demonstrate that the maximum entropy DIRL model significantly outperforms the traditional maximum entropy inverse reinforcement learning model, achieving a 19.0% improvement in accurately capturing final parking states and a 13.5% improvement in characterizing overall trajectory distributions. Finally, we integrated these trained models into traditional traffic simulation systems to observe the evolution of the traffic state under different parking search behaviors, providing valuable insights for optimizing urban traffic management strategies. Funding: National Natural Science Foundation of China (No. 52102383); China Postdoctoral Science Foundation (Nos. 2021M692428 and 2023T160487); Young Elite Scientist Sponsorship Program of the China Association for Science and Technology (No. YESS20220215).
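The maximum-entropy gradient is easiest to see in tabular form. The sketch below runs textbook MaxEnt IRL (a backward soft-value pass for the policy, a forward pass for expected state visitations, and a gradient equal to expert minus model visitations) on an assumed five-state chain with one hand-written demonstration; the deep model in the study swaps the tabular reward for a network but uses a gradient of the same form.

```python
import numpy as np

n_s, n_a, T = 5, 2, 8                    # states, actions, horizon (assumed)
P = np.zeros((n_a, n_s, n_s))            # deterministic chain transitions
for s in range(n_s):
    P[0, s, max(s - 1, 0)] = 1.0         # action 0: move left
    P[1, s, min(s + 1, n_s - 1)] = 1.0   # action 1: move right

demos = [[0, 1, 2, 3, 4, 4, 4, 4]]       # one hand-written expert trajectory
mu_expert = np.bincount(np.concatenate(demos), minlength=n_s) / len(demos)

theta = np.zeros(n_s)                    # one reward parameter per state
for _ in range(300):
    # Backward pass: soft value iteration yields the max-ent policy.
    V = np.zeros(n_s)
    pi = np.zeros((T, n_s, n_a))
    for t in reversed(range(T)):
        Qsa = theta[:, None] + P.transpose(1, 0, 2) @ V
        V = np.log(np.exp(Qsa).sum(axis=1))
        pi[t] = np.exp(Qsa - V[:, None])
    # Forward pass: expected state visitations under that policy.
    d = np.zeros(n_s); d[0] = 1.0
    mu = np.zeros(n_s)
    for t in range(T):
        mu += d
        d = np.einsum("s,sa,asn->n", d, pi[t], P)
    # Gradient of the demo log-likelihood: expert minus model visitations.
    theta += 0.05 * (mu_expert - mu)

print("learned state rewards:", theta.round(2))
```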
To address the problems of the Coyote Optimization Algorithm (COA) in image thresholding, such as easily falling into local optima and slow convergence, a Fuzzy Hybrid Coyote Optimization Algorithm (hereinafter referred to as FHCOA) based on chaotic initialization and a reverse learning strategy is proposed, and its effect on image thresholding is verified. First, through chaotic initialization, the random-number initialization of the standard COA is replaced by a chaotic sequence; such a sequence is nonlinear and unpredictable over the long term, characteristics that effectively improve population diversity in the optimization algorithm. Then, by combining the lens-imaging reverse learning strategy with the optimal-worst reverse learning strategy, a hybrid reverse learning strategy is formed: during traversal, the best coyote and the worst coyote in the pack are selected for reverse learning, which prevents the algorithm from falling into local optima to a certain extent and also mitigates premature convergence. With these improvements, the coyote optimization algorithm achieves better global convergence and computational robustness. Simulation results on multiple images with different numbers of thresholds show that the algorithm produces better thresholding results than five commonly used optimization algorithms. Funding: National Youth Natural Science Foundation of China (61802208); National Natural Science Foundation of China (61572261 and 61876089); Natural Science Foundation of Anhui (1908085MF207, KJ2020A1215, KJ2021A1251, and KJ2021A1253); Excellent Youth Talent Support Foundation of Anhui (gxyqZD2019097 and gxyqZD2021142); Postdoctoral Foundation of Jiangsu (2018K009B); Foundation of Fuyang Normal University (TDJC2021008).
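The two ingredients are compact enough to sketch. Below, initialization uses the logistic map (a standard chaotic sequence) and the reverse step uses the common lens-imaging opposition formula x* = (lo+hi)/2 + (lo+hi)/(2k) - x/k; the bounds, the parameter k, and the toy sphere objective are illustrative choices, not necessarily the paper's settings.

```python
import numpy as np

def chaotic_init(pack_size, dim, lo, hi, z0=0.7):
    # Replace uniform-random initialization with a logistic-map sequence,
    # which is fully chaotic at parameter 4 and long-term unpredictable.
    vals = np.empty(pack_size * dim)
    z = z0
    for i in range(vals.size):
        z = 4.0 * z * (1.0 - z)
        vals[i] = z
    return lo + vals.reshape(pack_size, dim) * (hi - lo)

def lens_imaging_reverse(x, lo, hi, k=2.0):
    # Lens-imaging reverse point: x* = (lo+hi)/2 + (lo+hi)/(2k) - x/k;
    # k = 1 recovers plain opposition-based learning.
    return (lo + hi) / 2.0 + (lo + hi) / (2.0 * k) - x / k

# Toy usage on a sphere objective: keep a reverse point only if it improves.
f = lambda x: np.sum(x ** 2, axis=-1)
lo, hi = -10.0, 10.0
pack = chaotic_init(20, 5, lo, hi)
best = pack[np.argmin(f(pack))]
candidate = lens_imaging_reverse(best, lo, hi)
if f(candidate) < f(best):
    best = candidate
print("best fitness so far:", float(f(best)))
```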
Purpose - The purpose of this paper is to provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL). Design/methodology/approach - Reinforcement learning (RL) techniques provide a powerful solution to sequential decision-making problems under uncertainty. RL uses an agent equipped with a reward function to find a policy through interactions with a dynamic environment. However, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, must be provided beforehand. In practice, the reward function can be very hard to specify and laborious to tune for large and complex problems, and this inspired the development of IRL, an extension of RL that tackles the problem directly by learning the reward function from expert demonstrations. In this paper, the original IRL algorithms and their close variants, as well as their recent advances, are reviewed and compared. Findings - This paper can serve as an introductory guide to the fundamental theory and developments of IRL, as well as its applications. Originality/value - This paper surveys the theory and applications of IRL, a recent development of RL that had not previously been surveyed.
We improve inverse reinforcement learning (IRL) by applying dimension reduction methods to automatically extract abstract features from human-demonstrated policies, addressing cases where features are either unknown or too numerous. The importance rating of each abstract feature is incorporated into the reward function. Simulation is performed on a task of driving on a five-lane highway, where the controlled car has the largest fixed speed among all the cars. Performance is almost 10.6% better on average with importance ratings than without them.
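A minimal sketch of this pipeline, under assumed data: extract abstract features by PCA over states visited in demonstrations, then weight each component by its importance rating inside a linear reward. The dimensions, ratings, and data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
demo_states = rng.normal(size=(500, 12))   # hypothetical demonstration states
mean = demo_states.mean(axis=0)

# Dimension reduction: PCA via SVD, keeping the top-k directions as the
# automatically extracted abstract features.
_, _, Vt = np.linalg.svd(demo_states - mean, full_matrices=False)
k = 3
components = Vt[:k]                        # (k, 12) projection matrix

def abstract_features(state):
    return components @ (state - mean)

importance = np.array([0.6, 0.3, 0.1])     # hypothetical importance ratings

def reward(state, w):
    # Reward linear in the abstract features, each scaled by its rating.
    return float((importance * w) @ abstract_features(state))

w = rng.normal(size=k)                      # weights an IRL step would fit
print("sample reward:", round(reward(demo_states[0], w), 3))
```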
Interest in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert, has recently increased. This paper deals with an incremental approach to online IRL. First, the convergence properties of the incremental method for the IRL problem were investigated, and bounds on both the number of mistakes made during learning and the regret were established with a detailed proof. Then an online algorithm based on incremental error correction was derived to handle the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, yielding an estimate that approaches a target optimal value. The proposed method was tested in a driving simulation experiment and found to efficiently recover an adequate reward function. Funding: National Natural Science Foundation of China (No. 90820306).
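The error-correcting increment can be sketched in a few lines on a toy chain MDP. The dynamics, step size, and the myopic one-step lookahead below are simplifying assumptions; the point is the update itself, which nudges the reward estimate whenever the greedy action disagrees with the expert's.

```python
import numpy as np

n_s, n_a = 4, 2
P = np.zeros((n_a, n_s, n_s))            # toy chain dynamics
for s in range(n_s):
    P[0, s, max(s - 1, 0)] = 1.0         # action 0: step left
    P[1, s, min(s + 1, n_s - 1)] = 1.0   # action 1: step right

expert_action = lambda s: 1              # the expert always steps right

rng = np.random.default_rng(0)
r_hat, eta = np.zeros(n_s), 0.1          # reward estimate and step size
for _ in range(300):                     # stream of observed expert states
    s = rng.integers(n_s)
    q = P[:, s, :] @ r_hat               # myopic one-step action values
    if np.argmax(q) != expert_action(s):     # action mismatch detected
        # Error-correcting increment: raise the reward where the expert
        # goes and lower it where the current estimate prefers to go.
        r_hat += eta * (P[expert_action(s), s] - P[np.argmax(q), s])

print("reward estimate:", r_hat.round(2))    # increases toward the right end
```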
Learning from demonstration (LfD) is an appealing method for helping robots learn new skills. Numerous papers have presented LfD methods with good performance in robotics. However, complicated robot tasks that require carefully regulated path planning strategies remain an open problem. Contact or non-contact constraints in specific robot tasks make path planning more difficult, as the interaction between the robot and the environment is time-varying. In this paper, we focus on the path planning of complex robot tasks in the domain of LfD and give a novel perspective for classifying imitation learning and inverse reinforcement learning, based on constraints and obstacle avoidance. Finally, we summarize these methods and present promising directions for robot applications and LfD theory. Funding: National Natural Science Foundation of China (Grant No. 91848202); Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51521003).
Multi-hop reasoning for incomplete Knowledge Graphs (KGs) demonstrates excellent interpretability with decent performance. Reinforcement Learning (RL) based approaches formulate multi-hop reasoning as a typical sequential decision problem. An intractable shortcoming of multi-hop reasoning with RL is that sparse reward signals make performance unstable. Current mainstream methods apply heuristic reward functions to counter this challenge. However, the inaccurate rewards caused by heuristic functions guide the agent to improper inference paths and unrelated object entities. To this end, we propose a novel adaptive Inverse Reinforcement Learning (IRL) framework for multi-hop reasoning, called AInvR. (1) To counter missing and spurious paths, we replace heuristic rule rewards with an adaptive rule-reward learning mechanism based on the agent's inference trajectories; (2) to alleviate the impact of over-rewarded object entities misled by inaccurate reward shaping and rules, we propose an adaptive negative-hit reward learning mechanism based on the agent's sampling strategy; (3) to further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during reward learning. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches. Funding: National Natural Science Foundation of China (No. U19A2059).
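Of the three mechanisms, the reward dropout of point (3) is simple enough to sketch. The masking rate, noise scale, and surrounding update below are assumptions, not AInvR's exact formulation; the sketch only shows the shape of the operation: randomly zero some reward parameters and perturb the rest during each reward-learning step.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_dropout(theta, drop_rate=0.2, noise_scale=0.05):
    # Randomly zero out a fraction of the reward parameters and jitter
    # the survivors, so reward learning explores more diverse paths.
    keep = rng.random(theta.shape) >= drop_rate
    noise = rng.normal(0.0, noise_scale, size=theta.shape)
    return np.where(keep, theta + noise, 0.0)

theta = rng.normal(size=8)          # reward parameters being learned
grad = rng.normal(size=8)           # stand-in for the IRL reward gradient
theta = reward_dropout(theta) + 0.1 * grad   # one perturbed update step
print(theta.round(3))
```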