Journal Articles
10 articles found
1. Inverse Reinforcement Learning Optimal Control for Takagi-Sugeno Fuzzy Systems
Authors: Wenting SONG, Shaocheng TONG. Artificial Intelligence Science and Engineering, 2025(2): 134-146.
Inverse reinforcement learning optimal control operates under a learner-expert framework. The learner system imitates the expert system's demonstrated behaviors and does not require a predefined cost function, so it can handle optimal control problems effectively. This paper proposes an inverse reinforcement learning optimal control method for Takagi-Sugeno (T-S) fuzzy systems. An expert system is constructed from the learner system, where the learner knows only the expert system's optimal control policy. To reconstruct the unknown cost function, we first develop a model-based inverse reinforcement learning algorithm for the case where the system dynamics are known. The model-based algorithm consists of two learning stages: an inner reinforcement learning loop and an outer inverse optimal control loop. The inner loop obtains the optimal control policy from the learner's cost function, and the outer loop updates the learner's state-penalty matrices using only the expert's optimal control policy. Then, to remove the requirement that the system dynamics be known, a data-driven integral learning algorithm is presented. Both algorithms are proved to be convergent, and the developed inverse reinforcement learning optimal control scheme ensures that the controlled fuzzy learner system is asymptotically stable. Finally, we apply the proposed fuzzy optimal control to a truck-trailer system, and computer simulation results verify the effectiveness of the presented approach.
Keywords: Takagi-Sugeno fuzzy systems; learner-expert framework; inverse reinforcement learning algorithm; optimal control
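The two-stage structure described in this abstract (an inner reinforcement learning loop, an outer inverse-optimal-control loop) can be sketched for a scalar linear-quadratic special case. Everything below, the system constants, learning rate, and scalar Riccati recursion, is an illustrative assumption, not taken from the paper:

```python
def lqr_gain(a, b, q, r, iters=500):
    """Inner RL loop: iterate the scalar discrete-time Riccati equation
    for cost q*x^2 + r*u^2 and return the optimal feedback gain k (u = -k*x)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def inverse_rl(a, b, r, k_expert, q0=1.0, lr=0.5, steps=200):
    """Outer inverse-optimal-control loop: adjust the state penalty q until
    the learner's optimal gain matches the expert's demonstrated gain."""
    q = q0
    for _ in range(steps):
        k = lqr_gain(a, b, q, r)
        q = max(1e-6, q + lr * (k_expert - k))  # gain grows with q, so step toward k_expert
    return q
```

Because the optimal gain grows monotonically with the state penalty in this scalar case, the outer loop is a simple fixed-point iteration; the paper's T-S fuzzy setting generalizes this to matrix state-penalty updates across fuzzy rules.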
2. Recognition and interfere deceptive behavior based on inverse reinforcement learning and game theory (cited by 2)
Authors: ZENG Yunxiu, XU Kai. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023(2): 270-288.
In real-time strategy (RTS) games, the ability to recognize other players' goals is important for creating artificial intelligence (AI) players. However, most current goal recognition methods do not account for deceptive behavior, which often occurs in RTS game scenarios, resulting in poor recognition results. To solve this problem, this paper proposes goal recognition for deceptive agents, an extended goal recognition method that applies deductive reasoning (from the general to the specific) to model the deceptive agent's behavioral strategy. First, a general deceptive behavior model is proposed to abstract the features of deception; these features are then used to construct the behavioral strategy that best matches the deceiver's historical behavior data via inverse reinforcement learning (IRL). Finally, to interfere with the implementation of deceptive behavior, we construct a game model describing the confrontation scenario and the most effective interference measures.
Keywords: deceptive path planning; inverse reinforcement learning (IRL); game theory; goal recognition
3. Heterogeneous multi-player imitation learning
Authors: Bosen Lian, Wenqian Xue, Frank L. Lewis. Control Theory and Technology (EI, CSCD), 2023(3): 281-291.
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we first provide a model-based algorithm built upon RL and inverse optimal control. This serves as the foundation for the final model-free inverse RL algorithm, which is implemented via neural-network value function approximators. Theoretical analysis and simulation examples verify the methods.
Keywords: imitation learning; inverse reinforcement learning; heterogeneous multi-player games; data-driven model-free control
4. Learning to search for parking like a human: a deep inverse reinforcement learning approach
Authors: Shiyu Wang, Haiyan Yang, Yijia Tang, Jing Chen, Cong Zhao, Yuchuan Du. International Journal of Transportation Science and Technology, 2025(4): 204-217.
The prevalence of on-street parking search in urban downtown areas has led to significant externalities such as congestion, pollution, and collisions. Understanding the intricacies of parking search behavior is crucial for developing effective management strategies to mitigate these issues. Parking search is inherently a complex, sequential decision-making process influenced by diverse driver preferences and dynamic urban environments. This study introduces a deep inverse reinforcement learning (DIRL) approach to model drivers' parking search behavior. First, we constructed a high-fidelity parking simulation platform in Unity3D to replicate an urban road network, enabling the collection of 987 valid trajectories. We modeled the parking search process as a Markov decision process (MDP) with carefully designed state-action pairs. Then, a maximum-entropy-based DIRL model was developed to learn drivers' reward functions and search-for-parking policies. The experimental results demonstrate that the maximum-entropy DIRL model significantly outperforms the traditional maximum-entropy inverse reinforcement learning model, achieving a 19.0% improvement in capturing final parking states and a 13.5% improvement in characterizing overall trajectory distributions. Finally, we integrated the trained models into traditional traffic simulation systems to observe how the traffic state evolves under different parking search behaviors, providing valuable insights for optimizing urban traffic management strategies.
Keywords: search-for-parking; behavior modeling; deep inverse reinforcement learning (DIRL); traffic simulation; Unity3D
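The maximum-entropy IRL update this abstract builds on can be sketched in a deliberately degenerate one-step setting. The features, expert counts, and step size below are illustrative assumptions; the paper's deep model replaces the linear reward with a neural network and the softmax with soft value iteration over trajectories:

```python
import numpy as np

def maxent_irl(phi, expert_counts, lr=0.1, iters=2000):
    """One-step MaxEnt IRL sketch: each action leads to an outcome with
    feature row phi[i]; the expert acts ~ exp(reward). Recover a linear
    reward w by matching expected feature counts to the expert's counts."""
    w = np.zeros(phi.shape[1])
    emp = expert_counts @ phi / expert_counts.sum()  # empirical feature expectation
    for _ in range(iters):
        p = np.exp(phi @ w)
        p /= p.sum()                  # maximum-entropy (softmax) policy under w
        w += lr * (emp - p @ phi)     # gradient of the max-entropy log-likelihood
    return w
```

Matching feature expectations is exactly the stationarity condition of the maximum-entropy objective, which is why the gradient is simply "empirical minus expected" feature counts.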
5. Fuzzy Hybrid Coyote Optimization Algorithm for Image Thresholding
Authors: Linguo Li, Xuwen Huang, Shunqiang Qian, Zhangfei Li, Shujing Li, Romany F. Mansour. Computers, Materials & Continua (SCIE, EI), 2022(8): 3073-3090.
To address the problems of the Coyote Optimization Algorithm (COA) in image thresholding, such as easily falling into local optima and slow convergence, a Fuzzy Hybrid Coyote Optimization Algorithm (FHCOA) based on chaotic initialization and a reverse learning strategy is proposed, and its effect on image thresholding is verified. In chaotic initialization, the random number initialization of the standard COA is replaced by a chaotic sequence; such sequences are nonlinear and long-term unpredictable, which effectively improves population diversity in the optimization algorithm. A hybrid reverse learning strategy is then formed by combining the lens imaging reverse learning strategy with the optimal-worst reverse learning strategy. During traversal, the best and worst coyotes in the pack are selected for reverse learning operations, which helps prevent the algorithm from falling into local optima and mitigates premature convergence. With these improvements, the coyote optimization algorithm achieves better global convergence and computational robustness. Simulation results show that the algorithm produces better thresholding than five commonly used optimization algorithms across multiple images and threshold counts.
Keywords: coyote optimization algorithm; image segmentation; multilevel thresholding; logistic chaotic map; hybrid reverse learning strategy
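The chaotic initialization step is concrete enough to sketch. The logistic map is the one named in the keywords; the population shape, bounds, and seeding interval below are illustrative assumptions:

```python
import random

def chaotic_init(n_agents, dim, lb, ub, mu=4.0):
    """Initialize a coyote pack with a logistic-map sequence
    z_{k+1} = mu * z_k * (1 - z_k), chaotic on (0, 1) for mu = 4,
    instead of independent uniform random numbers."""
    z = random.uniform(0.01, 0.99)          # seed the chaotic orbit inside (0, 1)
    pack = []
    for _ in range(n_agents):
        coyote = []
        for _ in range(dim):
            z = mu * z * (1.0 - z)          # next chaotic value
            coyote.append(lb + z * (ub - lb))  # map it into the search bounds
        pack.append(coyote)
    return pack
```

Successive orbit points are deterministic but non-repeating, which is the diversity argument the abstract makes for replacing uniform initialization.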
6. A survey of inverse reinforcement learning techniques (cited by 2)
Authors: Shao Zhifei, Er Meng Joo. International Journal of Intelligent Computing and Cybernetics (EI), 2012(3): 293-311.
Purpose: To provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL). Design/methodology/approach: Reinforcement learning (RL) techniques provide a powerful solution for sequential decision-making problems under uncertainty. RL uses an agent equipped with a reward function to find a policy through interactions with a dynamic environment. However, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, must be provided beforehand. In practice, the reward function can be very hard to specify and exhausting to tune for large and complex problems, which inspired the development of IRL, an extension of RL that tackles this problem directly by learning the reward function from expert demonstrations. This paper reviews and compares the original IRL algorithms, their close variants, and recent advances. Findings: The paper can serve as an introductory guide to the fundamental theory, developments, and applications of IRL. Originality/value: The paper surveys the theories and applications of IRL, a recent development of RL that had not previously been surveyed.
Keywords: inverse reinforcement learning; reward function; reinforcement learning; artificial intelligence; learning methods
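The core constraint underlying the surveyed IRL methods, that a recovered reward must make the expert's policy optimal (Ng and Russell's characterization), can be checked directly on a small finite MDP. The two-state example in the test below is an illustrative assumption:

```python
import numpy as np

def expert_optimal(P, R, pi, gamma=0.9):
    """Check the core IRL constraint: reward R 'explains' expert policy pi
    iff pi's action maximizes Q(s, a) in every state.
    P[a] is the |S| x |S| transition matrix of action a; R is a state reward."""
    n = len(R)
    P_pi = np.array([P[pi[s]][s] for s in range(n)])   # row s follows pi(s)
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, R)   # V^pi = (I - gamma*P_pi)^-1 R
    Q = np.array([R + gamma * P[a] @ V for a in range(len(P))])
    return all(Q[pi[s], s] >= Q[:, s].max() - 1e-9 for s in range(n))
```

IRL inverts this check: rather than verifying a given R, it searches for an R (typically a non-degenerate one) that makes the expert's policy pass it.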
7. Modified reward function on abstract features in inverse reinforcement learning (cited by 1)
Authors: Shen-yi CHEN, Hui QIAN, Jia FAN, Zhuo-jun JIN, Miao-liang ZHU. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010(9): 718-723.
We improve inverse reinforcement learning (IRL) by applying dimension reduction methods to automatically extract abstract features from human-demonstrated policies, to handle cases where features are unknown or numerous. The importance rating of each abstract feature is incorporated into the reward function. Simulation is performed on the task of driving on a five-lane highway, where the controlled car has the largest fixed speed among all cars. Performance is on average almost 10.6% better with importance ratings than without.
Keywords: importance rating; abstract feature; feature extraction; inverse reinforcement learning (IRL); Markov decision process (MDP)
8. Convergence analysis of an incremental approach to online inverse reinforcement learning
Authors: Zhuo-jun JIN, Hui QIAN, Shen-yi CHEN, Miao-liang ZHU. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2011(1): 17-24.
Interest in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert, has recently increased. This paper deals with an incremental approach to online IRL. First, the convergence of the incremental method for the IRL problem is investigated, and bounds on both the number of mistakes made during learning and the regret are established with a detailed proof. Then an online algorithm based on incremental error correction is derived for the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, yielding an estimate that approaches a target optimal value. The proposed method was tested in a driving simulation experiment and was found to efficiently recover an adequate reward function.
Keywords: incremental approach; reward recovery; online learning; inverse reinforcement learning; Markov decision process
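The abstract's key idea, add an increment to the reward estimate whenever the learner's action disagrees with the expert's, can be sketched as a perceptron-style update. The immediate-reward (one-step) simplification and the features below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def incremental_irl(phi, expert_actions, episodes=50, delta=0.1):
    """Online error-correcting IRL sketch: reward is linear in the action
    features phi[s][a]. On each action mismatch, nudge the weight estimate
    toward the expert's feature vector and away from the mistaken one."""
    w = np.zeros(phi.shape[2])
    for _ in range(episodes):
        mistakes = 0
        for s, a_star in enumerate(expert_actions):
            a_hat = int(np.argmax(phi[s] @ w))               # greedy action under estimate
            if a_hat != a_star:
                w += delta * (phi[s, a_star] - phi[s, a_hat])  # the error-correcting increment
                mistakes += 1
        if mistakes == 0:                                     # expert matched everywhere
            break
    return w
```

As with the perceptron, the number of mistakes is bounded whenever the expert's choices are linearly representable in the features, which is the flavor of bound the paper proves for the sequential setting.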
9. Robot learning from demonstration for path planning: A review (cited by 12)
Authors: XIE ZongWu, ZHANG Qi, JIANG ZaiNan, LIU Hong. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2020(8): 1325-1334.
Learning from demonstration (LfD) is an appealing method for helping robots learn new skills. Numerous papers have presented LfD methods with good performance in robotics. However, complicated robot tasks that require carefully regulated path planning strategies remain an open problem. Contact and non-contact constraints in specific robot tasks make path planning more difficult, as the interaction between the robot and the environment is time-varying. In this paper, we focus on path planning for complex robot tasks in the LfD domain and give a novel perspective for classifying imitation learning and inverse reinforcement learning, based on constraints and obstacle avoidance. Finally, we summarize these methods and present promising directions for robot applications and LfD theory.
Keywords: learning from demonstration; path planning; imitation learning; inverse reinforcement learning; obstacle avoidance
10. AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories (cited by 1)
Authors: Hao Zhang, Guoming Lu, Ke Qin, Kai Du. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023(6): 1101-1114.
Multi-hop reasoning over incomplete knowledge graphs (KGs) offers excellent interpretability with decent performance. Reinforcement learning (RL) approaches formulate multi-hop reasoning as a typical sequential decision problem. An intractable shortcoming of multi-hop reasoning with RL is that sparse reward signals make performance unstable. Current mainstream methods apply heuristic reward functions to counter this challenge; however, the inaccurate rewards produced by heuristic functions guide the agent to improper inference paths and unrelated object entities. To this end, we propose AInvR, a novel adaptive inverse reinforcement learning (IRL) framework for multi-hop reasoning. (1) To counter missing and spurious paths, we replace heuristic rule rewards with an adaptive rule-reward learning mechanism based on the agent's inference trajectories. (2) To alleviate the impact of object entities over-rewarded by inaccurate reward shaping and rules, we propose an adaptive negative-hit reward learning mechanism based on the agent's sampling strategy. (3) To further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during reward learning. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.
Keywords: knowledge graph reasoning (KGR); inverse reinforcement learning (IRL); multi-hop reasoning
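The reward-dropout mechanism in item (3) of the abstract is simple enough to sketch on its own. The linear reward parameterization and drop rate below are illustrative assumptions:

```python
import numpy as np

def reward_with_dropout(w, phi, rng, p_drop=0.2):
    """Reward-dropout sketch: randomly mask a fraction of the reward
    parameters on each evaluation, so reward learning cannot overfit any
    single (possibly spurious) component; rescaling by 1/(1 - p_drop)
    keeps the expected reward unchanged, as in standard dropout."""
    mask = rng.random(w.shape) >= p_drop
    return phi @ (w * mask) / (1.0 - p_drop)
```

A `numpy.random.Generator` (from `np.random.default_rng`) supplies the mask; averaging many masked evaluations recovers the unmasked reward `phi @ w`.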