Achieving robust walking on different types of stairs is one of the most challenging tasks for quadruped robots in the real world. Traditional model-based methods rely heavily on environmental factors, are burdened by intricate modelling complexity, and lack generalizability. Adaptive locomotion control, whose progress is often impeded by complex modelling processes, can be substantially advanced through Reinforcement Learning (RL). In this paper, a learning-based method is proposed to directionally enhance the stair-climbing skill of quadruped robots under different stair conditions. First, a general policy model based on proprioceptive perception is trained as a pre-training model. Then, the pre-trained model is used for initialization, and terrain information from different stairs is introduced for customized training, which enhances the stair-climbing skill without degrading the existing locomotion performance. Finally, the customized control policy is deployed on a real robot to realize motion control in real environments. The experimental results demonstrate that the customized control policy significantly improves the motion performance of quadruped robots on complex stair terrains and shows a degree of generalizability to other complex terrains. The proposed algorithm can be extended to various terrestrial environments.
In this paper, we present a new method for finding a fixed local-optimal policy for computing the customer lifetime value. The method is developed for a class of ergodic controllable finite Markov chains. We propose an approach based on a non-converging state-value function that fluctuates (increases and decreases) between states of the dynamic process. We prove that this function can be represented in a recursive form using a one-step-ahead fixed-optimal policy. Then, we provide an analytical formula for the numerical realization of the fixed local-optimal strategy. We also present a second approach to the same problem, based on linear programming, that implements the c-variable method to make the problem computationally tractable. Finally, we show that the two approaches are related: after a finite number of iterations, our proposed approach converges to the same result as the linear programming method. We also present a non-traditional approach for ergodicity verification. The validity of the proposed methods is demonstrated theoretically and by simulated credit-card marketing experiments that compute the customer lifetime value for both an optimization approach and a game-theory approach.
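The abstract does not state the authors' formulas. The following is therefore a conventional baseline and not the paper's method. It is a minimal Python sketch that assumes a discounted reward criterion, and it reads "ergodic" as regular: some power of the transition matrix is strictly positive.

The sketch works as follows:
- It fixes a deterministic stationary policy on a controllable finite Markov chain.
- It checks ergodicity with the classical Wielandt power test. This is not the non-traditional verification proposed in the paper.
- It evaluates the customer lifetime value by solving (I − γP_π)v = r_π.

The helper names (`is_regular`, `policy_matrix`, `discounted_clv`) are hypothetical. The two-state promotion example, its probabilities, rewards, and discount factor are also hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def is_regular(P: Matrix) -> bool:
    """True if P^k > 0 for some k (irreducible and aperiodic chain).

    By Wielandt's bound it suffices to test k = (n - 1)**2 + 1,
    so only the zero/nonzero pattern of the powers is tracked.
    """
    n = len(P)
    base = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    reach = [row[:] for row in base]          # pattern of P^1
    for _ in range((n - 1) ** 2):             # raise to P^((n-1)^2 + 1)
        reach = [[any(reach[i][k] and base[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return all(all(row) for row in reach)


def policy_matrix(P_actions: Sequence[Matrix], policy: Sequence[int]) -> Matrix:
    """Row i of P_pi is row i of the matrix for the action chosen in state i."""
    return [list(P_actions[policy[i]][i]) for i in range(len(policy))]


def discounted_clv(P: Matrix, r: Sequence[float], gamma: float) -> List[float]:
    """Solve (I - gamma * P) v = r by Gaussian elimination with partial pivoting."""
    n = len(P)
    # Augmented system [I - gamma*P | r]
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [r[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for j in range(col, n + 1):
                A[row][j] -= factor * A[col][j]
    # Back substitution on the upper-triangular system
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        v[i] = (A[i][n] - sum(A[i][j] * v[j] for j in range(i + 1, n))) / A[i][i]
    return v


# Hypothetical example: states 0 = active, 1 = inactive customer;
# actions 0 = no promotion, 1 = send promotion.
P_actions = [
    [[0.7, 0.3], [0.2, 0.8]],   # action 0
    [[0.9, 0.1], [0.5, 0.5]],   # action 1
]
R_actions = [[10.0, 0.0], [8.0, -1.0]]   # reward per (action, state)
policy = [1, 0]                           # promote only active customers

P_pi = policy_matrix(P_actions, policy)
r_pi = [R_actions[policy[i]][i] for i in range(len(policy))]
if is_regular(P_pi):
    clv = discounted_clv(P_pi, r_pi, gamma=0.9)
    print("CLV per state:", clv)
```

The sketch stops at evaluating one fixed policy. Choosing the best policy would need an optimization step on top of it, such as policy iteration or a linear-programming formulation like the one the abstract mentions.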