Funding: Supported by the National Natural Science Foundation of China (Grant No. 52175029) and the Key Industrial Chain Projects of Shaanxi Province (Grant No. 2018ZDCXL-GY-06-05).
Abstract: Dynamic movement primitives (DMPs) have been widely studied as a robust and efficient framework for robot learning from demonstration. The classical DMP framework mainly focuses on movement learning in Cartesian or joint space and cannot properly represent end-effector orientation. In this paper, we present an extended DMP framework (EDMPs) in both Cartesian space and the 2-dimensional (2D) sphere manifold for quaternion-based orientation learning and generalization. A Gaussian mixture model and Gaussian mixture regression (GMM-GMR) are adopted as the initialization phase of EDMPs to handle multiple demonstrations and obtain their mean and covariance. Additionally, evaluation indicators including reachability and similarity are defined to characterize the learning and generalization abilities of EDMPs. Finally, a real-world experiment was conducted with human demonstrations: the endpoint poses of the human arm were recorded and successfully transferred from the human to the robot. The experimental results show that the absolute errors of the Cartesian and Riemannian space skills are less than 3.5 mm and 1.0°, respectively. The Pearson's correlation coefficients of the Cartesian and Riemannian space skills are mostly greater than 0.9. The developed EDMPs exhibit superior reachability and similarity in learning and generalizing multi-space skills. This research proposes a fused framework of EDMPs and GMM-GMR that has sufficient capability to handle multi-space skills from multiple demonstrations.
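For readers unfamiliar with the classical DMP that this abstract extends, the following is a minimal sketch of a one-dimensional discrete DMP in Python with NumPy. It is not the EDMPs method of the paper: it covers only a single Cartesian coordinate, omits the quaternion and GMM-GMR parts, and the gains `ALPHA_Z`, `ALPHA_X`, the basis-width heuristic, and the function names `fit_dmp` and `rollout_dmp` are all assumptions chosen for illustration.

```python
import numpy as np

ALPHA_Z, ALPHA_X = 25.0, 4.0   # assumed gains, not taken from the paper
BETA_Z = ALPHA_Z / 4.0         # critically damped spring-damper


def fit_dmp(y_demo, dt, n_basis=20):
    """Fit forcing-term weights of a 1-D discrete DMP to one demonstration."""
    y_demo = np.asarray(y_demo, dtype=float)
    n = len(y_demo)
    tau = (n - 1) * dt                                # movement duration
    y0, g = y_demo[0], y_demo[-1]                     # start and goal
    yd = np.gradient(y_demo, dt)                      # numerical velocity
    ydd = np.gradient(yd, dt)                         # numerical acceleration
    x = np.exp(-ALPHA_X * np.arange(n) * dt / tau)    # canonical phase, decays from 1
    # Forcing term that would reproduce the demonstration exactly.
    f_target = tau**2 * ydd - ALPHA_Z * (BETA_Z * (g - y_demo) - tau * yd)
    # Gaussian basis functions, evenly spaced in time (heuristic widths).
    centers = np.exp(-ALPHA_X * np.linspace(0.0, 1.0, n_basis))
    gaps = np.diff(centers)
    widths = 1.0 / np.concatenate([gaps, gaps[-1:]]) ** 2
    psi = np.exp(-widths * (x[:, None] - centers) ** 2)   # shape (n, n_basis)
    s = x * (g - y0)                                  # phase- and goal-scaled input
    # Locally weighted regression: one weight per basis function.
    num = (psi * (s * f_target)[:, None]).sum(axis=0)
    den = (psi * (s**2)[:, None]).sum(axis=0) + 1e-10
    return {"w": num / den, "centers": centers, "widths": widths,
            "tau": tau, "y0": y0, "g": g}


def rollout_dmp(model, dt, g=None):
    """Integrate the DMP with Euler steps, optionally toward a new goal g."""
    g = model["g"] if g is None else g
    y0, tau = model["y0"], model["tau"]
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(round(tau / dt))):
        psi = np.exp(-model["widths"] * (x - model["centers"]) ** 2)
        f = (psi @ model["w"]) / (psi.sum() + 1e-10) * x * (g - y0)
        z += (ALPHA_Z * (BETA_Z * (g - y) - z) + f) / tau * dt
        y += z / tau * dt
        x += -ALPHA_X * x / tau * dt
        traj.append(y)
    return np.array(traj)


# Usage: learn a minimum-jerk reach from 0 to 1, then generalize to a new goal.
t = np.linspace(0.0, 1.0, 101)
demo = 10 * t**3 - 15 * t**4 + 6 * t**5
model = fit_dmp(demo, dt=0.01)
repro = rollout_dmp(model, dt=0.01)            # reproduction of the demonstration
scaled = rollout_dmp(model, dt=0.01, g=2.0)    # same shape, different goal
```

Because the forcing term is scaled by the goal offset and vanishes as the phase decays, the rollout converges to whichever goal is supplied. This is the generalization property that reachability-style indicators, such as those mentioned in the abstract, are meant to quantify.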
Abstract: This paper proposes a novel approach for physical human-robot interaction (pHRI), where a robot provides guidance forces to a user based on the user's performance. The framework tunes the forces according to the behavior of each user in coping with different tasks, where lower performance results in higher intervention from the robot. This personalized physical human-robot interaction (p2HRI) method incorporates adaptive modeling of the interaction between the human and the robot as well as learning from demonstration (LfD) techniques to adapt to the user's performance. The approach is based on model predictive control, where the system optimizes the rendered forces by predicting the performance of the user. Moreover, continuous learning of the user's behavior is added so that the models and personalized considerations are updated as the user's performance changes over time. Applying this framework to a field such as haptic guidance for skill improvement allows a more personalized learning experience, where the interaction between the robot as the intelligent tutor and the student as the user is better adjusted to the skill level of the individual and their gradual improvement. The results suggest that the proposed method improves the precision of the interaction model, and that adding the considered personalized factors leads to a more adaptive strategy for rendering guidance forces.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 91848202) and the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51521003).
Abstract: Learning from demonstration (LfD) is an appealing method of helping robots learn new skills. Numerous papers have presented LfD methods with good performance in robotics. However, complicated robot tasks that require carefully regulated path planning strategies remain unsolved. Contact or non-contact constraints in specific robot tasks make the path planning problem more difficult, as the interaction between the robot and the environment is time-varying. In this paper, we focus on the path planning of complex robot tasks in the domain of LfD and give a novel perspective for classifying imitation learning and inverse reinforcement learning. This classification is based on constraints and obstacle avoidance. Finally, we summarize these methods and present promising directions for robot applications and LfD theory.
Funding: Supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51521003), the National Natural Science Foundation of China (Grant No. 61803124), and the Post-doctor Research Startup Foundation of Heilongjiang Province.
Abstract: Autonomous planning is a significant development direction for space manipulators, and learning from demonstrations (LfD) is a potential strategy for complex tasks in the field. However, separating control from planning may cause large torque fluctuations and high energy consumption, and even instability or danger in the control of space manipulators, especially for planning based on human demonstrations. Therefore, we present an autonomous planning and control strategy for space manipulators based on LfD and focus on dynamics uncertainty, a common problem of actual manipulators. The process can be divided into three stages: first, we reproduced the stochastic directed trajectory with Gaussian process-based LfD; second, we built a model of the stochastic dynamics of the actual manipulator with a Gaussian process; third, we designed an optimal controller based on the dynamics model to obtain improved commanded torques and trajectory, and used the separation theorem to deal with stochastic characteristics during control. We evaluated the strategy in a ground experiment on locating pre-screwed bolts with the Tiangong-2 manipulator system. The results showed that, compared with other strategies, the proposed strategy significantly reduces torque fluctuations and energy consumption, and its precision meets the task requirements.
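The first stage described above relies on Gaussian process regression to obtain a trajectory with an uncertainty estimate. The sketch below shows only that generic building block, assuming a zero-mean prior, a squared-exponential kernel, and hand-picked hyperparameters; the function names `rbf_kernel` and `gp_predict` and all numeric values are hypothetical, and the code does not reproduce the paper's trajectory model, dynamics model, or controller.

```python
import numpy as np


def rbf_kernel(a, b, length=0.2, sigma_f=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)


def gp_predict(t_train, y_train, t_query, noise=1e-2, length=0.2):
    """Posterior mean and variance of a zero-mean GP at the query inputs."""
    K = rbf_kernel(t_train, t_train, length) + noise**2 * np.eye(len(t_train))
    Ks = rbf_kernel(t_query, t_train, length)
    L = np.linalg.cholesky(K)                       # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    prior_var = np.diag(rbf_kernel(t_query, t_query, length))
    var = np.maximum(prior_var - np.sum(v**2, axis=0), 0.0)
    return mean, var


# Usage: demonstrated samples of one joint coordinate over normalized time.
t_train = np.linspace(0.0, 1.0, 30)
y_train = np.sin(2 * np.pi * t_train)               # stand-in for demonstration data
t_query = np.array([0.25, 0.5, 3.0])                # the last point is far from the data
mean, var = gp_predict(t_train, y_train, t_query)
```

The predictive variance stays small inside the demonstrated time range and returns to the prior variance far from it, which is the kind of stochastic information a downstream controller can take into account.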
Funding: This work was supported by the National Key R&D Plan (2016YFB0100901), the National Natural Science Foundation of China (Grant Nos. U20B2062 & 61673237), and the Beijing Municipal Science & Technology Project (Z191100007419001).
Abstract: In actor-critic reinforcement learning (RL) algorithms, function estimation errors are known to cause ineffective random exploration at the beginning of training and to lead to overestimated value estimates and suboptimal policies. In this paper, we address the problem by executing advantage rectification with imperfect demonstrations, thus reducing the function estimation errors. Pretraining with expert demonstrations has been widely adopted to accelerate the learning process of deep reinforcement learning when simulations are expensive to obtain. However, existing methods, such as behavior cloning, often assume that the demonstrations contain other information or labels with regard to performance, such as an optimality assumption, which is usually incorrect and useless in the real world. In this paper, we explicitly handle imperfect demonstrations within actor-critic RL frameworks and propose a new method called learning from imperfect demonstrations with advantage rectification (LIDAR). LIDAR uses a rectified loss function to learn only from selected demonstrations; the loss is derived from the minimal assumption that the demonstrating policies perform better than the current policy. LIDAR learns from contradictions caused by estimation errors and in turn reduces those errors. We apply LIDAR to three popular actor-critic algorithms, DDPG, TD3 and SAC, and experiments show that our method can observably reduce the function estimation errors, effectively leverage demonstrations far from optimal, and consistently outperform state-of-the-art baselines in all scenarios.
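The abstract does not give the form of the rectified loss, so the following is only one plausible reading, written as a hedged sketch: if the demonstrator is assumed to be better than the current policy, then a critic that rates the policy's own action above the demonstrated action at the same state contradicts that assumption, and only those samples contribute to the loss. The function name `rectified_demo_loss`, the `margin` parameter, and the hinge form are assumptions for illustration, not the LIDAR loss itself.

```python
import numpy as np


def rectified_demo_loss(q_demo, q_policy, margin=0.0):
    """Hinge-style loss over demonstration samples (illustrative only).

    q_demo:   critic values Q(s, a_demo) for demonstrated actions
    q_policy: critic values Q(s, pi(s)) for the current policy's actions
    Returns the mean rectified gap and a mask of the samples that were used.
    """
    gap = np.asarray(q_policy, dtype=float) - np.asarray(q_demo, dtype=float) + margin
    selected = gap > 0.0                 # contradictions with "demo is better"
    loss = np.maximum(gap, 0.0).mean()   # zero contribution from consistent samples
    return loss, selected


# Usage with made-up critic values for three demonstration states.
loss, selected = rectified_demo_loss(q_demo=[1.0, 2.0, 3.0],
                                     q_policy=[0.5, 2.5, 3.0])
```

In this toy call only the second sample is selected, because it is the only one where the critic prefers the policy's action to the demonstrated one.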
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 52322503 and 52075055), in part by the Natural Science Foundation of Chongqing, China (Grant No. 2024NSCQ-JQX0123), in part by the Major Scientific and Technological Research and Development Project of Jiangxi Province, China (Grant No. 20233AAE02001), and in part by the Independent Research Project of State Key Laboratory of Mechanical Transmission for Advanced Equipment, China (Grant No. SKLMT-ZZKT-2024R02).
Abstract: To automate heavy-duty hydraulic manipulators in construction applications, trajectory learning from demonstration is increasingly in demand. However, it is hampered by motion noise owing to factors such as size scaling and oscillation tendency. A smooth trajectory learning method is established to overcome this problem by segmenting the demonstration and extracting the subgoals for motion noise cancellation. The imperfect demonstration trajectory is segmented by clustering the end-effector's velocity in the task space, with locally weighted noise cancellation to reduce the impact of velocity fluctuations. A sequentially hierarchical Dirichlet process algorithm with temporal encoding is designed to extract the intended subgoals and filter out inefficient operations. Then, the learned trajectory is reconstructed in combination with dynamic motion primitives (DMP). The comparison test results indicate that the proposed method can learn a relevant trajectory that reflects the real intention of the user from an imperfect demonstration. With DMP and Sparse Sampling as baselines, two automatic trajectory tracking tasks are performed; they show that the average position error with respect to the reference is reduced because inefficient operations or movements are effectively filtered out.
Funding: Supported by the European Research Council's (ERC) starting grant Ergo-Lean (No. GA 850932) and funding provided by The Chinese University of Hong Kong, China.
Abstract: Reactive planning and control capacity is essential for collaborative robots when tasks change online in an unstructured environment. This is more difficult for collaborative mobile manipulators (CMM) because of their high redundancy. To this end, this paper proposes a reactive whole-body locomotion-integrated manipulation approach based on combined learning and optimization. First, human demonstrations are collected, where the wrist and pelvis movements are treated as whole-body trajectories and mapped to the end-effector (EE) and the mobile base (MB) of the CMM, respectively. A time-input kernelized movement primitive (T-KMP) learns the whole-body trajectory, and a multi-dimensional kernelized movement primitive (M-KMP) learns the spatial relationship between the MB and EE poses. When the task changes, the T-KMP adapts the learned trajectories online by inserting the new desired point predicted by the M-KMP. Then, the updated reference trajectories are sent to a hierarchical quadratic programming (HQP) controller, where EE and MB trajectory tracking are set as the first- and second-priority tasks, generating feasible and optimal joint-level commands. An ablation simulation experiment on the HQP with the CMM is conducted to show the necessity of MB trajectory tracking in mimicking human whole-body motion behavior. Finally, reactive pick-and-place and reactive reaching tasks were undertaken, in which the target object was moved randomly, even outside the region of the demonstrations. The results showed that the proposed approach can successfully transfer human whole-body loco-manipulation skills to the CMM and adapt them online as the task changes.
Funding: National Natural Science Foundation of China (Nos. 62225304, 92148204 and 62061160371), National Key Research and Development Program of China (Nos. 2021ZD0114503 and 2019YFB1703600), Beijing Top Discipline for Artificial Intelligence Science and Engineering, University of Science and Technology Beijing, and the Beijing Natural Science Foundation (No. JQ20026).
Abstract: In this article, a robot skills learning framework is developed that considers both motion modeling and execution. To enable the robot to learn skills from demonstrations, a learning method called dynamic movement primitives (DMPs) is introduced to model motion. A staged teaching strategy is integrated into the DMP framework to enhance its generality so that complicated tasks can also be performed by multi-joint manipulators. The DMP connection method is used to make accurate and smooth transitions in position and velocity space when connecting complex motion sequences. In addition, motions are categorized by goal and duration. An adaptive neural network (NN) control method is proposed to achieve highly accurate trajectory tracking and to ensure the performance of action execution, which improves the reliability of the skills learning system. An experiment on the Baxter robot verifies the effectiveness of the proposed method.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52205009), the Open Foundation of State Key Laboratory of Mechanical Transmission for Advanced Equipment of Chongqing University of China (Grant No. SKLMT-MSKFKT-202307), and the Taihu Lake Innovation Fund for the School of Future Technology of Southeast University of China.
Abstract: How to substitute a robot for the human operator in various assembly tasks has to be taken into full consideration in intelligent manufacturing. Autonomous robotic assembly not only brings high working efficiency, better product quality and low labor cost, but also helps relieve the increasingly severe problem of population aging. However, numerous challenges still prevent its wide application when a robot is assigned to finish general tasks in an unstructured environment. To provide a fundamental understanding of the various problems involved in robotic assembly, this paper reviews its recent progress and challenges, focusing on five key technologies: perception, end-effectors, control methods, learning methods and performance evaluation. The main works in these fields are reviewed and their characteristics are analyzed, and typical assembly scenarios are covered. The challenges and future directions in robotic assembly are also discussed with respect to precise perception, robotic hands, error recovery and collaborative robots. In addition to providing a systematic summary of the required key technologies, this work aims to motivate further research in the communities of robotics, artificial intelligence, and automation engineering.