Abstract: The main idea of reinforcement learning is to evaluate the chosen action based on the current reward. Following this concept, many algorithms have achieved strong performance on classic Atari 2600 games. The main challenge arises when the reward is sparse or missing, as in hard-exploration environments such as the Montezuma's Revenge, Pitfall, and Private Eye games. Approaches built to handle such challenges have been very demanding. This work introduces a different reward system that enables a simple classical algorithm to learn quickly and achieve high performance in hard-exploration environments. Moreover, we add simple enhancements to several hyperparameters, such as the number of actions and the sampling ratio, which help improve performance. We include the extra reward within the human demonstrations and then use Prioritized Double Deep Q-Networks (Prioritized DDQN) to learn from these demonstrations. Our approach enables Prioritized DDQN, with a short learning time, to finish the first level of Montezuma's Revenge and to perform well in both Pitfall and Private Eye. We compare our results on the same games with several baselines, such as the Rainbow and Deep Q-learning from Demonstrations (DQfD) algorithms. The results show that the new reward system enables Prioritized DDQN to outperform the baselines in hard-exploration games with short learning time.
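The two enhancements the abstract names can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the bonus value `DEMO_BONUS`, the `demo_ratio` parameter, and the transition layout are all assumptions. It shows (1) attaching an extra reward to human-demonstration transitions and (2) sampling each minibatch from the demonstration and agent buffers at a fixed ratio.

```python
import random

DEMO_BONUS = 0.1  # assumed extra reward attached to demonstration transitions

def augment_reward(transition, is_demo):
    """Return the transition with the demo bonus added to its reward."""
    s, a, r, s_next, done = transition
    if is_demo:
        r = r + DEMO_BONUS
    return (s, a, r, s_next, done)

def sample_batch(demo_buffer, agent_buffer, batch_size, demo_ratio=0.25):
    """Mix demonstration and agent transitions at a fixed sampling ratio."""
    n_demo = min(int(batch_size * demo_ratio), len(demo_buffer))
    n_agent = batch_size - n_demo
    batch = [augment_reward(t, True) for t in random.sample(demo_buffer, n_demo)]
    batch += [augment_reward(t, False) for t in random.sample(agent_buffer, n_agent)]
    random.shuffle(batch)
    return batch
```

In a full Prioritized DDQN, the uniform `random.sample` calls would be replaced by priority-weighted sampling over TD errors; the fixed demo/agent ratio shown here is the hyperparameter the abstract refers to as the sampling ratio.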
Funding: Supported by the European Research Council's (ERC) Starting Grant Ergo-Lean (No. GA 850932) and by funding provided by The Chinese University of Hong Kong, China.
Abstract: Reactive planning and control capacity for collaborative robots is essential when tasks change online in an unstructured environment. This is more difficult for collaborative mobile manipulators (CMM) due to their high redundancy. To this end, this paper proposes a reactive whole-body locomotion-integrated manipulation approach based on combined learning and optimization. First, human demonstrations are collected, where the wrist and pelvis movements are treated as whole-body trajectories and mapped to the end-effector (EE) and the mobile base (MB) of the CMM, respectively. A time-input kernelized movement primitive (T-KMP) learns the whole-body trajectory, and a multi-dimensional kernelized movement primitive (M-KMP) learns the spatial relationship between the MB and EE poses. As the task changes, the T-KMP adapts the learned trajectories online by inserting a new desired point predicted by the M-KMP. The updated reference trajectories are then sent to a hierarchical quadratic programming (HQP) controller, where EE and MB trajectory tracking are set as the first- and second-priority tasks, generating feasible and optimal joint-level commands. An ablation simulation experiment with the CMM under the HQP controller demonstrates the necessity of MB trajectory tracking for mimicking human whole-body motion behavior. Finally, reactive pick-and-place and reactive reaching tasks were performed, in which the target object was moved randomly, even outside the region covered by the demonstrations. The results show that the proposed approach can successfully transfer and adapt human whole-body loco-manipulation skills to the CMM online as tasks change.
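The strict two-level priority the abstract describes (EE tracking first, MB tracking second) is commonly realized by solving the lower-priority task inside the nullspace of the higher-priority one. The minimal sketch below illustrates only that priority mechanism; a full HQP controller as in the paper would additionally handle inequality constraints (joint limits, velocity bounds) via a QP solver, and the function name and interface here are illustrative assumptions.

```python
import numpy as np

def prioritized_velocities(J_ee, v_ee, J_mb, v_mb):
    """Joint velocities that track v_ee exactly (first priority) and v_mb
    as well as possible within the remaining nullspace (second priority)."""
    J_ee_pinv = np.linalg.pinv(J_ee)
    qdot_1 = J_ee_pinv @ v_ee                       # first-priority solution
    N = np.eye(J_ee.shape[1]) - J_ee_pinv @ J_ee    # nullspace projector of task 1
    # second-priority task solved inside the nullspace of the first
    qdot_2 = np.linalg.pinv(J_mb @ N) @ (v_mb - J_mb @ qdot_1)
    return qdot_1 + N @ qdot_2
```

Because the MB task is projected through `N`, it can never disturb EE tracking; when the two tasks conflict, the MB objective is satisfied only to the extent the redundancy allows, which is exactly the behavior the ablation experiment probes.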