Abstract: Grasping is one of the most fundamental operations in modern robotics applications. While deep reinforcement learning (DRL) has demonstrated strong potential in robotics, existing work places too much emphasis on maximizing the cumulative reward of a task and often ignores the potential safety risks. This paper proposes an optimization method based on safe reinforcement learning (Safe RL) to address the robotic grasping problem under safety constraints. Specifically, taking the obstacle avoidance constraints of the system into account, the grasping problem of the manipulator is modeled as a Constrained Markov Decision Process (CMDP). A Lagrange multiplier and a dynamic weighting mechanism are introduced into the Proximal Policy Optimization (PPO) framework, yielding the dynamic weighted Lagrange PPO (DWL-PPO) algorithm. In the proposed method, behavior that violates the safety constraints is penalized while the policy is optimized. In addition, orientation control of the end-effector is included in the reward function, and a compound reward function that adapts to changes in pose is designed. The efficacy and advantages of the proposed method are demonstrated through extensive training and testing in the PyBullet simulator. The grasping experiments show that the proposed approach provides superior safety and efficiency compared with other advanced RL methods and achieves a good trade-off between model learning and risk aversion.
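The abstract does not spell out how the Lagrange multiplier enters the PPO update, so the following is only a minimal sketch of the generic PPO-Lagrangian pattern, not the DWL-PPO algorithm itself. The function names, the learning rate `lr`, and the `cost_limit` parameter are hypothetical, and the dynamic weighting mechanism is not reproduced because the abstract gives no details of it.

```python
def update_multiplier(lmbda, avg_cost, cost_limit, lr=0.05):
    """Dual ascent step on the Lagrange multiplier (hypothetical helper).

    The multiplier grows when the average episode cost exceeds the limit
    (constraint violated), shrinks otherwise, and is clamped at zero.
    """
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))


def penalized_advantage(adv_reward, adv_cost, lmbda):
    """Combine reward and cost advantages into one PPO training signal.

    Dividing by (1 + lambda) is a common rescaling, assumed here, that keeps
    the magnitude of the signal bounded as the multiplier grows.
    """
    return (adv_reward - lmbda * adv_cost) / (1.0 + lmbda)


# Usage: one multiplier update followed by one advantage computation.
lmbda = update_multiplier(0.0, avg_cost=1.0, cost_limit=0.5, lr=0.1)
signal = penalized_advantage(adv_reward=1.0, adv_cost=2.0, lmbda=lmbda)
```

In this pattern the policy is still trained with the usual clipped PPO objective; only the advantage fed into it changes, so that actions with a high expected safety cost are discouraged more strongly as violations accumulate.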
Funding: Supported by the National Natural Science Foundation of China (No. 61175084).
Abstract: This paper proposes a method for planning three-dimensional (3D) paths for a low-flying unmanned aerial vehicle (UAV) in complex terrain, based on the interfered fluid dynamical system (IFDS) and the theory of obstacle avoidance by a flowing stream. Because it does not require solving fluid equations under complex boundary conditions, the proposed method is suitable for complex terrain and obstacles of different shapes. First, mountains, radar, and anti-aircraft fire in complex terrain are represented as cylindrical, conical, spherical, and parallelepiped obstacles and their combinations, which turns the 3D low-flying path planning problem into one of solving for the streamlines of a fluid flowing around obstacles. Second, on the basis of a unified mathematical expression for typical obstacle shapes, including the sphere, cylinder, cone, and parallelepiped, the modulation matrix of the interfered fluid dynamical system is constructed and the 3D streamlines around a single obstacle are obtained. Streamlines for multiple obstacles are then derived using a weighted average of the velocity field. Third, an extra control force method and a virtual obstacle method are proposed to deal with stagnation points and overlapping obstacles, respectively. Finally, taking path length and flight height as sub-goals, a genetic algorithm (GA) is used to obtain the optimal 3D path under the maneuverability constraints of the UAV. Simulation results show that the environmental modeling is simple and that the resulting path is smooth and suitable for a UAV. A theoretical proof is also presented to show that the proposed method preserves the obstacle-avoiding characteristics of the fluid flow.
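The abstract does not give the modulation matrix explicitly. As an illustration only, the sketch below assumes a form often used in IFDS-style planners for a single spherical obstacle, M = I - n nᵀ / (Γ^(1/ρ) nᵀn), where Γ is the obstacle function (equal to 1 on the surface) and n is its gradient. The function names and the parameters `speed` and `rho` are hypothetical, and the formula is valid only outside the obstacle and away from the goal point.

```python
import math


def sphere_gamma_and_normal(p, center, radius):
    """Obstacle function Gamma and its gradient n for a sphere (assumed form)."""
    d = [(pi - ci) / radius for pi, ci in zip(p, center)]
    gamma = sum(di * di for di in d)           # Gamma == 1 on the surface
    normal = [2.0 * di / radius for di in d]   # gradient of Gamma at p
    return gamma, normal


def modulated_velocity(p, goal, center, radius, speed=1.0, rho=1.0):
    """Apply M = I - n n^T / (Gamma^(1/rho) n^T n) to a straight-to-goal flow."""
    to_goal = [gi - pi for gi, pi in zip(goal, p)]
    dist = math.sqrt(sum(t * t for t in to_goal))
    u = [speed * t / dist for t in to_goal]    # undisturbed flow toward the goal
    gamma, n = sphere_gamma_and_normal(p, center, radius)
    n_dot_u = sum(ni * ui for ni, ui in zip(n, u))
    n_dot_n = sum(ni * ni for ni in n)
    scale = n_dot_u / (gamma ** (1.0 / rho) * n_dot_n)
    # M @ u computed without forming the matrix: u - n * (n.u) / (Gamma^(1/rho) n.n)
    return [ui - scale * ni for ui, ni in zip(u, n)]


# Usage: a point on the surface, directly in front of the obstacle.
v_head_on = modulated_velocity([-1.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0], 1.0)
```

Under this assumed form, the normal component of the velocity vanishes on the obstacle surface, so a streamline cannot penetrate the obstacle. In the head-on case shown in the usage line, the whole velocity vanishes, which is the kind of stagnation point that the abstract says the extra control force method is meant to handle.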