3 articles found
1. A policy gradient algorithm integrating long and short-term rewards for soft continuum arm control (cited by: 3)
Authors: DONG Xiang, ZHANG Jing, CHENG Long, XU WenJun, SU Hang, MEI Tao. Science China (Technological Sciences), SCIE EI CAS CSCD, 2022, No. 10, pp. 2409-2419 (11 pages)
The soft continuum arm has extensive application in industrial production and human life due to its superior safety and flexibility. Reinforcement learning is a powerful technique for solving soft arm continuous control problems, which can learn an effective control policy with an unknown system model. However, it is often affected by high sample complexity and requires huge amounts of data to train, which limits its effectiveness in soft arm control. An improved policy gradient method, policy gradient integrating long and short-term rewards, denoted as PGLS, is proposed in this paper to overcome this issue. The short-term rewards provide more dynamic-aware exploration directions for policy learning and improve the exploration efficiency of the algorithm. PGLS can be integrated into current policy gradient algorithms, such as deep deterministic policy gradient (DDPG). The overall control framework is realized and demonstrated in a dynamics simulation environment. Simulation results show that this approach can effectively control the soft arm to reach and track the targets. Compared with DDPG and other model-free reinforcement learning algorithms, the proposed PGLS algorithm shows a great improvement in convergence speed and performance. In addition, a fluid-driven soft manipulator is designed and fabricated in this paper, which can be used to verify the proposed PGLS algorithm in real experiments in the future.
Keywords: soft arm control; Cosserat rod; deep reinforcement learning; policy gradient algorithm; high sample complexity
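For reference, the DDPG update that the abstract names as a host algorithm for PGLS can be written as below. These are the standard DDPG equations only. The abstract does not give the PGLS modification, so it is not reproduced here. The notation (critic Q with parameters \theta^{Q}, actor \mu with parameters \theta^{\mu}, primed target networks, discount \gamma, minibatch size N) is assumed for illustration.

```latex
% Standard DDPG updates over a minibatch of N transitions (s_i, a_i, r_i, s_{i+1}).
\begin{align}
y_i &= r_i + \gamma\, Q'\!\left(s_{i+1},\, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \\
L(\theta^{Q}) &= \frac{1}{N} \sum_{i} \left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2 \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N} \sum_{i}
  \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\,
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}
\end{align}
```

The critic is regressed toward the bootstrapped target y_i, and the actor follows the critic's action gradient. Any short-term reward term would have to enter through one of these two updates.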
2. A UAV collaborative defense scheme driven by DDPG algorithm (cited by: 3)
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2023, No. 5, pp. 1211-1224 (14 pages)
The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning methods based on value iteration and policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to achieve autonomous decisions in the continuous state space and action space. In this paper, a cooperative defense with DDPG via swarms of unmanned aerial vehicles (UAVs) is developed and validated, which has shown promising practical value in the effect of defending. We solve the sparse rewards problem of reinforcement learning pair in a long-term task by building the reward function of UAV swarms and optimizing the learning process of the artificial neural network based on the DDPG algorithm to reduce the vibration in the learning process. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the requirements of a UAV swarm for non-centralization and autonomy, and promoting the intelligent development of UAV swarms as well as the decision-making process.
Keywords: deep deterministic policy gradient (DDPG) algorithm; unmanned aerial vehicles (UAVs) swarm; task decision making; deep reinforcement learning; sparse reward problem
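The off-policy character of DDPG mentioned in the abstract is usually realized with a replay buffer and slowly tracking target networks. The following is a minimal sketch of those two generic pieces, not the paper's implementation. The names `ReplayBuffer`, `soft_update`, and `td_target`, and the default values of `tau` and `gamma`, are assumptions chosen for illustration.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly (off-policy)."""

    def __init__(self, capacity):
        # A bounded deque silently discards the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, float(done)))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        batch = random.sample(list(self.storage), batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def td_target(rewards, next_q, dones, gamma=0.99):
    """Bootstrapped critic target; no bootstrapping past terminal states."""
    return rewards + gamma * (1.0 - dones) * next_q
```

A small `tau` makes the target networks change slowly, which is the usual way DDPG damps oscillation in the critic targets. Whether this matches the paper's own approach to reducing vibration in learning is not stated in the abstract.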
3. Three-degree-of-freedom motion posture stabilization control of platform based on DTW-LSTM-MATD3 under high and low frequency disturbances of ships
Authors: Qin ZHANG, Jingyi ZHOU, Bangping GU, Xiong HU. Journal of Zhejiang University-SCIENCE A, 2026, No. 3, pp. 246-261 (16 pages)
In the complex and variable deep-sea environment, the compensation control of ship motion ensures the safety and efficiency of equipment installation and transportation in offshore wind farms. However, the ship motion posture compensation control system is severely affected by uncertainties, which significantly impact the accuracy of compensation control. In this paper, we propose a ship three-degree-of-freedom (3-DoF) motion posture stabilization control method based on the DTW-LSTM-MATD3 algorithm. We use the multi-agent twin delayed deep deterministic policy gradient (MATD3) to control a platform with six electric cylinders to achieve stable control. However, owing to random noise affecting the ship’s motion posture, we use a dynamic time warping (DTW) algorithm to distinguish between high-frequency noise and low-frequency tracking signals. Further, we embed a long short-term memory (LSTM) network into the MATD3 network to better align the Critic network’s training with the true Q-value. We use a combined reward function to enhance the agent’s exploration capability in complex dynamic environments. Finally, verification was conducted under sixth-level, abrupt sea conditions with high-frequency noise, as well as under real abrupt sea conditions, and a generalization test was also carried out. Simulation results show that the proposed DTW-LSTM-MATD3 method has great compensation control ability.
Keywords: compensation control; multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm; dynamic time warping (DTW) algorithm; long short-term memory (LSTM) network
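As background for the DTW step named in the abstract, the classic dynamic time warping distance between two one-dimensional sequences can be sketched as follows. This shows only the textbook dynamic-programming recurrence. How the paper applies DTW to separate high-frequency noise from low-frequency tracking signals is not described in the abstract. The function name `dtw_distance` and the absolute-difference local cost are assumptions.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic DTW distance between two 1-D sequences (O(len(x) * len(y)))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # cost[i, j] = minimal accumulated cost of aligning x[:i] with y[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumed local cost: |x_i - y_j|
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i, j] = local + min(cost[i - 1, j],      # x advances, y repeats
                                     cost[i, j - 1],      # y advances, x repeats
                                     cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])
```

Because samples may be repeated during alignment, two signals with the same shape but different timing get a small distance, while a signal dominated by rapid fluctuations stays far from a smooth reference.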