Funding: Supported by the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia (Project No. MoE-IF-UJ-R2-22-04220773-1).
Abstract: Domain randomization is a widely adopted technique in deep reinforcement learning (DRL) to improve agent generalization by exposing policies to diverse environmental conditions. This paper investigates the impact of three reset strategies (normal, non-randomized, and randomized) on agent performance, using the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed DDPG (TD3) algorithms in the CarRacing-v2 environment. Two experimental setups were used: an extended training regime with DDPG for 1000 steps per episode across 1000 episodes, and a fast-execution setup comparing DDPG and TD3 over 30 episodes of 50 steps each under constrained computational resources. A step-based reward-scaling mechanism was applied under the randomized reset condition to promote broader state exploration. Experimental results show that randomized resets significantly enhance learning efficiency and generalization, with DDPG demonstrating superior performance across all reset strategies. In particular, DDPG combined with randomized resets achieves the highest smoothed rewards (approximately 15), the best stability, and the fastest convergence. These differences are statistically significant, as confirmed by t-tests: DDPG outperforms TD3 under the randomized (t = −101.91, p < 0.0001), normal (t = −21.59, p < 0.0001), and non-randomized (t = −62.46, p < 0.0001) reset conditions. The findings underscore the critical role of reset strategy and reward shaping in enhancing the robustness and adaptability of DRL agents in continuous control tasks, particularly where computational efficiency and training stability are crucial.
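The step-based reward-scaling mechanism is only named, not specified, in the abstract; a minimal sketch of one plausible form, together with a randomized-reset stub, might look like the following (the function names, the scaling formula, and the `alpha` parameter are illustrative assumptions, not the paper's actual code or the CarRacing-v2 API):

```python
import random

def randomized_reset(rng, track_length=1000.0):
    """Reset the agent to a random progress point along the track
    (a stand-in for the paper's randomized-reset condition)."""
    return rng.uniform(0.0, track_length)

def scaled_reward(base_reward, step, max_steps=1000, alpha=0.5):
    """Hypothetical step-based reward scaling: later steps are weighted
    more, encouraging the agent to survive longer and visit a broader
    slice of the state space."""
    return base_reward * (1.0 + alpha * step / max_steps)

rng = random.Random(0)
start = randomized_reset(rng)          # random starting point per episode
print(scaled_reward(1.0, 500))         # halfway through: 1.0 * (1 + 0.25)
```

Under this form, a unit reward at step 500 of 1000 is scaled by 1.25, while rewards at step 0 pass through unchanged.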
Abstract: The power monitoring system is the most important production management system in the power industry. As an important part of this system, user stations that lack grid binding become attractive targets for network attacks. To perceive attack events on the user-station side in time, a method combining real-time detection and active defense against random domain names on the user-station side is proposed. A capsule network (CapsNet) combined with a long short-term memory (LSTM) network classifies the domain names extracted from traffic data. When a random domain name is detected, the system sends instructions to routers and switches over the remote terminal protocol (Telnet) to update their security policies, or shuts down the routers' and switches' service interfaces to block the attack. Experimental results show that the CapsNet-LSTM classifier achieves 99.16% accuracy and a 98% recall rate in random domain name detection. Through Telnet, routers and switches can be coordinated for active defense without interrupting services.
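As a rough classical stand-in for the CapsNet-LSTM classifier, a character-entropy heuristic illustrates why algorithmically generated (random) domain labels are separable from benign ones; the threshold and function names below are illustrative assumptions, not part of the paper's method:

```python
import math

def label_entropy(domain):
    """Shannon entropy (bits/char) of the leftmost domain label."""
    label = domain.split(".")[0].lower()
    n = len(label)
    counts = {c: label.count(c) for c in set(label)}
    return -sum(k / n * math.log2(k / n) for k in counts.values())

def looks_random(domain, threshold=3.0):
    """Flag high-entropy labels as candidate machine-generated domains."""
    return label_entropy(domain) > threshold

print(looks_random("google.com"))        # False: low entropy, repeated chars
print(looks_random("xj4k9q2mzv8w.com"))  # True: near-uniform characters
```

The learned CapsNet-LSTM model replaces this single hand-crafted feature with representations learned directly from the character sequence, which is what pushes accuracy to the reported 99.16%.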
Abstract: The performance of state-of-the-art deep reinforcement learning algorithms such as Proximal Policy Optimization, Twin Delayed Deep Deterministic Policy Gradient, and Soft Actor-Critic for generating a quadruped walking gait in a virtual environment was presented in the previous research work titled "A Comparison of PPO, TD3, and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation". We demonstrated that the Soft Actor-Critic algorithm had the best performance in generating the walking gait under certain sensor configurations in the virtual environment. In this work, we present a performance analysis of the same algorithms for quadruped walking gait generation in a physical environment. Performance in the physical environment is determined by transfer learning augmented with real-time reinforcement learning for gait generation on a physical quadruped. The quadruped is equipped with a range of sensors: position tracking using a stereo camera, contact sensing on each leg through force-resistive sensors, and proprioceptive information about the robot body and legs from nine inertial measurement units. The comparison uses the metrics associated with the walking gait: average forward velocity (m/s), average forward velocity variance, average lateral velocity (m/s), average lateral velocity variance, and quaternion root-mean-square deviation. The strengths and weaknesses of each algorithm for the given task on the physical quadruped are discussed.
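Of the listed metrics, the quaternion root-mean-square deviation is the least standard; one common way to compute it, sketched here under the assumption that deviation is measured as the sign-insensitive Euclidean distance between unit quaternions and a reference orientation, is:

```python
import math

def quat_rmsd(measured, reference):
    """RMS deviation between unit quaternions (w, x, y, z), using the
    sign-insensitive distance min(|q - r|, |q + r|), since q and -q
    represent the same rotation."""
    def dist(q, r):
        d_minus = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))
        d_plus = math.sqrt(sum((a + b) ** 2 for a, b in zip(q, r)))
        return min(d_minus, d_plus)
    return math.sqrt(sum(dist(q, reference) ** 2 for q in measured) / len(measured))

upright = (1.0, 0.0, 0.0, 0.0)  # identity orientation
samples = [(1.0, 0.0, 0.0, 0.0),  # contributes 0
           (0.0, 0.0, 0.0, 1.0)]  # 180-degree yaw, distance sqrt(2)
print(quat_rmsd(samples, upright))
```

A low value indicates the body orientation stayed close to the reference throughout the gait, so this metric complements the velocity statistics as a stability measure.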
Abstract: Background Robot grasping encompasses a wide range of research areas; however, most studies have focused on grasping stationary objects in a scene, and only a few have examined how to grasp objects from a user's hand. In this paper, a robot grasping algorithm based on deep reinforcement learning (RGRL) is proposed. Methods The RGRL takes the relative positions of the robot and the object in a user's hand as input and outputs the best action for the robot in the current state, realizing autonomous path planning and safe grasping of objects from users' hands. A new method for improving the safety of human-robot cooperation is thus explored. To address the low sample-utilization rate and slow convergence of reinforcement learning algorithms, the RGRL is first trained in a simulated scene, and the model parameters are then transferred to a real scene. To reduce the gap between the simulated and real scenes, domain randomization is applied to randomly change the positions and angles of objects in the simulated scenes at regular intervals, improving the diversity of the training samples and the robustness of the algorithm. Results The RGRL's effectiveness and accuracy are verified on both simulated and real scenes, and the results show that it achieves an accuracy of more than 80% in both cases. Conclusions RGRL is a robot grasping algorithm that employs domain randomization and deep reinforcement learning for effective grasping in simulated and real scenes. However, it lacks flexibility in adapting to different grasping poses, prompting future research on safe grasping for diverse user postures.
Abstract: Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art algorithms such as Proximal Policy Optimization, Twin Delayed Deep Deterministic Policy Gradient, and Soft Actor-Critic, to mention a few, have been investigated for training robots to walk. However, conflicting performance results for these algorithms have been reported in the literature. In this work, we present a performance analysis of the above three algorithms for a constant-velocity walking task on a quadruped. The performance is analyzed by simulating the walking task of a quadruped equipped with the range of sensors present on a physical quadruped robot. Simulations of the three algorithms across a range of sensor inputs and with domain randomization are performed. The strengths and weaknesses of each algorithm for the given task are discussed. We also identify the set of sensors that contributes to the best performance of each algorithm.
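The abstract does not give the reward used for the constant-velocity task; a minimal hypothetical tracking reward of the kind commonly used for such tasks (target speed, weight, and names are all assumptions) could look like:

```python
def velocity_reward(v_forward, v_lateral, v_target=0.3, w_lat=0.5):
    """Penalize deviation from the target forward speed and any lateral
    drift; the maximum reward of 0.0 is reached only at exactly the
    target velocity with no sideways motion."""
    return -abs(v_forward - v_target) - w_lat * abs(v_lateral)

print(velocity_reward(0.3, 0.0))   # on target, no drift: 0.0
print(velocity_reward(0.1, 0.1))   # too slow and drifting: about -0.25
```

A dense shaped signal like this is what lets the average-velocity and velocity-variance metrics in the companion papers double as direct measures of how well each algorithm optimized the task.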
Funding: Supported by the National Natural Science Foundation of China (Grant No. 50608022) and the Foundation of National Science and Technology (Grant No. 2006BAJ03B04).
Abstract: Addressing the dynamic response of reticulated shell structures under wind load, systematic parameter analyses of the wind-induced responses of Kiewitt 6-6 type single-layer spherical reticulated shells and three-way grid single-layer cylindrical reticulated shells were performed with the random simulation method in the time domain, covering geometric, structural, and aerodynamic parameters. Moreover, a wind-induced vibration coefficient was obtained, which can serve as a reference for the wind-resistance design of reticulated shell structures. The results indicate that the geometric parameters are the most important factor influencing the wind-induced responses; the wind-induced vibration coefficient is 3.0-3.2 for the spherical shells and 2.8-3.0 for the cylindrical shells, showing that the coefficients of these two kinds of space frames are of similar magnitude.
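The abstract does not define how the wind-induced vibration coefficient is extracted from the time-domain simulations; a common gust-response-style estimate from a response time history, shown here with an assumed peak factor and hypothetical names, is beta = (mean + g * sigma) / mean:

```python
import math

def vibration_coefficient(response, peak_factor=3.5):
    """Estimate the wind-induced vibration coefficient from a response
    time history as (mean + g * sigma) / mean, a simplified stand-in
    for the paper's time-domain random simulation procedure."""
    n = len(response)
    mean = sum(response) / n
    sigma = math.sqrt(sum((r - mean) ** 2 for r in response) / n)
    return (mean + peak_factor * sigma) / mean

# synthetic displacement history with mean 10 and std 2:
# beta = 1 + 3.5 * 2 / 10 = 1.7
history = [8.0, 12.0, 8.0, 12.0]
print(vibration_coefficient(history))
```

The coefficient scales the static (mean) response up to a design peak, which is why a single tabulated range such as 3.0-3.2 is directly usable in wind-resistance design.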
Acknowledgments: The authors would like to thank Prof. L. Bruno (Politecnico di Torino) for his continuous support in understanding and simulating the physics of the aerodynamic phenomena discussed in the paper. The authors also wish to thank Prof. F. Ricciardelli (University of Reggio Calabria) and Dr. C. Mannini (University of Florence) for kindly providing the geometrical properties of the Sunshine Skyway Bridge and the wind-tunnel set-up data. Further thanks go to Dr. S. Khris (Optiflow Company) and Prof. G. Monegato (Politecnico di Torino) for helpful discussions about the topics of the paper.
Abstract: An application of recent uncertainty quantification techniques to wind engineering is presented. In particular, the effects of small geometric changes in the Sunshine Skyway Bridge deck on its aerodynamic behavior are studied. This leads to the numerical solution of a PDE posed in a domain affected by randomness, which is handled through a mapping approach. A non-intrusive Polynomial Chaos expansion transforms the stochastic problem into a deterministic one, in which a commercial code is used as a black box to solve a number of Reynolds-Averaged Navier-Stokes simulations. The use of Gauss-Patterson nested quadrature formulas with respect to a Truncated Weibull probability density function limits the number of these computationally expensive simulations while maintaining sufficient accuracy. Polynomial Chaos approximations, statistical moments, and probability density functions of time-independent quantities of interest for the engineering applications are obtained.
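A minimal analogue of the non-intrusive projection step can be shown in one dimension, assuming a uniform input with Legendre polynomials and 3-point Gauss-Legendre quadrature in place of the paper's Truncated Weibull density and Gauss-Patterson nested rules:

```python
import math

def legendre(k, x):
    """Legendre polynomial P_k(x) via the three-term recurrence."""
    if k == 0:
        return 1.0
    if k == 1:
        return x
    return ((2 * k - 1) * x * legendre(k - 1, x)
            - (k - 1) * legendre(k - 2, x)) / k

# 3-point Gauss-Legendre rule on [-1, 1] (exact for degree <= 5)
nodes = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
weights = [5 / 9, 8 / 9, 5 / 9]

def pce_coeffs(f, order):
    """Non-intrusive projection: c_k = (2k+1)/2 * integral of f * P_k,
    evaluated by quadrature; f is treated as a black box, exactly as
    the RANS solver is in the paper."""
    return [(2 * k + 1) / 2 * sum(w * f(x) * legendre(k, x)
                                  for w, x in zip(weights, nodes))
            for k in range(order + 1)]

f = lambda x: x * x                  # toy "black-box" model
c = pce_coeffs(f, 2)                 # expect [1/3, 0, 2/3]
mean = c[0]                          # first moment directly from c_0
variance = sum(c[k] ** 2 / (2 * k + 1) for k in range(1, 3))
print(mean, variance)                # 1/3 and 4/45
```

Once the coefficients are in hand, moments of the quantity of interest follow analytically from orthogonality, with no further solver runs, which is the computational payoff the abstract describes.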