Abstract: In this paper, we consider the second-order retarded differential equations
$$x''(t) + c\,x'(t) = q\,x(t-\sigma) - l\,x(t-\delta), \tag{1}$$
$$x''(t) + p(t)\,x(t-\tau) = 0. \tag{2}$$
We give some sufficient conditions for the oscillation of all solutions of Eq. (1) in the case where $q$, $l$, $\sigma$, $\delta$ are positive numbers and $c$ is a real number. We also study the asymptotic behavior of the nonoscillatory solutions, and where needed we give examples to illustrate our results. Finally, we study Eq. (2) under some conditions on $p(t)$.
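Oscillation questions for Eq. (1) are often checked against its characteristic equation. Substituting $x(t) = e^{\lambda t}$ gives $F(\lambda) = \lambda^2 + c\lambda - q e^{-\lambda\sigma} + l e^{-\lambda\delta} = 0$. For linear equations with constant coefficients and constant delays, a classical criterion says that every solution oscillates exactly when this equation has no real root. The Python sketch below is a numerical heuristic built on that criterion. It is not the paper's sufficient conditions. The function names and the default scan window $[-20, 20]$ are assumptions.

```python
import math


def char_fn(lam: float, c: float, q: float, l: float, sigma: float, delta: float) -> float:
    """F(lam) = lam^2 + c*lam - q*exp(-lam*sigma) + l*exp(-lam*delta).

    x(t) = exp(lam*t) solves x'' + c x' = q x(t - sigma) - l x(t - delta)
    exactly when F(lam) = 0."""
    return lam * lam + c * lam - q * math.exp(-lam * sigma) + l * math.exp(-lam * delta)


def has_real_root(c, q, l, sigma, delta, lo=-20.0, hi=20.0, n=40001):
    """Heuristic: scan [lo, hi] for an exact zero or a sign change of F.

    It can miss a tangential (double) root or a root outside the window.
    math.exp raises OverflowError if |lam| * max(sigma, delta) exceeds ~709."""
    step = (hi - lo) / (n - 1)
    prev = char_fn(lo, c, q, l, sigma, delta)
    if prev == 0.0:
        return True
    for i in range(1, n):
        cur = char_fn(lo + i * step, c, q, l, sigma, delta)
        if cur == 0.0 or (cur > 0.0) != (prev > 0.0):
            return True
        prev = cur
    return False
```

When $\sigma > \delta$, the term $-q e^{-\lambda\sigma}$ dominates as $\lambda \to -\infty$, while $\lambda^2$ dominates as $\lambda \to +\infty$. $F$ therefore changes sign, so a nonoscillatory exponential solution always exists. This matches `has_real_root(0.0, 1.0, 1.0, 2.0, 1.0)` returning `True`.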
Abstract: After reading the article "The Boundedness and Asymptotic Behavior of Solution of Differential System of Second-Order with Variable Coefficient" in Applied Mathematics and Mechanics, Vol. 3, No. 4, 1982, we would like to raise a few points for discussion with the author and the readers. Our opinions are as follows:
Funding: This research is supported by the Shandong Provincial Natural Science Foundation of China (ZR2017MA043).
Abstract: The purpose of this paper is to study the oscillation of second-order half-linear neutral differential equations with advanced argument of the form
$$\left(r(t)\left(\left(y(t)+p(t)\,y(\tau(t))\right)'\right)^{\alpha}\right)' + q(t)\,y^{\alpha}(\sigma(t)) = 0, \quad t \ge t_0,$$
when $\int^{\infty} r^{-1/\alpha}(s)\,ds < \infty$. We obtain sufficient conditions for the oscillation of the studied equations by means of the inequality principle and the Riccati transformation. An example is provided to illustrate the results.
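The Riccati transformation mentioned above can be illustrated by the standard computation. This sketch makes three assumptions: $\alpha$ is a quotient of odd positive integers, $y$ is an eventually positive solution, and $z'(t) > 0$ for large $t$. It is not the paper's exact transformation. In the noncanonical case $\int^{\infty} r^{-1/\alpha}(s)\,ds < \infty$, the case $z'(t) < 0$ must also be handled, and that step is not shown here.

$$z(t) = y(t) + p(t)\,y(\tau(t)), \qquad w(t) = \frac{r(t)\,(z'(t))^{\alpha}}{z^{\alpha}(t)}.$$

Differentiate $w$ and use the equation in the form $\left(r(t)(z'(t))^{\alpha}\right)' = -q(t)\,y^{\alpha}(\sigma(t))$:

$$w'(t) = \frac{\left(r(t)(z'(t))^{\alpha}\right)'}{z^{\alpha}(t)} - \alpha\,\frac{r(t)\,(z'(t))^{\alpha+1}}{z^{\alpha+1}(t)} = -\,q(t)\,\frac{y^{\alpha}(\sigma(t))}{z^{\alpha}(t)} - \alpha\, r^{-1/\alpha}(t)\, w^{(\alpha+1)/\alpha}(t).$$

The last step uses $\left(z'(t)/z(t)\right)^{\alpha+1} = \left(w(t)/r(t)\right)^{(\alpha+1)/\alpha}$. One then typically bounds $y(\sigma(t))$ from below in terms of $z$ and integrates the resulting Riccati inequality.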
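Abstract: Obstacle-avoiding grasping by a robotic arm in stacked and occluded environments is an important and challenging task. For this task, this paper proposes Ec-DSAC (encoder and crop for discrete SAC), an obstacle-avoiding grasping method based on an image encoder and deep reinforcement learning (DRL). First, an image encoder combining YOLO (you only look once) v5 with a contrastive-learning network is designed. It encodes both key features and global features, reducing pixel information to vector representations. Second, the image encoder is combined with the discrete soft actor-critic (SAC) algorithm. A discrete action space and a dense reward function are designed to constrain and guide the learning direction of the policy output, and random image cropping is used to improve the sample efficiency of reinforcement learning. Finally, a two-stage behavior cloning method for DRL pretraining is proposed, which enhances the learning ability of the reinforcement learning network and raises the success rate of the control policy. In simulation experiments, the obstacle-avoiding grasping success rate of Ec-DSAC stays consistently above 80.0%, showing better obstacle-avoiding grasping performance than existing methods. In real-world experiments, the success rate is 73.3%, confirming its effectiveness for obstacle-avoiding grasping in real stacked and occluded environments.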
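The random image cropping used to improve sample efficiency can be sketched as a per-sample crop over a batch of image observations, in the spirit of crop-based augmentation for pixel-based RL. The sketch makes several assumptions. It uses NumPy arrays in (batch, channel, height, width) layout, draws an independent offset for each sample, and names the function `random_crop`. The crop sizes in the usage note are hypothetical, and the YOLOv5/contrastive encoder is not modeled.

```python
import numpy as np


def random_crop(obs: np.ndarray, out_h: int, out_w: int,
                rng: np.random.Generator) -> np.ndarray:
    """Crop each image in a (B, C, H, W) batch at an independent random offset."""
    b, c, h, w = obs.shape
    if out_h > h or out_w > w:
        raise ValueError("crop size larger than input")
    # Generator.integers excludes the upper bound, so offsets lie in [0, h - out_h]
    tops = rng.integers(0, h - out_h + 1, size=b)
    lefts = rng.integers(0, w - out_w + 1, size=b)
    out = np.empty((b, c, out_h, out_w), dtype=obs.dtype)
    for i in range(b):
        out[i] = obs[i, :, tops[i]:tops[i] + out_h, lefts[i]:lefts[i] + out_w]
    return out
```

For example, `random_crop(batch, 76, 76, np.random.default_rng(0))` turns a hypothetical (B, 3, 84, 84) batch into (B, 3, 76, 76) windows. The encoder then sees slightly shifted views of the same scene across updates.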
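Abstract: The DQN dynamic path planning algorithm for a single autonomous underwater vehicle (AUV) in partially unknown environments suffers from high randomness and slow convergence. To address this, a path planning method that fuses behavior cloning, the A* algorithm, and DQN (behavior cloning with A* algorithm and DQN, BA_DQN) is proposed. First, based on the known environment information, an improved A* algorithm that accounts for ocean current resistance is proposed to guide DQN and reduce its randomness. Second, because the ocean environment is complex, the positive experience pool is expanded and the sampling probability is further improved to raise the training success rate. Third, to address the slow convergence of DQN, an improved algorithm that applies reinforcement learning first and behavior cloning afterwards is proposed. BA_DQN is used to control AUV path finding, and simulation experiments are conducted in different task scenarios. Simulation results show that BA_DQN needs less training time than DQN, and that it makes decisions faster and achieves shorter voyage times than A*.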
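The current-aware A* guide can be sketched on a 4-connected grid. The abstract does not give the resistance model, so the step cost used here is a hypothetical stand-in: one unit plus `k` times the component of the local flow that opposes the move. The flow field layout, `k`, and the function name are assumptions, and the DQN guidance and behavior cloning stages are not shown. The Manhattan heuristic stays admissible and consistent because every step costs at least 1.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]
Flow = Tuple[float, float]


def current_aware_astar(grid: Sequence[Sequence[int]],
                        current: Sequence[Sequence[Flow]],
                        start: Cell, goal: Cell,
                        k: float = 1.0) -> Tuple[Optional[List[Cell]], float]:
    """grid[r][c] == 1 marks an obstacle; current[r][c] = (dr, dc) is a
    hypothetical local flow vector. Returns (path, cost) or (None, inf)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> float:
        # Manhattan distance: admissible because every step costs >= 1
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    heap = [(h(start), 0.0, start)]
    while heap:
        _, cost, cell = heapq.heappop(heap)
        if cell == goal:
            path, node = [], cell
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], cost
        if cost > g[cell]:
            continue  # stale heap entry
        r, c = cell
        fr, fc = current[r][c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            # penalize only the part of the move that opposes the flow
            against = max(0.0, -(fr * dr + fc * dc))
            new_cost = cost + 1.0 + k * against
            nxt = (nr, nc)
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(heap, (new_cost + h(nxt), new_cost, nxt))
    return None, float("inf")
```

In BA_DQN the planner's output is used to guide DQN exploration. Here it simply returns the cheapest path under the assumed cost.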
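Abstract: In offline-to-online reinforcement learning, the agent can learn an initial policy from pre-collected offline data. During online fine-tuning, however, the early stage is often unstable, and the performance gain after fine-tuning is small. To address this problem, two key designs are proposed: 1) a dynamic offline-online replay buffer based on simulated annealing; 2) simulated-annealing decay of the behavior constraint. The first design applies the idea of simulated annealing during training to dynamically choose between offline data and online interaction experience. This yields an optimized update strategy that dynamically balances the stability of online training against fine-tuning performance. The second design uses a behavior cloning constraint with a cooling mechanism to mitigate the sudden performance drop caused by updating with online experience early in fine-tuning, and gradually relaxes the constraint later in fine-tuning to promote further improvement. The proposed offline-to-online reinforcement learning algorithm combines a dynamic replay buffer with time-decaying constraints (dynamic replay buffer and time decaying constraints, DRB-TDC). Experimental results show that after online fine-tuning, DRB-TDC improves performance by 45%, 65%, and 21% on the three classic MuJoCo tasks HalfCheetah, Hopper, and Walker2d, respectively, and that its average normalized score across all tasks is 10% higher than that of the best baseline algorithm.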
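The two annealing designs can be sketched as one cooling schedule shared by buffer selection and behavior-cloning decay. Only the idea of a temperature that falls during fine-tuning comes from the text; the abstract does not give the schedule. The following are all hypothetical choices, as are the class and function names:

- exponential cooling $T_k = T_0\,\gamma^k$ with a floor;
- drawing an offline batch with probability $T_k/T_0$;
- a behavior-cloning weight of $\beta_k = \beta_0\,T_k/T_0$.

```python
import random
from dataclasses import dataclass


@dataclass
class AnnealSchedule:
    """Hypothetical cooling schedule shared by buffer selection and BC decay."""
    t0: float = 1.0          # initial temperature
    gamma: float = 0.999     # multiplicative cooling per update
    t_min: float = 1e-3      # floor, so neither offline data nor BC vanishes entirely
    bc_weight0: float = 2.5  # initial behavior-cloning weight

    def temperature(self, step: int) -> float:
        return max(self.t_min, self.t0 * self.gamma ** step)

    def offline_prob(self, step: int) -> float:
        """Probability of drawing the next batch from the offline buffer."""
        return self.temperature(step) / self.t0

    def bc_weight(self, step: int) -> float:
        """Behavior-cloning weight: strong early, relaxed as the system cools."""
        return self.bc_weight0 * self.temperature(step) / self.t0


def choose_buffer(schedule: AnnealSchedule, step: int, rng: random.Random) -> str:
    """Pick the buffer for the next update according to the current temperature."""
    return "offline" if rng.random() < schedule.offline_prob(step) else "online"


def actor_loss(rl_loss: float, bc_loss: float,
               schedule: AnnealSchedule, step: int) -> float:
    """RL objective plus a behavior-cloning penalty whose weight decays over time."""
    return rl_loss + schedule.bc_weight(step) * bc_loss
```

Early updates therefore lean on offline data and a strong behavior constraint. As the temperature falls, online experience dominates and the constraint loosens, which mirrors the stability-then-improvement trade-off described above.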
Funding: Project supported by the National Natural Science Foundation of China (No. 92367109).
Abstract: Reinforcement learning behavioral control (RLBC) is limited to individual agents without swarm missions, because it models behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed that overcomes this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by designing a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC achieves a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.
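The control-layer setting, a second-order agent whose actuator saturates, can be sketched with a double integrator $\dot p = v$, $\dot v = \mathrm{sat}(u)$. The sketch leaves out several parts of the paper's design:

- the tracking law is a plain saturated PD controller standing in for the second-order reinforcement learning controllers;
- the adaptive compensators and the decision-layer MARLMS are not reproduced.

The gains, saturation limit, reference trajectory, and function names are assumptions.

```python
import numpy as np


def sat(u: np.ndarray, u_max: float) -> np.ndarray:
    """Elementwise input saturation to [-u_max, u_max]."""
    return np.clip(u, -u_max, u_max)


def track(p_ref0, v_ref, kp=1.0, kd=2.0, u_max=1.0, dt=0.01, steps=3000):
    """Euler simulation of p' = v, v' = sat(u) tracking the reference
    p_ref(t) = p_ref0 + v_ref * t (and velocity v_ref) with a PD law.
    Returns final position, final velocity, final reference, peak |input|."""
    p_ref0 = np.asarray(p_ref0, dtype=float)
    v_ref = np.asarray(v_ref, dtype=float)
    p = np.zeros_like(p_ref0)
    v = np.zeros_like(p_ref0)
    peak = 0.0
    for k in range(steps):
        p_ref = p_ref0 + v_ref * (k * dt)
        u = -kp * (p - p_ref) - kd * (v - v_ref)  # unconstrained command
        a = sat(u, u_max)                         # what the actuator delivers
        peak = max(peak, float(np.max(np.abs(a))))
        p, v = p + dt * v, v + dt * a
    return p, v, p_ref0 + v_ref * (steps * dt), peak
```

The reference moves at constant velocity, so the position and velocity errors are driven to zero together. Meanwhile the applied input never exceeds `u_max`, even though the raw command briefly does at the start.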
Funding: Supported by the National Natural Science Foundation of China (11101244, 11271231), the National Tackling Key Problems Program (20050200069), and the Doctorate Foundation of the Ministry of Education of China (20030422047).
Abstract: The transient behavior of a three-dimensional semiconductor device with heat conduction is described by a coupled system of four quasi-linear partial differential equations with initial-boundary value conditions. The electric potential is governed by an elliptic equation and enters the other three equations through the electric field intensity. The electron and hole concentrations are determined by convection-dominated diffusion equations, and the temperature is governed by a heat conduction equation. A mixed finite volume element approximation, which preserves the physical conservation law, is used to compute the electric potential, improving the accuracy by one order. The two concentrations and the temperature are computed by a fractional step method combined with second-order upwind differences. This method overcomes numerical oscillation and dispersion and reduces computational complexity. The three-dimensional problem is then solved as three successive one-dimensional problems with a speedup method, which greatly reduces the computational work. An optimal second-order error estimate in the $L^2$ norm is derived using a priori estimate theory and other special techniques for partial differential equations. This type of mass-conservative parallel method is of significant value in the numerical analysis and application of semiconductor devices.
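The fractional step idea can be sketched as follows: a 3D convection-diffusion problem for one concentration is solved as three successive 1D problems, with second-order upwinding for convection. This is an explicit illustration, not the paper's scheme. Constant velocity components, a constant diffusion coefficient `D`, periodic boundaries, and a three-stage SSP Runge-Kutta time step are all assumptions made here. The mixed finite volume element solver for the potential and the coupling between equations are omitted. Writing convection in flux form keeps the discrete total mass unchanged under periodic boundaries.

```python
import numpy as np


def upwind2_rhs(u: np.ndarray, a: float, D: float, dx: float, axis: int) -> np.ndarray:
    """Semi-discrete RHS of u_t + a u_x = D u_xx along one axis (periodic BC).
    Convective face values use second-order upwinding; diffusion is central."""
    if a >= 0:
        # u_{i+1/2} from the left (upwind) side: 1.5 u_i - 0.5 u_{i-1}
        face = 1.5 * u - 0.5 * np.roll(u, 1, axis=axis)
    else:
        # u_{i+1/2} from the right (upwind) side: 1.5 u_{i+1} - 0.5 u_{i+2}
        face = 1.5 * np.roll(u, -1, axis=axis) - 0.5 * np.roll(u, -2, axis=axis)
    flux = a * face                                     # F_{i+1/2}
    conv = -(flux - np.roll(flux, 1, axis=axis)) / dx   # -(F_{i+1/2} - F_{i-1/2}) / dx
    diff = D * (np.roll(u, -1, axis=axis) - 2.0 * u + np.roll(u, 1, axis=axis)) / dx ** 2
    return conv + diff


def sweep_1d(u: np.ndarray, dt: float, a: float, D: float, dx: float, axis: int) -> np.ndarray:
    """One 1D sub-step advanced with the three-stage SSP Runge-Kutta method."""
    def rhs(v: np.ndarray) -> np.ndarray:
        return upwind2_rhs(v, a, D, dx, axis)
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(u2))


def fractional_step_3d(u: np.ndarray, dt: float, velocity, D: float, dx: float) -> np.ndarray:
    """Sequential splitting: sweep along x, then y, then z, each a 1D problem."""
    for axis in range(3):
        u = sweep_1d(u, dt, velocity[axis], D, dx, axis)
    return u
```

Each sweep touches only one axis, so the three sub-problems are independent along grid lines. That independence is what makes this kind of splitting attractive for parallel computation.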
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61100205, the Science and Technology Project of Beijing Municipal Education Commission under Grant No. KM201110016006, and the Doctor Start-up Foundation of BUCEA under Grant No. 101002508.
Abstract: To reduce the maintenance cost of structured peer-to-peer (P2P) networks, a Clone Node Protocol (CNP) based on user behavior is proposed. CNP exploits the regularity of user behavior and uses node cloning. A Bidirectional Clone Node Chord model (BCNChord) based on CNP is designed and implemented. In BCNChord, an Anticlockwise Searching Algorithm, a Difference Push Synchronize Algorithm, and an Optimal Maintenance Algorithm are proposed to improve performance. In experiments, depending on node frequency, the maintenance cost of BCNChord is 3.5% to 32.5% lower than that of Chord. In a network of $2^{12}$ nodes, the logical path length is steady at 6 hops, much lower than 12 for Chord and 10 for CNChord. Theoretical analysis and experimental results show that BCNChord effectively reduces structural maintenance cost while improving query efficiency up to $(1/4)O(\log N)$. BCNChord is better suited to highly dynamic environments and systems with higher real-time requirements.
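The bidirectional routing idea behind the Anticlockwise Searching Algorithm can be sketched with a Chord-style ring that keeps fingers in both directions. This is a hypothetical simplification, not the paper's algorithm. The class `BiRing`, the anticlockwise fingers `predecessor(n - 2^i)`, and the global view of the node list are assumptions for illustration. Clone nodes and the synchronization and maintenance algorithms are not modeled. Each lookup moves toward the key in whichever direction is shorter, never passes the key, and stops at the node that owns it under the usual Chord rule (key in (predecessor, node]).

```python
import bisect
from typing import List, Tuple


class BiRing:
    """Hypothetical Chord-style ring with clockwise and anticlockwise fingers."""

    def __init__(self, node_ids: List[int], m: int) -> None:
        self.m = m                       # identifier bits
        self.M = 2 ** m                  # identifier space size
        self.nodes = sorted({i % self.M for i in node_ids})

    def successor(self, x: int) -> int:
        """First node at or after identifier x, moving clockwise."""
        i = bisect.bisect_left(self.nodes, x % self.M)
        return self.nodes[i % len(self.nodes)]

    def predecessor(self, x: int) -> int:
        """Last node at or before identifier x; index -1 wraps to the largest id."""
        i = bisect.bisect_right(self.nodes, x % self.M) - 1
        return self.nodes[i]

    def owns(self, node: int, ident: int) -> bool:
        """Chord rule: node owns ident if ident lies in (pred(node), node]."""
        pred = self.predecessor(node - 1)
        if pred == node:                 # single-node ring
            return True
        return 0 < (ident - pred) % self.M <= (node - pred) % self.M

    def lookup(self, start: int, ident: int) -> Tuple[int, int]:
        """Greedy bidirectional routing from node `start`; returns (owner, hops)."""
        ident %= self.M
        cur, hops = start, 0
        while not self.owns(cur, ident):
            d_cw = (ident - cur) % self.M    # clockwise distance to the key
            d_ccw = (cur - ident) % self.M   # anticlockwise distance to the key
            if d_cw <= d_ccw:
                fingers = [self.successor(cur + 2 ** i) for i in range(self.m)]
                # farthest clockwise finger that does not pass the key
                cands = [f for f in fingers if 0 < (f - cur) % self.M <= d_cw]
                nxt = (max(cands, key=lambda f: (f - cur) % self.M)
                       if cands else self.successor(cur + 1))
            else:
                fingers = [self.predecessor(cur - 2 ** i) for i in range(self.m)]
                # farthest anticlockwise finger that does not pass the key
                cands = [f for f in fingers if 0 < (cur - f) % self.M <= d_ccw]
                nxt = max(cands, key=lambda f: (cur - f) % self.M)
            cur, hops = nxt, hops + 1
        return cur, hops
```

Because the shorter direction is always chosen, a lookup never covers more than half of the identifier space. For example, on a 6-bit ring with nodes 1, 8, 14, 21, 32, 38, 42, 48, 51, 56, `lookup(1, 54)` goes anticlockwise and reaches owner 56 in one hop.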