Funding: supported in part by the National Natural Science Foundation of China under Grants 61720106003, 61401165, 61379006, 61671144, and 61701538; in part by the Natural Science Foundation of Fujian Province under Grant 2015J01262; in part by the Promotion Program for Young and Middle-aged Teachers in Science and Technology Research of Huaqiao University under Grant ZQN-PY407; in part by the Science and Technology Innovation Teams of Henan Province for Colleges and Universities under Grant 17IRTSTHN014; in part by the Scientific and Technological Key Project of Henan Province under Grants 172102210080 and 182102210449; and in part by the Collaborative Innovation Center for Aviation Economy Development of Henan Province.
Abstract: In this paper, a novel opportunistic scheduling (OS) scheme with antenna selection (AS) is proposed for an energy harvesting (EH) cooperative communication system in which the relay can harvest energy from the source transmission. The considered scheme takes into account both traditional mathematical analysis and reinforcement learning (RL) scenarios under the power splitting (PS) factor constraint. For the traditional mathematical analysis of a fixed PS factor, we derive exact closed-form expressions for the ergodic capacity and outage probability in the general signal-to-noise ratio (SNR) regime. We then combine the optimal PS factor with the performance metrics to achieve the optimal transmission performance. Subsequently, based on the optimized PS factor, an RL technique, the Q-learning (QL) algorithm, is applied to derive the optimal antenna selection strategy. To highlight the performance advantage of the proposed QL scheme trained on the received SNR at the destination, we also examine a QL scheme trained on the channel between the relay and the destination. The results illustrate that the optimized scheme is always superior to the fixed-PS-factor scheme. In addition, a well-chosen system parameter setting with QL significantly outperforms the traditional mathematical analysis scheme.
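QL-based antenna selection can be pictured as a stateless (bandit-style) Q-update in which the reward is the received SNR at the destination. The sketch below is a minimal illustration under assumed parameters, not the paper's method: the four per-antenna mean SNR values and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ANTENNAS, STEPS = 4, 3000
ALPHA, EPS = 0.1, 0.1          # learning rate and epsilon-greedy exploration

# Hypothetical mean received SNR per relay antenna (unknown to the learner).
mean_snr = np.array([0.5, 1.0, 1.5, 2.0])

Q = np.zeros(N_ANTENNAS)       # one Q-value per antenna (stateless case)
for _ in range(STEPS):
    # epsilon-greedy antenna choice
    a = rng.integers(N_ANTENNAS) if rng.random() < EPS else int(np.argmax(Q))
    # reward: noisy received SNR observed at the destination
    reward = mean_snr[a] + 0.1 * rng.standard_normal()
    Q[a] += ALPHA * (reward - Q[a])

best = int(np.argmax(Q))       # antenna the learned policy would select
```

After training, `Q[a]` approximates each antenna's mean SNR, so the greedy choice converges to the antenna with the highest expected received SNR.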
Abstract: To address the curse of dimensionality, poor model generalization, and deadlock in environments with special obstacles, all caused by the growth of state information during local path planning for mobile robots, this paper proposes a Double BP Q-learning algorithm that fuses the Double Q-learning algorithm with BP neural networks. To solve the dimensionality problem, two BP neural networks with identical structures are used as fitted value functions in place of the two <i>Q</i>-value tables of the Double Q-learning algorithm, overcoming the inability of a <i>Q</i>-value table to store excessive state information. Adding a prioritized experience replay mechanism and using parameter transfer to initialize the model in different environments accelerates convergence and improves the learning efficiency and generalization ability of the model. A specific action selection strategy designed for the special environments allows the mobile robot to avoid deadlock states and reach the target point. Finally, the designed Double BP Q-learning algorithm was simulated and verified, and the probability of the mobile robot reaching the target point during parameter updates was compared with that of the Double Q-learning algorithm under the same planned path length. The results show that the model trained by the improved Double BP Q-learning algorithm finds the optimal or sub-optimal path with a higher success rate in dense discrete environments; moreover, it has stronger generalization ability, fewer redundant path segments, and reaches the target point without entering the deadlock zone in environments with special obstacles.
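The tabular Double Q-learning update that the Double BP Q-learning algorithm builds on (before the tables are replaced by BP networks) keeps two estimators, using one to select the greedy next action and the other to evaluate it. A minimal sketch on a toy one-dimensional corridor; the environment and all hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy corridor: states 0..4, actions 0=left, 1=right, goal at state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
ALPHA, GAMMA, EPS, EPISODES = 0.2, 0.95, 0.2, 500

QA = np.zeros((N_STATES, N_ACTIONS))
QB = np.zeros((N_STATES, N_ACTIONS))

def step(s, a):
    """One environment transition: small step cost, +1 at the goal."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == GOAL else -0.01
    return s2, r, s2 == GOAL

for _ in range(EPISODES):
    s, done = 0, False
    while not done:
        # act epsilon-greedily on the sum of both estimators
        q = QA[s] + QB[s]
        a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(q))
        s2, r, done = step(s, a)
        if rng.random() < 0.5:
            # update A: A selects the action, B evaluates it (decoupling
            # selection from evaluation reduces maximization bias)
            a_star = int(np.argmax(QA[s2]))
            QA[s, a] += ALPHA * (r + GAMMA * QB[s2, a_star] * (not done) - QA[s, a])
        else:
            a_star = int(np.argmax(QB[s2]))
            QB[s, a] += ALPHA * (r + GAMMA * QA[s2, a_star] * (not done) - QB[s, a])
        s = s2

policy = np.argmax(QA + QB, axis=1)   # greedy policy after training
```

In the full algorithm of the paper, each table lookup becomes a BP-network forward pass and each update becomes a back-propagation step toward the same target.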
Abstract: Reinforcement learning is an excellent approach used in artificial intelligence, automatic control, and other fields. However, an ordinary reinforcement learning algorithm, such as Q-learning with a lookup table, cannot cope with extremely complex and dynamic environments because of the huge state space. To reduce the state space, a modular neural network Q-learning algorithm is proposed, which combines the Q-learning algorithm with neural networks and a modular method. A feed-forward neural network, an Elman neural network, and a radial-basis-function neural network are separately employed to construct such algorithms. It is revealed that the Elman neural network Q-learning algorithm performs best when the same network training method, i.e., the gradient-descent error back-propagation algorithm, is applied.
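Replacing the lookup table with a network trained by gradient-descent error back-propagation amounts to regressing Q(s, a) onto the temporal-difference target. The sketch below uses a single hidden tanh layer (a plain feed-forward net, not the paper's Elman or RBF variants) on a toy corridor task; the architecture and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy corridor: states 0..4 (one-hot inputs), goal at 4; actions 0=left, 1=right.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
H, LR, GAMMA, EPS, EPISODES = 16, 0.05, 0.9, 0.2, 800

W1 = 0.1 * rng.standard_normal((N_STATES, H))
W2 = 0.1 * rng.standard_normal((H, N_ACTIONS))

def onehot(s):
    x = np.zeros(N_STATES); x[s] = 1.0; return x

def forward(x):
    h = np.tanh(x @ W1)
    return h, h @ W2          # hidden activations, Q-values for all actions

for _ in range(EPISODES):
    s, done = 0, False
    while not done:
        x = onehot(s)
        h, q = forward(x)
        a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(q))
        s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
        done = s2 == GOAL
        r = 1.0 if done else -0.01
        _, q2 = forward(onehot(s2))
        target = r + (0.0 if done else GAMMA * q2.max())
        # gradient-descent back-propagation on the TD error of action a
        err = q[a] - target
        gW2 = np.outer(h, np.eye(N_ACTIONS)[a]) * err
        gh = W2[:, a] * err
        gW1 = np.outer(x, gh * (1.0 - h * h))   # tanh derivative
        W2 -= LR * gW2
        W1 -= LR * gW1
        s = s2
```

The Elman variant in the paper would add a context layer feeding the previous hidden state back into the input, which helps when the environment is dynamic.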
Abstract: The major environmental hazard in this pandemic is the unhygienic disposal of medical waste. If medical waste is not managed properly, it becomes a hazard to the environment and to humans. Managing medical waste is a major issue for cities and municipalities in terms of both the environment and logistics. An efficient supply chain with edge computing technology is used to manage medical waste. The supply chain operations include waste collection, transportation, and disposal. Much research has been devoted to improving waste management. The main issues with existing techniques are that they are ineffective and expensive, and their centralized edge computing fails to provide security, trustworthiness, and transparency. To overcome these issues, this paper implements an efficient Naive Bayes classifier algorithm and a Q-learning algorithm in decentralized edge computing with a binary bat optimization algorithm (NBQ-BBOA). The proposed scheme is used to track, detect, and manage medical waste. The Q-learning algorithm is used to minimize the cost of transferring medical waste between nodes. The accuracy obtained is 88% for the Naive Bayes algorithm, 82% for the Q-learning algorithm, and 98% for NBQ-BBOA. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) of the proposed NBQ-BBOA are 0.012 and 0.045, respectively.
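Minimizing the transfer cost between nodes with Q-learning can be sketched by treating each node as a state, each link as an action, and the negative link cost as the reward; the greedy policy then traces the cheapest route to the disposal site. The five-node cost graph below is entirely hypothetical, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical transfer-cost graph between 5 waste nodes; np.inf marks
# missing links. Node 4 is the disposal site (absorbing state).
INF = np.inf
cost = np.array([
    [INF, 2.0, 5.0, INF, INF],
    [2.0, INF, 1.0, 4.0, INF],
    [5.0, 1.0, INF, 1.0, 6.0],
    [INF, 4.0, 1.0, INF, 1.0],
    [INF, INF, INF, INF, INF],
])
N, DEST = 5, 4
ALPHA, GAMMA, EPS, EPISODES = 0.2, 0.95, 0.2, 2000

# Infeasible actions stay at -inf so argmax never selects them.
Q = np.full((N, N), -INF)
for s in range(N):
    Q[s, np.isfinite(cost[s])] = 0.0

for _ in range(EPISODES):
    s = 0
    while s != DEST:
        nbrs = np.flatnonzero(np.isfinite(cost[s]))
        a = rng.choice(nbrs) if rng.random() < EPS else int(np.argmax(Q[s]))
        # reward is the negative transfer cost, so maximizing return
        # minimizes the total cost of reaching the disposal site
        r = -cost[s, a]
        nxt = 0.0 if a == DEST else Q[a][np.isfinite(cost[a])].max()
        Q[s, a] += ALPHA * (r + GAMMA * nxt - Q[s, a])
        s = int(a)

# greedy route from node 0 to the disposal site
route, s = [0], 0
while s != DEST and len(route) < N + 1:
    s = int(np.argmax(Q[s]))
    route.append(s)
```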
Abstract: With the development of economic globalization, distributed manufacturing is becoming more and more prevalent. Recently, the integrated scheduling of distributed production and assembly has attracted much attention. This research studies a distributed flexible job shop scheduling problem with assembly operations. Firstly, a mixed integer programming model is formulated to minimize the maximum completion time. Secondly, a Q-learning-assisted coevolutionary algorithm is presented to solve the model: (1) multiple populations are evolved to seek the required decisions simultaneously; (2) an encoding and decoding method based on problem features is applied to represent individuals; (3) a hybrid of heuristic rules and random methods is employed to generate a high-quality initial population; (4) three evolutionary strategies with crossover and mutation methods are adopted to enhance exploration; (5) three neighborhood structures based on problem features are constructed, and a Q-learning-based iterative local search method is devised to improve exploitation, with the Q-learning approach intelligently selecting the better neighborhood structure. Finally, a group of instances is constructed for comparison experiments. The effectiveness of the Q-learning approach is verified by comparing the developed algorithm with a variant without it, and three renowned meta-heuristic algorithms are also compared against the developed algorithm. The comparison results demonstrate that the designed method performs better on the formulated problem.
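Q-learning-based selection among neighborhood structures can be sketched as one Q-value per move operator, rewarded by the objective improvement the operator achieves during local search. The toy scheduling-like objective, the three move operators, and all parameters below are illustrative assumptions, not the paper's instances:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy completion-time proxy: jobs placed earlier carry larger weights,
# so the objective is minimized by sequencing short jobs first.
proc = np.array([5.0, 3.0, 8.0, 2.0, 7.0, 4.0])

def makespan(seq):
    return float(np.dot(proc[seq], np.arange(len(seq), 0, -1)))

def swap_move(seq):
    i, j = rng.choice(len(seq), 2, replace=False)
    s = seq.copy(); s[i], s[j] = s[j], s[i]; return s

def insert_move(seq):
    i, j = rng.choice(len(seq), 2, replace=False)
    s = np.delete(seq, i); return np.insert(s, j, seq[i])

def reverse_move(seq):
    i, j = sorted(rng.choice(len(seq), 2, replace=False))
    s = seq.copy(); s[i:j + 1] = s[i:j + 1][::-1]; return s

MOVES = [swap_move, insert_move, reverse_move]   # three neighborhood structures
ALPHA, EPS, ITERS = 0.1, 0.2, 3000
Q = np.zeros(len(MOVES))     # single-state Q-value per neighborhood structure

seq = np.arange(len(proc))
best, best_val = seq.copy(), makespan(seq)
for _ in range(ITERS):
    # Q-learning picks the neighborhood structure to apply next
    a = rng.integers(len(MOVES)) if rng.random() < EPS else int(np.argmax(Q))
    cand = MOVES[a](seq)
    reward = makespan(seq) - makespan(cand)   # positive if the move improved
    Q[a] += ALPHA * (reward - Q[a])
    if reward > 0:                            # accept improving moves only
        seq = cand
    if makespan(seq) < best_val:
        best, best_val = seq.copy(), makespan(seq)
```

Operators that tend to yield larger improvements accumulate higher Q-values and are chosen more often, which is the "intelligent selection of better neighborhood structures" described above.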
Abstract: To address the low accuracy of anomaly localization and anomalous-data classification in traditional network anomaly diagnosis algorithms under unsupervised conditions, a wireless network anomaly diagnosis method based on an improved Q-learning algorithm is designed. First, wireless network data streams are collected in ADU (Asynchronous Data Unit) units and packet features are extracted. Then, a Q-learning model is built to explore the balance point between state values and reward values, and an SA (Simulated Annealing) algorithm is used to identify the next state accurately from a global perspective. Finally, the joint distribution probability of the training samples is determined to improve the approximation of the output values and strike a balance between exploration and cost. Test results show that the improved Q-learning algorithm achieves a mean network anomaly localization accuracy of 99.4% and also outperforms three traditional network anomaly diagnosis methods in classification accuracy and efficiency across different types of network anomalies.
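The simulated-annealing flavor of action selection in Q-learning can be sketched with a Boltzmann (softmax) rule whose temperature is annealed: a high temperature explores nearly uniformly, while a low temperature acts nearly greedily. The Q-values and temperatures below are arbitrary illustrations, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(5)

def boltzmann_action(q_row, temperature):
    """Sample an action with probability proportional to exp(Q / T)."""
    z = q_row / max(temperature, 1e-8)
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(q_row), p=p))

# Annealing schedule effect: hot -> near-uniform exploration,
# cold -> near-greedy exploitation of the best Q-value (index 1 here).
q = np.array([1.0, 2.0, 0.5])
hot = [boltzmann_action(q, 100.0) for _ in range(2000)]
cold = [boltzmann_action(q, 0.01) for _ in range(2000)]
```

Lowering the temperature over training moves the agent smoothly from the `hot` regime to the `cold` one, which is the exploration-cost balance described above.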
Abstract: Objective: To cope with the complex environment of biosafety laboratories, which feature confined spaces, obstacles of many shapes (spheres, cubes, cylinders, ellipsoids, etc.), and extremely demanding precision requirements, a manipulator trajectory planning and obstacle avoidance algorithm, QPSO, is proposed that combines an improved Q-learning algorithm with particle swarm optimization (PSO). Methods: The QPSO algorithm adopts a two-layer optimization architecture. The upper layer uses an improved Q-learning algorithm for path decisions, balancing exploration against exploitation through a Boltzmann exploration strategy with a nonlinear dynamic temperature; the lower layer uses a PSO algorithm with dynamic weights and learning factors to optimize the trajectory, combined with a collision detection strategy based on the law of cosines to guarantee obstacle avoidance safety. To verify the feasibility of the proposed algorithm, algorithm performance analyses and obstacle avoidance tests were conducted, with comparisons against the standard PSO algorithm, a genetic algorithm, the firefly algorithm, and the improved rapidly-exploring random tree (RRT*) algorithm. Results: Compared with these four algorithms, the proposed QPSO algorithm shows significant advantages in convergence performance, trajectory length, and obstacle avoidance success rate, and achieves the maximum safe distance while ensuring the shortest path. Conclusion: The proposed QPSO algorithm effectively improves manipulator trajectory planning and obstacle avoidance in complex environments and can provide reliable technical support for automated experimental operations in biosafety laboratories and similar environments.
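The lower-layer PSO with a dynamic (here, linearly decreasing) inertia weight can be sketched as follows; the sphere objective stands in for the trajectory cost, and all swarm parameters are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(6)

def sphere(x):
    """Toy objective standing in for the trajectory cost to be minimized."""
    return float(np.sum(x * x))

DIM, SWARM, ITERS = 4, 20, 200
W_MAX, W_MIN = 0.9, 0.4      # dynamic inertia weight: explore early, exploit late
C1, C2 = 2.0, 2.0            # cognitive and social learning factors

pos = rng.uniform(-5, 5, (SWARM, DIM))
vel = np.zeros((SWARM, DIM))
pbest = pos.copy()
pbest_val = np.array([sphere(p) for p in pos])
g = int(np.argmin(pbest_val))
gbest, gbest_val = pbest[g].copy(), pbest_val[g]

for t in range(ITERS):
    # linearly decreasing inertia weight over the run
    w = W_MAX - (W_MAX - W_MIN) * t / (ITERS - 1)
    r1, r2 = rng.random((SWARM, DIM)), rng.random((SWARM, DIM))
    vel = w * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
    pos = pos + vel
    for i in range(SWARM):
        v = sphere(pos[i])
        if v < pbest_val[i]:
            pbest[i], pbest_val[i] = pos[i].copy(), v
            if v < gbest_val:
                gbest, gbest_val = pos[i].copy(), v
```

In the full QPSO pipeline, each candidate trajectory would additionally be screened by the cosine-law collision check before its cost is accepted.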