Abstract: Based on ultra-high performance liquid chromatography coupled with quadrupole-electrostatic field orbitrap mass spectrometry (UPLC-Q-Exactive Orbitrap-MS) and network pharmacology, this study investigated the pharmacodynamic substances and potential mechanisms of Huazhuo Sanjie Chubi Formula (化浊散结除痹方) in the treatment of gouty arthritis (GA). UPLC-Q-Exactive Orbitrap-MS was used to identify and qualitatively analyze the constituents of the formula, and 184 active components were identified. A total of 897 component targets were screened through the PharmMapper online database, and 491 GA-related disease targets were obtained from the OMIM, GeneCards, and CTD databases; Venn analysis yielded 60 intersection targets. A "component target-GA target" network was constructed on the Cytoscape platform, and a protein-protein interaction network was built with the STRING database, from which 16 core targets were screened. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were then performed on the core targets, and a "component-target-pathway" network was constructed. The main active components of the formula in treating GA were found to be phenols, flavonoids, alkaloids, and terpenoids; the key targets included SRC, MMP3, MMP9, REN, ALB, IGF1R, PPARG, MAPK1, HPRT1, and CASP1. GO analysis showed that the treatment of GA mainly involves biological processes such as response to lipid, response to bacterium, and response to biotic stimulus, while KEGG analysis indicated that the related pathways include lipid and atherosclerosis, neutrophil extracellular trap formation, and IL-17 signaling. In summary, this study suggests that phenols, flavonoids, alkaloids, and terpenoids may be the core pharmacodynamic substances of Huazhuo Sanjie Chubi Formula in the treatment of GA, and that its mechanism may be related to targets such as SRC, MMP3, and MMP9 and to pathways such as lipid and atherosclerosis, neutrophil extracellular trap formation, and IL-17 signaling.
Funding: Supported by the National Natural Science Foundation of China (No. 62063006), the Guangxi Science and Technology Major Program (No. 2022AA05002), the Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region (No. 2022GXZDSY003), and the Central Leading Local Science and Technology Development Fund Project of Wuzhou (No. 202201001).
Abstract: By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to enhance the precision of the target Q-value, the average of multiple Q-values previously learned by the target Q-network is used in place of the single Q-value from the current target Q-network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward obtained from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to their reward values to ensure frequent utilization of high-quality data. Finally, simulation experiments are conducted to verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared with the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps. The performance of the proposed algorithm is improved by 11.43% in the fixed scenario and by 8.33% in random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
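As an illustration of the averaged target Q-value and the reward-prioritized experience selection described in the abstract, the following is a minimal Python sketch, not the paper's implementation: the network architecture, the target-snapshot management scheme, the rank-based sampling weights, and the names QNet, averaged_ddqn_target, and reward_ranked_sample are all assumptions made for illustration.

# Illustrative sketch only (assumptions, not the paper's code):
#  (1) a DDQN target formed by averaging Q-values from the last K target-network
#      snapshots, and (2) a reward-ranked experience-selection helper.
import copy
import random
import torch
import torch.nn as nn


class QNet(nn.Module):
    """A small MLP Q-network (the architecture is an assumption for illustration)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def averaged_ddqn_target(online, target_snapshots, reward, next_state, done,
                         gamma: float = 0.99) -> torch.Tensor:
    """y = r + gamma * (1 - done) * mean_k Q_target_k(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        # Double DQN: the online network selects the greedy next action ...
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        # ... and its value is averaged over the last K target-network snapshots.
        q_next = torch.stack(
            [t(next_state).gather(1, best_action) for t in target_snapshots]
        ).mean(dim=0)
        return reward + gamma * (1.0 - done) * q_next


def reward_ranked_sample(transitions, batch_size):
    """Reward-prioritized selection: rank transitions by reward and draw
    high-reward samples more often (the rank weighting here is an assumption)."""
    ranked = sorted(transitions, key=lambda t: t[2], reverse=True)  # t = (s, a, r, s', done)
    weights = [1.0 / (i + 1) for i in range(len(ranked))]
    return random.choices(ranked, weights=weights, k=batch_size)


# Usage: keep a rolling list of K target-network snapshots, refreshed during training.
state_dim, n_actions, K = 4, 5, 3
online = QNet(state_dim, n_actions)
snapshots = [copy.deepcopy(online) for _ in range(K)]
y = averaged_ddqn_target(online, snapshots,
                         reward=torch.ones(8, 1),
                         next_state=torch.randn(8, state_dim),
                         done=torch.zeros(8, 1))
print(y.shape)  # torch.Size([8, 1])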