Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).
Abstract: This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: an attacker, a target, and a defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to model AD game problems more accurately and to increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, direct attack (DA) and bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, direct interdiction (DI) and collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of the four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. The results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
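The reward structure described above, combining dense process rewards (distance and angle shaping) with a sparse terminal outcome reward, can be sketched as follows. The weights, penalty values, and function signature are illustrative assumptions, not values from the paper.

```python
# Hypothetical shaping weights and terminal payoffs -- assumed, not the paper's.
W_DIST, W_ANGLE = 0.5, 0.2
R_WIN, R_LOSE = 100.0, -100.0

def defender_reward(dist_to_attacker, angle_off_line, game_over, defender_won):
    """Process reward (distance + angle shaping) plus a terminal outcome reward."""
    # Dense process reward: being closer to the attacker and better aligned
    # with the target-attacker line both reduce the penalty.
    reward = -W_DIST * dist_to_attacker - W_ANGLE * angle_off_line
    # Sparse outcome reward, added only once the episode ends.
    if game_over:
        reward += R_WIN if defender_won else R_LOSE
    return reward
```

The process term gives the agent a gradient at every step, while the outcome term anchors the policy to the actual victory condition.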
Funding: National Natural Science Foundation of China (Nos. 61803260, 61673262 and 61175028).
Abstract: Multi-agent reinforcement learning has recently been applied to pursuit problems. However, it suffers from a large number of time steps per training episode and thus often struggles to converge, resulting in low rewards and an inability of agents to learn useful strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training of agents over long episodes. We also eliminate the non-monotonic behavior in the reward function introduced by the trigonometric terms of the traditional 2D polar-coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario by improving agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
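The two ideas in this abstract — a reward switched between segments of the episode, and an observation encoding without trigonometric wrap-around — can be sketched as below. The segmentation threshold and the exact reward forms are assumptions for illustration, not the paper's design.

```python
SWITCH_DIST = 5.0  # assumed distance threshold separating the two reward segments

def pursuit_reward(dist, prev_dist):
    """Ensemble of two reward signals, selected by distance segment (illustrative)."""
    if dist > SWITCH_DIST:
        # Far segment: reward progress (reduction in distance), a dense
        # signal that keeps learning going over long episodes.
        return prev_dist - dist
    # Near segment: reward absolute proximity to sharpen terminal capture behavior.
    return SWITCH_DIST - dist

def observation(pursuer, evader):
    """Relative Cartesian offsets instead of (range, bearing) polar terms."""
    # Encoding the bearing via sin/cos makes the reward non-monotonic near
    # angle wrap-around; raw offsets avoid that discontinuity.
    dx, dy = evader[0] - pursuer[0], evader[1] - pursuer[1]
    return (dx, dy)
```

Switching reward forms by segment keeps the gradient informative both far from and near the evader.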
Funding: Supported by the National Natural Science Foundation of China (No. 82260878); the Guizhou Medical University Affiliated Hospital Doctoral Research Initiation Fund Project (gyfybsky-2021-44); the Guizhou Provincial Science and Technology Plan Project (Qiankehe Achievements LC[2022]014); the High-level Innovative Talents Cultivation Program of Guizhou Province (QianKeHe[2016]5679); the Guiyang City Science and Technology Projects, Zhu Subjects Contract ([2022]4-2-5); and the Guizhou Science and Technology Planning Project (QianKeHe[2020]4Y198).
Abstract: Background: Nonsuicidal self-injury (NSSI) in adolescents with depressive disorders often exhibits addictive patterns, potentially linked to serum beta-endorphin levels and neural reward responsiveness. Beta-endorphin, which is involved in reward processing, together with dysregulated neural reward pathways may reinforce self-injurious behaviors, highlighting the need to explore these mechanisms. Methods: Adolescents (aged 12-17 years) with depressive disorders were divided into an NSSI group (21 subjects) and a control group (11 subjects) according to inclusion criteria. Serum beta-endorphin concentration was measured by enzyme-linked immunosorbent assay. The Addiction Factor Scale was used to assess addiction levels. Statistical analyses were conducted using SPSS 25.0. The oxygenated hemoglobin response signal was recorded using functional near-infrared spectroscopy, and these data were analyzed using NIRS_KIT 2.0. Results: Compared with the control group, the NSSI group exhibited a lower serum beta-endorphin concentration. Additionally, 85.7% of those in the NSSI group displayed addictive behaviors, and serum beta-endorphin concentration was negatively correlated with the Addiction Factor Scale score. The reward task activated channels 17, 20, and 21 (corresponding to the dorsolateral prefrontal cortex [PFC] and frontopolar PFC) in the gain condition and channels 20 and 21 in the loss condition. The oxygenated hemoglobin concentration of the differential waveform (Δ[oxy-Hb]) of channel 12 (corresponding to the frontopolar PFC) correlated positively with the Addiction Factor Scale score and negatively with serum beta-endorphin concentration.
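The correlations reported above (e.g., beta-endorphin concentration vs. Addiction Factor Scale score) are standard Pearson correlations of the kind SPSS computes. A minimal sketch of that computation, on made-up numbers rather than the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation factors.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A negative r, as reported for beta-endorphin vs. addiction score, means higher scale scores co-occur with lower serum concentrations.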
Abstract: Network marketing is a trading technique that gives companies an opportunity to increase sales. With the growing number of Internet-based purchases, several threats are increasingly observed in this field, such as user privacy violations, company owner (CO) fraud, tampering with sold products' information, and the scalability of selling networks. This study presents the concept of a blockchain-based market called ACR-MLM that operates on the multi-level marketing (MLM) model, through which registered users receive anonymous and confidential rewards for their own and their subgroups' sales. Applying a public blockchain as the ACR-MLM framework's infrastructure solves existing problems in MLM-based markets, such as CO fraud (against the government or the market's users), user privacy violations (obtaining users' real names or their subgroup members), and scalability (when vast numbers of users are registered). To provide confidentiality and scalability, hierarchical identity-based encryption (HIBE) was applied together with a functional encryption (FE) scheme. Finally, the security of ACR-MLM is analyzed under the random oracle (RO) model and then evaluated.
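The core MLM reward model — a user earns from their own sales and from each level of their subgroup — can be sketched as a walk up a referral tree. The rate schedule and data layout here are illustrative assumptions, not part of the ACR-MLM protocol (which additionally keeps these payouts anonymous and confidential via HIBE/FE).

```python
# Assumed commission schedule: seller, level-1 upline, level-2 upline.
RATES = [0.10, 0.05, 0.02]

def distribute_rewards(sale_amount, seller, parent_of):
    """Credit the seller and each upline ancestor a fixed share of one sale."""
    rewards, user = {}, seller
    for rate in RATES:
        if user is None:
            break  # ran out of uplines before exhausting the rate schedule
        rewards[user] = rewards.get(user, 0.0) + sale_amount * rate
        user = parent_of.get(user)  # walk one level up the hierarchy
    return rewards
```

In the blockchain setting, the same propagation logic would run over encrypted identities so that no party learns the real names in a subgroup.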
Abstract: To ensure on-time delivery of machine tools produced in a mixed-flow assembly shop, this paper proposes a mixed-flow assembly line scheduling optimization method based on improved deep multi-agent reinforcement learning, addressing the low solution quality and slow training of minimum-tardiness production scheduling models. A mixed-flow assembly line scheduling optimization model with a minimum-tardiness objective is constructed, and double deep Q-network (DDQN) agents with decentralized execution are applied to learn the relationship between production information and the scheduling objective. The framework adopts a centralized-training, decentralized-execution strategy with parameter sharing, which handles the non-stationarity problem in multi-agent reinforcement learning. On this basis, a recurrent neural network is used to manage variable-length state and action representations, enabling the agents to handle problems of arbitrary scale. Global/local reward functions are also introduced to address reward sparsity during training, and the optimal parameter combination was determined through ablation experiments. Numerical results show that, compared with standard benchmark schemes, the algorithm improves the average total number of tardy jobs by 24.1%-32.3% in terms of objective attainment and increases training speed by 8.3%.
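The DDQN update that the abstract builds on decouples action selection from action evaluation: the online network picks the greedy next action, and the target network scores it. A minimal sketch of that target computation, with an assumed signature:

```python
import numpy as np

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN bootstrap target for one transition."""
    if done:
        return reward  # terminal transitions have no bootstrap term
    # Online network selects the action...
    best_action = int(np.argmax(next_q_online))
    # ...target network evaluates it, reducing the overestimation bias of plain DQN.
    return reward + gamma * next_q_target[best_action]
```

In the multi-agent scheduling setting described above, each shared-parameter agent would regress its Q-values toward targets of this form, with the reward blending the global (total tardiness) and local signals.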