Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, and 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).
Abstract: This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to model AD game problems more accurately and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable-reward-function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
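To illustrate how process rewards (distance, angle) and outcome rewards of the kind this abstract describes can be combined, here is a minimal sketch of a defender-side reward. The function name, weights, bonus magnitudes, and outcome encoding are all illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a variable reward for the defender, combining
# dense process terms (distance to the attacker, line-of-sight angle error)
# with a sparse terminal outcome term. All weights are illustrative guesses.
def defender_reward(dist_def_att, angle_err, outcome=None,
                    w_dist=0.01, w_ang=0.1):
    # Process reward: shaping signal available at every time step.
    r = -w_dist * dist_def_att - w_ang * abs(angle_err)
    # Outcome reward: large bonus/penalty applied only at episode end.
    if outcome == "intercepted":   # defender blocked the attacker
        r += 100.0
    elif outcome == "breached":    # attacker rendezvoused with the target
        r -= 100.0
    return r
```

Separating the dense shaping terms from the sparse terminal terms in this way is what lets a tailored reward function emphasize different mission priorities per strategy.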
Funding: National Natural Science Foundation of China (Nos. 61803260, 61673262, and 61175028).
Abstract: Multi-agent reinforcement learning has recently been applied to solve pursuit problems. However, it suffers from a large number of time steps per training episode and thus often struggles to converge effectively, resulting in low rewards and an inability for agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training effect of agents in long episodes. We then eliminate the non-monotonic behavior in the reward function introduced by the trigonometric functions of the traditional 2D polar-coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario, improving the agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
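The non-monotonicity the abstract alludes to can be made concrete with a small sketch (function names and weighting are hypothetical): a raw angle difference in polar coordinates jumps at the 0/2π seam, so a reward built on it is not monotone in the true angular error unless the difference is wrapped first.

```python
import math

def wrapped_angle_error(theta_agent, theta_target):
    # Wrap the difference into [-pi, pi] so the error grows monotonically
    # with the true angular separation (no discontinuity at the 0/2*pi seam).
    diff = (theta_target - theta_agent + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)

def pursuit_reward(distance, theta_agent, theta_target, w_ang=1.0):
    # Illustrative shaped reward: closer and better aligned scores higher.
    return -distance - w_ang * wrapped_angle_error(theta_agent, theta_target)
```

For example, an agent at heading 0.1 rad chasing a target at 2π − 0.1 rad is only 0.2 rad off; the wrapped error reports 0.2 rather than the misleading raw difference of nearly 2π.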
Funding: Supported by the National Natural Science Foundation of China (No. 82260878), the Guizhou Medical University Affiliated Hospital Doctoral Research Initiation Fund Project (gyfybsky-2021-44), the Guizhou Provincial Science and Technology Plan Project (Qiankehe Achievements LC[2022]014), the High-level Innovative Talents Cultivation Program of Guizhou Province (QianKeHe[2016]5679), the Guiyang City Science and Technology Projects, Zhu Subjects Contract ([2022]4-2-5), and the Guizhou Science and Technology Planning Project (QianKeHe[2020]4Y198).
Abstract: Background: Nonsuicidal self-injury (NSSI) in adolescents with depressive disorders often exhibits addictive patterns, potentially linked to serum beta-endorphin levels and neural reward responsiveness. Beta-endorphin, which is involved in reward processing, alongside dysregulated neural reward pathways, may reinforce self-injurious behaviors, highlighting the need to explore these mechanisms. Methods: Adolescents (aged 12-17 years) with depressive disorders were divided into an NSSI group (21 subjects) and a control group (11 subjects) according to inclusion criteria. Serum beta-endorphin concentration was measured using the enzyme-linked immunosorbent assay method. The Addiction Factor Scale was used to assess addiction levels. Statistical analyses were conducted using SPSS 25.0. The oxygenated hemoglobin response signal was detected using functional near-infrared spectroscopy. Analyses were performed using NIRS_KIT 2.0. Results: Compared with the control group, the NSSI group exhibited lower serum beta-endorphin concentration. Additionally, 85.7% of those in the NSSI group displayed addictive behaviors, and serum beta-endorphin concentration was negatively correlated with the Addiction Factor Scale score. The reward task activated channels 17, 20, and 21 (corresponding to the dorsolateral prefrontal cortex [PFC] and frontopolar PFC) in the gain condition and channels 20 and 21 in the loss condition. The oxygenated hemoglobin concentration of the differential waveform (Δ[oxy-Hb]) of channel 12 (corresponding to the frontopolar PFC) correlated positively with the Addiction Factor Scale score and negatively with the serum beta-endorphin concentration.
Abstract: To address the dynamic conditions faced by multiple agents cooperatively executing pursuit-encirclement tasks, a rule-based deep reinforcement learning algorithm with improved wall-following (Rule-Based Deep Reinforcement Learning, RBDRL) is proposed. First, based on statistics of the action choices made by targets and obstacles within a historical behavior window, the RBDRL algorithm performs state prediction for consecutively executed multi-step actions and uses an Upward-Downward rule, designed on the basis of the wall-following rule, to generate closed-loop trajectories in a quadrilateral grid environment. Second, a reduction rule is applied to optimize the redundant paths within these closed-loop trajectories. Third, these rules are integrated into a deep reinforcement learning framework with a comprehensive reward mechanism; in particular, the team reward explicitly accounts for time cost. Finally, RBDRL is compared with a count-based deep reinforcement learning algorithm and a rule-free deep reinforcement learning algorithm in scenarios containing static and dynamic obstacles of varying sizes and numbers. Experimental results demonstrate the feasibility and effectiveness of the proposed method for multi-agent cooperative pursuit-encirclement in dynamic environments.
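The reduction rule for redundant paths can be pictured as a loop-cutting pass over a grid trajectory: whenever the trajectory revisits a cell, the intervening detour is cut out. This is a hypothetical reconstruction for illustration, not the paper's actual rule.

```python
def shrink_path(path):
    """Remove redundant loops from a grid trajectory (illustrative sketch).

    When a cell repeats, cut out the segment between the two visits,
    since the detour contributes nothing to reaching the endpoint.
    """
    seen = {}   # cell -> index of its first occurrence in the output
    out = []
    for cell in path:
        if cell in seen:
            # Revisit detected: truncate back to the first visit.
            out = out[:seen[cell] + 1]
            seen = {c: i for i, c in enumerate(out)}
        else:
            seen[cell] = len(out)
            out.append(cell)
    return out
```

For example, a trajectory that steps out to (1,1) and back through (0,1) collapses to the direct route, shortening the path an agent must execute.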
Abstract: Network marketing is a trading technique that provides companies with an opportunity to increase sales. With the increasing number of Internet-based purchases, several threats are increasingly observed in this field, such as user privacy violations, company owner (CO) fraud, tampering with the information of sold products, and the scalability of selling networks. This study presents the concept of a blockchain-based market called ACR-MLM that functions based on the multi-level marketing (MLM) model, through which registered users receive anonymous and confidential rewards for their own and their subgroups' sales. Applying a public blockchain as the ACR-MLM framework's infrastructure solves existing problems in MLM-based markets, such as CO fraud (against the government or its users), user privacy violations (obtaining users' real names or subgroup members), and scalability (when vast numbers of users have been registered). To provide confidentiality and scalability in the ACR-MLM framework, hierarchical identity-based encryption (HIBE) was applied together with a functional encryption (FE) scheme. Finally, the security of ACR-MLM is analyzed using the random oracle (RO) model and then evaluated.
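The core MLM reward logic, in which each user is paid on both their own sales and their subgroup's sales, can be sketched as a walk over the referral tree. The rates and tree representation here are illustrative assumptions; the paper computes such rewards anonymously and confidentially on-chain via HIBE/FE, which this plain sketch does not model.

```python
def subtree_sales(tree, sales, user):
    # Total sales of a user's entire subgroup, including the user.
    # tree: dict mapping a user to the list of users they recruited.
    total = sales.get(user, 0)
    for child in tree.get(user, []):
        total += subtree_sales(tree, sales, child)
    return total

def reward(tree, sales, user, own_rate=0.10, group_rate=0.02):
    # Hypothetical reward: a higher rate on own sales, a lower rate on
    # the sales of everyone below the user in the referral hierarchy.
    own = sales.get(user, 0)
    group = subtree_sales(tree, sales, user) - own
    return own_rate * own + group_rate * group
```

With tree `{'A': ['B', 'C'], 'B': ['D']}` and sales `{'A': 100, 'B': 50, 'C': 30, 'D': 20}`, user A earns on 100 of own sales plus 100 of subgroup sales.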