Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Nos. 2021YFC2026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Nos. 12172288 and 12472046).
Abstract: The problem of maneuvering for a servicing spacecraft (inspector) to inspect a noncooperative spacecraft (evader) in cislunar space is investigated in this paper. The evader, which may be a malfunctioning or uncontrolled satellite, introduces uncertainties due to its potential maneuvering capabilities. To address this challenge, the scenario is modeled as a special orbital game, incorporating the unique complexities of the cislunar environment. A variable-duration, turn-based inspection and anti-inspection game model is designed. The model defines both players' rules, constraints, and victory conditions, providing a framework for non-cooperative inspection. Strategies for both players are developed and validated based on their dynamical properties. The inspector's strategy integrates two-body Lambert transfers with shooting methods, while the evader's strategy aims to maximize the inspector's fuel consumption. Simulation results show that the evader's optimal strategy involves deliberate fluctuations in its lunar periapsis altitude, with the inspector's required ΔV up to eight times greater than the evader's. The impact of game constraints is evaluated, and the effectiveness of deploying the inspector in low lunar orbit is compared with that of deploying it at the Earth-Moon Lagrange point L1. The strengths and weaknesses of both deployments are shown. These findings provide valuable insights for future orbital servicing and orbital games.
Funding: Supported, in part, by the National Natural Science Foundation of China (Nos. 12372050 and 62088101) and the Zhejiang Provincial Natural Science Foundation of China (No. LR20F030003).
Abstract: In this paper, we investigate analytical numerical iterative strategies for the pursuit-evasion game involving spacecraft with leader–follower information. In the proposed problem, the interplay between two spacecraft gives rise to a dynamic and real-time game, complicated further by the presence of perturbation. The primary challenge lies in crafting control strategies that are both efficient and applicable to real-time game problems within a nonlinear system. To overcome this challenge, we introduce the model prediction and iterative correction technique proposed in model predictive static programming, enabling the generation of strategies in analytical iterative form for nonlinear systems. Subsequently, we integrate this model predictive framework into a simplified Stackelberg equilibrium formulation, tailored to address the practical complexities of leader–follower pursuit-evasion scenarios. Simulation results validate the effectiveness and exceptional efficiency of the proposed solution within a receding horizon framework.
Funding: Funded by the National Natural Science Foundation of China (No. U21B6001).
Abstract: This paper proposes a novel impulsive thrust strategy guided by the optimal continuous thrust strategy to address the two-player orbital pursuit-evasion game under impulsive thrust control. The strategy seeks to enhance the interpretability of the impulsive thrust strategy by integrating it within the framework of differential games in traditional continuous systems. First, this paper introduces an impulse-like constraint, with periodic changes in thrust amplitude, to characterize the impulsive thrust control. Then, the game with the impulse-like constraint is converted into a two-point boundary value problem, which is solved by the combined shooting and deep learning method proposed in this paper. Deep learning and numerical optimization are employed to obtain the guesses for the unknown terminal adjoint variables and the game terminal time. Subsequently, the accurate values are solved by the shooting method to yield the optimal continuous thrust strategy with the impulse-like constraint. Finally, the shooting method is iteratively employed at each impulse decision moment to derive the impulsive thrust strategy guided by the optimal continuous thrust strategy. Numerical examples demonstrate the convergence of the combined shooting and deep learning method, even when the strongly nonlinear impulse-like constraint is introduced. The effect of the impulsive thrust strategy guided by the optimal continuous thrust strategy is also discussed.
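As a rough illustration of the shooting idea named in this abstract, the following minimal sketch solves a toy two-point boundary value problem rather than the paper's orbital game. It assumes the test equation y'' = -y with y(0) = 0 and y(π/2) = 1, and every function name is hypothetical. The unknown initial slope stands in for the unknown adjoint variables, and the two starting guesses stand in for the guesses that the paper obtains from deep learning and numerical optimization.

```python
import math

def integrate(slope, t_end=math.pi / 2, steps=200):
    """RK4 integration of y'' = -y from y(0) = 0, y'(0) = slope; returns y(t_end)."""
    h = t_end / steps
    y, v = 0.0, slope
    for _ in range(steps):
        k1y, k1v = v, -y
        k2y, k2v = v + 0.5 * h * k1v, -(y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, -(y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, -(y + h * k3y)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return y

def shoot(target=1.0, s0=0.2, s1=2.0, tol=1e-10, max_iter=50):
    """Secant iteration on the initial slope until y(t_end) matches the target."""
    f0, f1 = integrate(s0) - target, integrate(s1) - target
    for _ in range(max_iter):
        if abs(f1) < tol:
            break
        # Secant update of the unknown initial condition.
        s0, s1 = s1, s1 - f1 * (s1 - s0) / (f1 - f0)
        f0, f1 = f1, integrate(s1) - target
    return s1

slope = shoot()  # the exact answer for this toy problem is y'(0) = 1
```

The quality of the starting guesses matters far more in a strongly nonlinear problem than in this linear toy case, which is presumably why the paper pairs the shooting method with a learned initial guess.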
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 124B2031, 12202281), the Shanghai Natural Science Foundation, China (No. 23ZR1461800), and the Northwestern Polytechnical University Scientific Research Initiation Foundation, China (No. G2024KY05103).
Abstract: In recent years, the availability of space orbital resources has been declining, and the increasing frequency of spacecraft close approach events has heightened the urgency for enhanced space security measures. This paper establishes a comprehensive framework for intelligent orbital game technology in space, encompassing four core technologies: threat perception of noncooperative targets, intent recognition, situation assessment, and intelligent orbital game countermeasures. The concepts of multi-turn, multi-round and multi-match in space orbital games are defined, clarifying the core technological requirements for intelligent space orbital games and establishing a cohesive technological framework. Subsequently, the current status of research on these four core technologies is investigated. The challenges faced in the existing research are analyzed, and potential solutions for future studies are proposed. This paper aims to provide readers with a thorough understanding of the latest advancements in space intelligent orbital game technology, along with insights into the future directions and challenges in this field.
Funding: The Science and Technology Department, Heilongjiang Province, under Grant Agreement No. JJ2022LH0315.
Abstract: This paper presents a mode-switching collaborative defense strategy for spacecraft pursuit-evasion-defense scenarios. In these scenarios, the pursuer tries to avoid the defender while capturing the evader, while the evader and defender form an alliance to prevent the pursuer from achieving its goal. First, the behavioral modes of the pursuer, including attack and avoidance modes, are established using differential game theory. These modes are then recognized by an interactive multiple model-matching algorithm (IMM) that uses several smooth variable structure filters to match the modes of the pursuer and update their probabilities in real time. Based on linear-quadratic optimization theory, combined with the results of strategy identification, a two-way cooperative optimal strategy for the defender and evader is proposed, where the evader aids the defender in intercepting the pursuer by performing luring maneuvers. Simulation results show that the interactive multi-model algorithm based on several smooth variable structure filters performs well in the strategy identification of the pursuer, and that the cooperative defense strategy based on strategy identification has good interception performance when facing pursuers that are able to flexibly adjust their game objectives.
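The real-time probability update mentioned in this abstract can be illustrated with the standard multiple-model mode-probability step. This is a minimal sketch under stated assumptions, not the paper's implementation: the per-mode likelihoods are taken as given inputs (in the paper they would come from the smooth variable structure filters), and the transition matrix, the numbers, and the function name are all hypothetical.

```python
def update_mode_probabilities(mu, transition, likelihoods):
    """One mode-probability update for a bank of mode-matched filters.

    mu          -- prior mode probabilities, e.g. [P(attack), P(avoidance)]
    transition  -- Markov matrix, transition[i][j] = P(mode j now | mode i before)
    likelihoods -- measurement likelihood reported by each mode-matched filter
    """
    n = len(mu)
    # Predicted probability of each mode before seeing the new measurement.
    predicted = [sum(transition[i][j] * mu[i] for i in range(n)) for j in range(n)]
    # Bayes update: weight each prediction by its filter likelihood, then normalize.
    weighted = [lik * p for lik, p in zip(likelihoods, predicted)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical numbers: two pursuer modes, attack and avoidance.
posterior = update_mode_probabilities(
    mu=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    likelihoods=[0.8, 0.2],
)
```

With these illustrative inputs the attack mode ends up with the larger posterior, because its filter explains the measurement better.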
Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).
Abstract: This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
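The ranking stated in this abstract can be checked with simple arithmetic on the four reported success rates. The sketch below assumes that the four confrontations are weighted equally, which the abstract does not state, and the variable and function names are hypothetical.

```python
# Defensive success rates reported in the abstract, keyed by (defense, attack).
defense_success = {
    ("DI", "DA"): 0.75,
    ("DI", "BA"): 0.40,
    ("CI", "DA"): 0.80,
    ("CI", "BA"): 0.70,
}

def mean_for(index, name):
    """Average defensive success over the pairings that involve one strategy.

    index 0 selects by defense strategy, index 1 by attack strategy.
    Equal weighting of pairings is an assumption of this sketch.
    """
    rates = [rate for key, rate in defense_success.items() if key[index] == name]
    return sum(rates) / len(rates)

by_defense = {d: mean_for(0, d) for d in ("DI", "CI")}
by_attack = {a: mean_for(1, a) for a in ("DA", "BA")}
```

Under that equal-weighting assumption, CI averages 75% against 57.5% for DI, and defenders succeed 55% of the time against BA compared with 77.5% against DA, which is consistent with the stated conclusion that CI is the stronger defense and BA the stronger attack.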
Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Nos. 2021YFC22026, 2021YFC2202601, and 2021YFC2202603) and the National Natural Science Foundation of China (No. 12172288).
Abstract: This study investigates the orbital Target-Attacker-Defender (TAD) game problem in the context of space missions. In this game, the Attacker and the Defender compete for a Target that is unable to maneuver due to its original mission constraints. This paper establishes three TAD game models based on the thrust output capabilities: unconstrained thrust output, thrust constrained by an upper bound, and fixed thrust magnitude. These models are then solved using differential game theory to obtain Nash equilibrium solutions for the game problems, and the correctness and effectiveness of the solution methods are verified through simulations. Furthermore, an analysis of the winning mechanisms of the game is conducted, identifying key factors that influence the game’s outcomes, including weight coefficients in payoffs, the maximum thrust acceleration limit, and the initial game state. Considering the unique characteristics of space missions, a specific focus is given to the analysis of the Defender’s initial states in the hovering formation and in-plane circling formation, revealing overall success patterns for defense strategies from these two formations. In summary, this study provides valuable insights into the control strategies and winning mechanisms of orbital TAD games, deepening our understanding of these games and offering practical guidance to improve success rates in real-world scenarios.
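The three thrust models listed in this abstract differ only in how a commanded acceleration vector is shaped before it is applied. The sketch below shows one plausible reading of each constraint. It does not reproduce the paper's differential-game solution, and the function name, the model labels, and the `a_max` parameter are hypothetical.

```python
import math

def apply_thrust_model(command, model, a_max=None):
    """Shape a commanded acceleration vector according to a thrust model.

    model -- "unconstrained": the command is applied as-is
             "bounded": the magnitude is clipped to at most a_max
             "fixed": the magnitude is always exactly a_max
    """
    if model == "unconstrained":
        return list(command)
    if a_max is None:
        raise ValueError("a_max is required for constrained thrust models")
    norm = math.sqrt(sum(c * c for c in command))
    if model == "bounded":
        if norm <= a_max:
            return list(command)
        return [c * a_max / norm for c in command]
    if model == "fixed":
        if norm == 0.0:
            raise ValueError("thrust direction is undefined for a zero command")
        return [c * a_max / norm for c in command]
    raise ValueError(f"unknown thrust model: {model}")

# A command of magnitude 5 with a hypothetical limit of 1.
clipped = apply_thrust_model([3.0, 4.0, 0.0], "bounded", a_max=1.0)
```

In the bounded model only the direction survives when the command exceeds the limit, whereas in the fixed model the direction is the only free control at all times.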
Abstract: This paper comprehensively explores the impulsive on-orbit inspection game problem utilizing reinforcement learning and game training methods. The purpose of the spacecraft is to inspect the entire surface of a non-cooperative target with active maneuverability in front lighting. First, the impulsive orbital game problem is formulated as a turn-based sequential game problem. Second, several typical relative orbit transfers are encapsulated into modules to construct a parameterized action space containing discrete modules and continuous parameters, and the multi-pass deep Q-networks (MPDQN) algorithm is used to implement autonomous decision-making. Then, a curriculum learning method is used to gradually increase the difficulty of the training scenario. The backtracking proportional self-play training framework is used to enhance the agent’s ability to defeat inconsistent strategies by building a pool of opponents. The behavior variations of the agents during training indicate that the intelligent game system gradually evolves towards an equilibrium situation. The restraint relations between the agents show that the agents steadily improve their strategies. The influence of various factors on game results is tested.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12572054).
Abstract: A numerical method for computing Nash equilibrium strategies (NES) of the spacecraft time-optimal orbit pursuit-evasion game (TOOPEG) with continuous thrust reachable domain (RD) analysis is proposed. Through theoretical derivation and Monte Carlo validation, the equivalence among the minimum time of the TOOPEG problem with NES, the minimum time of a virtual single spacecraft for a time-optimal approach to the origin, and the minimum time required for the envelope of the pursuer's RD to enclose that of the evader is established. First, the necessary conditions for NES are derived using Pontryagin's maximum principle (PMP), converting the original bilateral optimal control problem into a 7-dimensional two-point boundary value problem (TPBVP). Then, the TOOPEG is transformed into a virtual single-spacecraft time-optimal approach problem, with the above necessary conditions. By exploiting the evolutionary characteristics of the continuous-thrust RD, the problem is further reduced to a 3-dimensional nonlinear differential equation. An improved Broyden quasi-Newton iterative (IBQNI) algorithm is employed to obtain high-precision numerical solutions, and an iterative initial value construction method based on a linearized orbit dynamic model is proposed. Furthermore, a set of criteria is developed to assess the relative spatial configuration between the RDs of different spacecraft. Numerical simulations demonstrate that the proposed method achieves excellent convergence and remarkable computational efficiency.
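For readers unfamiliar with Broyden-type iterations, the sketch below shows the classical Broyden quasi-Newton method on a toy two-equation system. It is not the improved IBQNI algorithm of the paper, whose modifications the abstract does not describe. The test system (x² + y² = 5, xy = 2, with a root at (1, 2)), the starting point, and all function names are assumptions chosen for illustration.

```python
def residual(x, y):
    """Toy nonlinear system with a root at (1, 2)."""
    return (x * x + y * y - 5.0, x * y - 2.0)

def fd_jacobian(x, y, eps=1e-6):
    """Forward-difference Jacobian, used only once to initialize the iteration."""
    f0, fx, fy = residual(x, y), residual(x + eps, y), residual(x, y + eps)
    return [[(fx[0] - f0[0]) / eps, (fy[0] - f0[0]) / eps],
            [(fx[1] - f0[1]) / eps, (fy[1] - f0[1]) / eps]]

def solve2(B, r):
    """Solve the 2x2 linear system B d = r by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return ((r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - B[1][0] * r[0]) / det)

def broyden(x, y, tol=1e-10, max_iter=50):
    """Classical Broyden iteration: rank-one Jacobian updates, no re-evaluation."""
    B = fd_jacobian(x, y)
    f = residual(x, y)
    for _ in range(max_iter):
        if max(abs(f[0]), abs(f[1])) < tol:
            break
        dx = solve2(B, (-f[0], -f[1]))          # quasi-Newton step
        x, y = x + dx[0], y + dx[1]
        f_new = residual(x, y)
        df = (f_new[0] - f[0], f_new[1] - f[1])
        Bdx = (B[0][0] * dx[0] + B[0][1] * dx[1],
               B[1][0] * dx[0] + B[1][1] * dx[1])
        denom = dx[0] * dx[0] + dx[1] * dx[1]
        # Rank-one secant update: B += (df - B dx) dx^T / (dx^T dx).
        for i in range(2):
            for j in range(2):
                B[i][j] += (df[i] - Bdx[i]) * dx[j] / denom
        f = f_new
    return x, y

root = broyden(0.9, 2.2)  # starting point chosen near the root (1, 2)
```

The appeal of this family of methods is that the Jacobian is evaluated only at the start, which matters when each residual evaluation requires integrating the dynamics. As with any local method, convergence depends on a good initial value, which the paper addresses with its linearized initial value construction.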
Funding: Funded by the National Natural Science Foundation of China (Grant No. U21B6001).
Abstract: This paper presents a strategy prediction frame for the multi-player orbital pursuit–evasion game that is based on discount receding horizon coevolution (DRH-CE). The proposed frame aims to enable spacecraft to indirectly characterize the target’s possible future states by predicting strategy parameters. The authors establish a game strategy model and a strategy solution model based on DRH-CE. The payoff function parameters of the DRH-CE are utilized as strategy parameters to construct the dataset by combining the strategy solutions and parameters. Furthermore, the authors establish a strategy parameter prediction model based on long short-term memory and multi-head self-attention, and combining this model with the strategy solution model allows for the prediction of the future states of targets. The numerical examples illustrate the efficacy of the proposed frame in predicting strategy parameters and the effectiveness of the future state prediction against targets.
Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Nos. 12172288 and 12472046).
Abstract: This paper conducts a comprehensive study on the multi-constrained two-on-one impulsive orbital pursuit–evasion game (OPEG). Firstly, considering constraints such as maneuverability, fuel reserves, and mission duration, a mathematical game model for the two-on-one impulsive OPEG is established, which transforms the two-on-one impulsive OPEG, where cooperation and competition coexist, into a multi-constrained three-party optimization problem suitable for solving with multi-agent deep reinforcement learning. Then, an intelligent solution method for cooperative game strategies based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is proposed. In the reward function design section, a reward function based on fixed-time triggering is introduced to address the information loss problem caused by long impulse intervals. To ensure good convergence of the algorithm and guide the spacecraft to learn effective cooperative strategies during training, an immediate reward function is designed, incorporating outcome rewards, guidance rewards, and cooperative rewards. Numerical simulations validate the feasibility and effectiveness of the proposed method. To further analyze the cooperative mechanisms learned by the spacecraft during algorithm training, a comparative experiment with the one-on-one impulsive OPEG is designed. The experimental results demonstrate that the two pursuers in the two-on-one impulsive OPEG not only develop various strategies such as “pre-emptive interception”, “pincer interception”, and “trailing pursuit” during training, but also improve mission success rates and reduce mission durations through coordinated efforts. Additionally, this paper reveals the impact of the relative initial state distribution between the two pursuing spacecraft and the evading spacecraft on the effectiveness of cooperation.