Abstract: Enhancing Autonomous Decision-Making (ADM) for unmanned combat aerial vehicle formations in beyond-visual-range air combat is pivotal for future battlefields, yet the predominant reinforcement learning techniques for ADM have proven inadequate at fitting complex tactical Unit Coordination (UC), limiting the integrity of formation decision-making. This study proposes a knowledge-enhanced ADM method, focused on UC, to raise formation combat effectiveness. The main innovation is applying data mining techniques to tactical knowledge mining and integration. First, based on Frequent Event Arrangement Mining (FEAM) theory, a cross-channel UC knowledge mining method is designed by introducing data flow, capable of capturing dynamic coordinative action sequences. Then, a dual-mode knowledge integration method is proposed that employs a Graph Attention Network (GAT) and attenuated structural similarity, strengthening the interplay between autonomous UC tactics fitting and knowledge injection. Experimental results demonstrate that the algorithm surpasses existing methods, producing more strategic maneuver trajectories and a win rate above 90% across different scenarios. The method promises to augment the autonomous operational capabilities of unmanned formations and drive the evolution of combat effectiveness.
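The FEAM procedure itself is the paper's contribution and is not reproduced here. As a rough Python sketch of the underlying idea, mining recurrent cross-unit coordinative action sequences from formation logs, the following counts joint action windows across units and keeps those exceeding a support threshold (all names, the data layout, and the windowing scheme are hypothetical illustrations, not the paper's algorithm):

```python
from collections import Counter

def frequent_coordination_patterns(unit_logs, window=3, min_support=2):
    """Count cross-unit joint-action windows over aligned time steps.

    unit_logs: dict mapping unit id -> list of actions, one per time step.
    Returns the joint action windows observed at least `min_support` times.
    """
    counts = Counter()
    steps = min(len(log) for log in unit_logs.values())
    units = sorted(unit_logs)
    for t in range(steps - window + 1):
        # Joint action sequence across all units over the sliding window.
        pattern = tuple(
            tuple(unit_logs[u][t + k] for u in units) for k in range(window)
        )
        counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Toy usage: two units repeating a pincer-style coordination.
logs = {
    "u1": ["climb", "flank_left", "fire", "climb", "flank_left", "fire"],
    "u2": ["dive", "flank_right", "fire", "dive", "flank_right", "fire"],
}
print(frequent_coordination_patterns(logs, window=2, min_support=2))
```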
Funding: the Science and Technology Innovation 2030 Key Project of "New Generation Artificial Intelligence" (2018AAA0100803); the National Natural Science Foundation of China (U20B2071, 91948204, T2121003, U1913602).
Abstract: This paper proposes an autonomous maneuver decision method using transfer learning pigeon-inspired optimization (TLPIO) for unmanned combat aerial vehicles (UCAVs) in dogfight engagements. First, a nonlinear F-16 aircraft model and its automatic control system are constructed on a MATLAB/Simulink platform. Second, a 3-degrees-of-freedom (3-DOF) aircraft model is used as a maneuvering command generator, and an expanded elemental maneuver library is designed so that the aircraft's reachable state set can be obtained. Then, the game matrix is composed from an air combat situation evaluation function calculated from angle and range threats. Finally, and crucially, the objective function to be optimized is built from the game's mixed strategy, and the optimal mixed strategy is obtained by TLPIO. Notably, TLPIO does not initialize its population randomly; it adopts a transfer learning method based on Kullback-Leibler (KL) divergence to initialize the population, which improves the search accuracy of the optimization algorithm. The convergence and time complexity of TLPIO are also discussed, and comparison with classical optimization algorithms highlights its advantage. In the air combat simulations, three initial scenarios are set: opposite, offensive, and defensive conditions. Simulation results verify the effectiveness of the proposed autonomous maneuver decision method.
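The mixed-strategy step can be made concrete in standard form. The sketch below computes the row player's optimal mixed strategy for a zero-sum matrix game by linear programming, a conventional baseline; the paper instead optimizes its mixed-strategy objective with TLPIO, which is not reproduced here:

```python
import numpy as np
from scipy.optimize import linprog

def optimal_mixed_strategy(payoff):
    """Row player's optimal mixed strategy for a zero-sum matrix game.

    payoff[i, j]: row player's payoff when row plays i and column plays j.
    Solved as a linear program: maximize the game value v subject to
    payoff.T @ x >= v for every column strategy, with x a distribution.
    """
    m, n = payoff.shape
    # Variables: [x_0 .. x_{m-1}, v]; linprog minimizes, so use -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For each column j: v - sum_i payoff[i, j] * x_i <= 0.
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The strategy x must sum to 1 (v has no equality constraint).
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]  # strategy, game value

# Matching pennies: the optimal mix is 50/50 with game value 0.
strategy, value = optimal_mixed_strategy(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(strategy, value)
```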
Abstract: Reinforcement Learning (RL) algorithms enhance the intelligence of air combat Autonomous Maneuver Decision (AMD) policies, but they may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, the study introduces a tube to describe reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm, which enhances robustness by using tube size as a quantitative indicator. Second, the study introduces offline techniques for regressing the tube size function and establishing a tube library before policy learning, eliminating complex online tube solving and reducing the computational burden during training. Furthermore, an analysis of the tube library demonstrates that the mitigated AMD strategy achieves greater robustness, as smaller tube sizes correspond to more cautious actions; in other words, TRRL enhances robustness by promoting a conservative policy. To balance aggressiveness and robustness, the TRRL algorithm introduces a "laziness factor" as the weight on robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by TRRL exhibits superior air combat performance compared to selected robust RL baselines.
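One plausible, simplified reading of how a "laziness factor" could trade aggressiveness against robustness is to penalize each candidate action's value by its precomputed tube size. The sketch below is a hypothetical form; the paper's actual objective and its sum-of-squares tube computation are not reproduced:

```python
def select_robust_action(candidates, laziness=0.5):
    """Pick the action maximizing reward minus a tube-size penalty.

    candidates: list of (action, expected_reward, tube_size) triples, where
    tube_size would come from an offline-regressed tube library as in the
    abstract. `laziness` weights robustness: larger values favor actions
    with small tubes, i.e. more cautious maneuvers.
    """
    return max(candidates, key=lambda c: c[1] - laziness * c[2])[0]

# Hypothetical tube-library entries: aggressive maneuvers score higher
# but have larger reachable tubes under disturbance.
candidates = [
    ("hard_turn", 1.0, 2.4),    # aggressive: high reward, big tube
    ("gentle_turn", 0.7, 0.6),  # cautious: lower reward, small tube
]
print(select_robust_action(candidates, laziness=0.1))  # -> hard_turn
print(select_robust_action(candidates, laziness=0.5))  # -> gentle_turn
```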
Funding: supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant RGPIN-2022-04937.
Abstract: Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards, prevent violations of project constraints, and achieve cost-effective operation. While exact solutions to such problems can be obtained through Integer Programming (IP), the dependence of the search space on input parameters often makes IP computationally infeasible for large-scale scenarios. Heuristic methods such as Genetic Algorithms can also be applied, but they frequently produce suboptimal solutions in extensive cases. This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov Decision Process (MDP), without imposing assumptions on the type of assembly line, a notable distinction from most existing models. The model is used to create a virtual environment for training Deep Reinforcement Learning (DRL) agents to optimize task and resource scheduling. To improve the efficiency of agent training, the paper proposes two tools. The first is an action-masking technique, which ensures the agent selects only feasible actions, reducing training time. The second is a multi-agent approach in which each workstation is managed by an individual agent, shrinking the state and action spaces. A centralized-training, decentralized-execution framework is adopted, offering a scalable learning architecture for optimizing industrial assembly lines. It allows the agents to learn offline and then provide real-time solutions during operation via a neural network that maps the current factory state to the optimal action. The effectiveness of the proposed scheme is validated through numerical simulations, which show significantly faster convergence to the optimal solution than a comparable model-based approach.
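Action masking is a standard DRL technique; a minimal sketch of the idea, not the authors' implementation, is to set infeasible-action logits to negative infinity before the softmax, so the policy can never sample them:

```python
import numpy as np

def masked_action_probabilities(logits, feasible_mask):
    """Zero out infeasible actions before sampling.

    logits: raw policy-network outputs, one per action.
    feasible_mask: boolean array, True where the action is currently legal.
    Infeasible logits become -inf, so softmax assigns them probability 0 and
    the agent only explores feasible actions, which shortens training.
    """
    masked = np.where(feasible_mask, logits, -np.inf)
    exp = np.exp(masked - masked[feasible_mask].max())  # numerically stable
    return exp / exp.sum()

# Toy usage: action 1 is infeasible at this workstation's current state.
probs = masked_action_probabilities(
    np.array([1.2, 0.7, -0.3]), np.array([True, False, True])
)
print(probs)  # probability mass only on actions 0 and 2
```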
Funding: this work was supported in part by the National Natural Science Foundation of China (Grant No. U1864203) and in part by the International Science and Technology Cooperation Program of China (No. 2016YFE0102200).
Abstract: Driving space for autonomous vehicles (AVs) is a simplified representation of the real driving environment that facilitates driving decision processes. The existing literature presents numerous methods for constructing driving spaces, a fundamental step in AV development. This study reviews that research to build a more systematic understanding of driving space, focusing on two questions: how to reconstruct the driving environment, and how to make driving decisions within the constructed driving space. Furthermore, the advantages and disadvantages of different types of driving space are analyzed. The study deepens understanding of the relationship between perception and decision-making and offers insight into directions for future research on driving space for AVs.