Journal Articles
12 articles found
Self-play training and analysis for GEO inspection game with modular actions
1
Authors: ZHOU Rui, ZHONG Weichao, LI Wenlong, ZHANG Hao. Journal of Systems Engineering and Electronics, 2025, Issue 5, pp. 1353-1373 (21 pages)
This paper comprehensively explores the impulsive on-orbit inspection game problem using reinforcement learning and game training methods. The purpose of the spacecraft is to inspect the entire surface of a non-cooperative target with active maneuverability under front lighting. First, the impulsive orbital game problem is formulated as a turn-based sequential game problem. Second, several typical relative orbit transfers are encapsulated into modules to construct a parameterized action space containing discrete modules and continuous parameters, and the multi-pass deep Q-network (MPDQN) algorithm is used to implement autonomous decision-making. Then, a curriculum learning method is used to gradually increase the difficulty of the training scenario. A backtracking proportional self-play training framework is used to enhance the agent's ability to defeat inconsistent strategies by building a pool of opponents. The behavior variations of the agents during training indicate that the intelligent game system gradually evolves towards an equilibrium situation, and the restraint relations between the agents show that they steadily improve their strategies. Finally, the influence of various factors on the game results is tested.
Keywords: impulsive orbital game; inspection mission; turn-based reinforcement learning; modular action; self-play
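A rough illustration may help here: the abstract describes a parameterized action space (discrete orbit-transfer modules, each with continuous parameters) evaluated by a multi-pass Q-network. The Python sketch below shows that selection loop under stated assumptions; the module count, parameter dimension, and both networks are placeholders, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of parameterized action selection in the spirit of MPDQN:
# each discrete orbit-transfer module k has its own continuous parameters,
# and the Q-network is evaluated once per module ("multi-pass"), feeding
# only that module's parameters. All names and sizes are illustrative.

N_MODULES = 4   # hypothetical number of orbit-transfer modules
PARAM_DIM = 2   # hypothetical continuous parameters per module

def param_net(state):
    """Stand-in actor: proposes continuous parameters for every module."""
    rng = np.random.default_rng(0)
    return rng.uniform(-1.0, 1.0, size=(N_MODULES, PARAM_DIM))

def q_net(state, module_onehot, params):
    """Stand-in critic: scores one (module, parameters) pair."""
    return float(module_onehot @ np.arange(N_MODULES) + params.sum())

def select_action(state):
    all_params = param_net(state)
    q_values = []
    for k in range(N_MODULES):              # one pass per discrete module
        q_values.append(q_net(state, np.eye(N_MODULES)[k], all_params[k]))
    k_best = int(np.argmax(q_values))
    return k_best, all_params[k_best]       # (module index, its parameters)

module, params = select_action(state=np.zeros(6))
```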
Autonomous air combat decision-making of UAV based on parallel self-play reinforcement learning (Cited by 7)
2
Authors: Bo Li, Jingyi Huang, Shuangxia Bai, Zhigang Gan, Shiyang Liang, Neretin Evgeny, Shouwen Yao. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, Issue 1, pp. 64-81 (18 pages)
To address the problem of manoeuvring decision-making in UAV air combat, this study establishes a one-to-one air combat model, defines missile attack areas, and uses the non-deterministic-policy Soft Actor-Critic (SAC) algorithm from deep reinforcement learning to construct a decision model that realizes the manoeuvring process. At the same time, the complexity of the proposed algorithm is calculated, and the stability of the closed-loop air combat decision-making system controlled by the neural network is analysed with a Lyapunov function. The study defines the UAV air combat process as a gaming process and proposes a Parallel Self-Play training SAC algorithm (PSP-SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm realizes sample sharing and policy sharing across multiple combat environments and significantly improves the generalisation ability of the model compared to independent training.
Keywords: air combat decision; deep reinforcement learning; parallel self-play; SAC algorithm; UAV
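The sample-sharing idea behind the parallel self-play training described above can be sketched as several environments run with the same policy feeding one shared replay buffer. This is a minimal sketch, assuming a toy environment and a random stand-in policy; the actual SAC update (actor, twin critics, entropy term) is elided.

```python
import random
from collections import deque

# Minimal sketch (not the paper's implementation) of sample sharing in
# parallel self-play SAC: every environment's transitions land in one
# shared replay buffer that a single learner samples from.

class ToyCombatEnv:
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), random.random(), self.t >= 50  # (s', r, done)

shared_buffer = deque(maxlen=100_000)

def rollout(env, policy, steps=200):
    s = env.reset()
    for _ in range(steps):
        a = policy(s)
        s2, r, done = env.step(a)
        shared_buffer.append((s, a, r, s2, done))  # shared across all envs
        s = env.reset() if done else s2

envs = [ToyCombatEnv() for _ in range(4)]   # parallel combat scenarios
for env in envs:                            # serial stand-in for parallelism
    rollout(env, policy=lambda s: random.choice([-1, 0, 1]))

batch = random.sample(shared_buffer, 256)   # one SAC update would use this
```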
Self-Play and Using an Expert to Learn to Play Backgammon with Temporal Difference Learning
3
Author: Marco A. Wiering. Journal of Intelligent Learning Systems and Applications, 2010, Issue 2, pp. 57-68 (12 pages)
A promising approach to learning to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: 1) learning by self-play, 2) learning by playing against an expert program, and 3) learning from viewing experts play against each other. Although the third possibility generates high-quality games from the start, compared to the initial random games generated by self-play, its drawback is that the learning program is never allowed to test moves that it prefers. Since our expert program uses a similar evaluation function to the learning program, we also examine whether it is helpful to learn directly from the board evaluations given by the expert. We compare these methods using temporal difference methods with neural networks to learn the game of backgammon.
Keywords: board games; reinforcement learning; TD(λ); self-play; learning from demonstration
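For readers unfamiliar with the temporal difference methods the paper builds on, a minimal tabular TD(λ) sketch with eligibility traces follows; the paper itself learns with neural networks, so the value table, step size, and trace parameters here are illustrative only.

```python
import numpy as np

# Minimal tabular TD(lambda): after each move, the TD error credits all
# recently visited states through decaying eligibility traces.

def td_lambda_episode(V, states, rewards, alpha=0.1, gamma=1.0, lam=0.7):
    """Update value table V in place from one game's state/reward sequence."""
    e = np.zeros_like(V)                              # eligibility traces
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        delta = rewards[t] + gamma * V[s_next] - V[s]  # TD error
        e[s] += 1.0                                    # accumulating trace
        V += alpha * delta * e                         # credit visited states
        e *= gamma * lam                               # decay traces
    return V

V = np.zeros(10)
td_lambda_episode(V, states=[0, 3, 7, 9], rewards=[0.0, 0.0, 1.0])
```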
A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games (Cited by 5)
4
Authors: Li ZHANG, Yuxuan CHEN, Wei WANG, Ziliang HAN, Shijian LI, Zhijie PAN, Gang PAN. Frontiers of Computer Science (SCIE, EI, CSCD), 2021, Issue 5, pp. 137-150 (14 pages)
Solving the optimization problem of approaching a Nash Equilibrium point plays an important role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash Equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm that combines Monte Carlo tree search with NFSP, called MC-NFSP, to improve performance in real-time zero-sum imperfect-information games. With experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate a Nash Equilibrium in games with large-scale search depth while NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play framework (ANFSP). It uses an asynchronous and parallel architecture to collect game experience and improve both training efficiency and policy quality. Experiments with games with hidden state information (Texas Hold'em) and FPS (first-person shooter) games demonstrate the effectiveness of our algorithms.
Keywords: approximate Nash Equilibrium; imperfect-information games; dynamic games; Monte Carlo tree search; Neural Fictitious Self-Play; reinforcement learning
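The anticipatory mixing at the core of NFSP, which MC-NFSP inherits, can be sketched in a few lines: with probability η the agent plays its best response and logs the action for supervised training of the average policy. The stub policies and the η value below are assumptions; MC-NFSP would back the best response with Monte Carlo tree search.

```python
import random

# Minimal sketch of NFSP-style action mixing. Both policies are stubs.

ETA = 0.1  # anticipatory parameter (illustrative value)

def best_response_action(state, actions):
    return max(actions)             # stub: greedy / MCTS-backed choice

def average_policy_action(state, actions):
    return random.choice(actions)   # stub: supervised average policy

supervised_memory = []              # (state, best-response action) pairs

def act(state, actions):
    if random.random() < ETA:
        a = best_response_action(state, actions)
        supervised_memory.append((state, a))  # teach the average policy
        return a
    return average_policy_action(state, actions)

a = act(state=(0, 1), actions=[0, 1, 2])
```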
Decision-making and confrontation in close-range air combat based on reinforcement learning
5
Authors: Mengchao YANG, Shengzhe SHAN, Weiwei ZHANG. Chinese Journal of Aeronautics, 2025, Issue 9, pp. 401-420 (20 pages)
The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots, making rapid, accurate decision-making challenging. While reinforcement learning (RL) has shown promise in this domain, existing methods often lack strategic depth and generalization in complex, high-dimensional environments. To address these limitations, this paper proposes an optimized self-play method enhanced by advancements in fighter modeling, neural network design, and algorithmic frameworks. The study employs a six-degree-of-freedom (6-DOF) F-16 fighter model based on open-source aerodynamic data, featuring airborne equipment and a realistic visual simulation platform, unlike traditional 3-DOF models. To capture temporal dynamics, Long Short-Term Memory (LSTM) layers are integrated into the neural network, complemented by delayed input stacking. The RL environment incorporates expert strategies, curiosity-driven rewards, and curriculum learning to improve adaptability and strategic decision-making. Experimental results demonstrate that the proposed approach achieves a winning rate exceeding 90% against classical single-agent methods. Additionally, using enhanced 3D visual platforms, we conducted human-agent confrontation experiments in which the agent attained an average winning rate of over 75%. The agent's maneuver trajectories closely align with human pilot strategies, showcasing its potential in decision-making and pilot training applications. This study highlights the effectiveness of integrating advanced modeling and self-play techniques in developing robust air combat decision-making systems.
Keywords: air combat; decision making; flight simulation; reinforcement learning; self-play
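A minimal sketch of the temporal modelling described above, an LSTM over a stack of recent (delayed) observations feeding an action head, is given below in PyTorch; all layer sizes are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

# Minimal LSTM policy over a delayed input stack. Sizes are illustrative.

OBS_DIM, STACK, HIDDEN, N_ACTIONS = 24, 8, 128, 5

class LSTMPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_ACTIONS)

    def forward(self, obs_stack):           # (batch, STACK, OBS_DIM)
        out, _ = self.lstm(obs_stack)       # encode the delayed input stack
        return self.head(out[:, -1])        # act on the most recent step

policy = LSTMPolicy()
obs_history = torch.randn(1, STACK, OBS_DIM)  # last STACK observations
logits = policy(obs_history)                  # (1, N_ACTIONS)
```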
Recess Confinement: Origins, Problems, and the Way Forward (Cited by 3)
6
Author: LI Changwei. Theory and Practice of Education (CSSCI, PKU Core), 2016, Issue 10, pp. 3-6 (4 pages)
Recess confinement has gradually become a widespread educational phenomenon. Two important causes underlie it: the regulation of education by a zeitgeist of self-preservation, and China's one-child policy. At present, the phenomenon has become an educational problem that must be taken seriously. Recess confinement restricts students' naturally active disposition, harming their physical and mental health as well as their future capacity for self-protection; it also cuts children off from close contact with nature, preventing them from acquiring the intuitive safety knowledge and rich life experience that arise from interacting with nature. Moving beyond recess confinement ultimately depends on establishing and implementing rules for play, which requires addressing three questions: what kind of rules these are, who makes them, and how they are implemented.
Keywords: recess confinement; self-preservation; naturally active disposition; rules of play
A Study of the Factors Influencing University Students' Intention to Use Academic Databases (Cited by 12)
7
Author: ZHANG Pei. Documentation, Information & Knowledge (CSSCI, PKU Core), 2017, Issue 5, pp. 108-119 (12 pages)
Sustained use of academic databases is key to realizing their full value, yet research on the factors influencing usage intention among university students remains relatively scarce. Based on the Technology Acceptance Model and the Information Systems Success Model, and integrating the theories of self-efficacy, perceived enjoyment, and habit, this paper constructs a theoretical model of academic database use by university students. Questionnaire surveys and structural equation modeling were used to collect data and test the research hypotheses, revealing the factors and mechanisms that shape university students' use of academic databases. The study finds that the proposed model effectively explains the factors influencing university students' intention to use academic databases, but the relationships between service quality and both perceived ease of use and perceived usefulness, as well as between perceived ease of use and usage intention, were not supported. Based on these findings, recommendations for database vendors are offered.
Keywords: university students; academic databases; Technology Acceptance Model; Information Systems Success Model; perceived enjoyment; self-efficacy; habit
Enhanced UAV Pursuit-Evasion Using Boids Modelling: A Synergistic Integration of Bird Swarm Intelligence and DRL
8
Authors: Weiqiang Jin, Xingwu Tian, Bohang Shi, Biao Zhao, Haibin Duan, Hao Wu. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9, pp. 3523-3553 (31 pages)
The UAV pursuit-evasion problem focuses on the efficient tracking and capture of evading targets using unmanned aerial vehicles (UAVs), which is pivotal in public safety applications, particularly in scenarios involving intrusion monitoring and interception. To address the challenges of data acquisition, real-world deployment, and the limited intelligence of existing algorithms in UAV pursuit-evasion tasks, we propose an innovative swarm intelligence-based UAV pursuit-evasion control framework, namely "Boids Model-based DRL Approach for Pursuit and Escape" (Boids-PE), which synergizes the strengths of swarm intelligence from bio-inspired algorithms and deep reinforcement learning (DRL). The Boids model, which simulates collective behavior through three fundamental rules, separation, alignment, and cohesion, is adopted in our work. By integrating the Boids model with the Apollonian Circles algorithm, significant improvements are achieved in capturing UAVs against simple evasion strategies. To further enhance decision-making precision, we incorporate a DRL algorithm to facilitate more accurate strategic planning. We also leverage self-play training to continuously optimize the performance of pursuit UAVs. During experimental evaluation, we meticulously designed both one-on-one and multi-to-one pursuit-evasion scenarios, customizing the state space, action space, and reward function models for each scenario. Extensive simulations, supported by the PyBullet physics engine, validate the effectiveness of the proposed method. The overall results demonstrate that Boids-PE significantly enhances the efficiency and reliability of UAV pursuit-evasion tasks, providing a practical and robust solution for the real-world application of UAV pursuit-evasion missions.
Keywords: UAV pursuit-evasion; swarm intelligence algorithm; Boids model; deep reinforcement learning; self-play training
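The three Boids rules named in the abstract (separation, alignment, cohesion) can be sketched compactly. The gains, neighbourhood radius, and time step below are illustrative assumptions; the paper additionally couples this flocking layer with the Apollonian Circles algorithm and a DRL policy.

```python
import numpy as np

# Minimal Boids step for a small flock of pursuit UAVs in the plane.

def boids_step(pos, vel, r=2.0, w_sep=1.5, w_ali=1.0, w_coh=1.0, dt=0.1):
    """pos, vel: (N, 2) arrays; returns updated positions and velocities."""
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        d = np.linalg.norm(pos - pos[i], axis=1)
        nbr = (d < r) & (d > 0)                       # neighbours within radius
        if not nbr.any():
            continue
        sep = np.sum(pos[i] - pos[nbr], axis=0)       # push away from neighbours
        ali = vel[nbr].mean(axis=0) - vel[i]          # match neighbour velocity
        coh = pos[nbr].mean(axis=0) - pos[i]          # steer toward local centre
        acc[i] = w_sep * sep + w_ali * ali + w_coh * coh
    vel = vel + dt * acc
    return pos + dt * vel, vel

rng = np.random.default_rng(0)
pos, vel = rng.uniform(0, 5, (6, 2)), rng.uniform(-1, 1, (6, 2))
pos, vel = boids_step(pos, vel)
```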
Examining Children's Play through Maslow's Peak Experience Theory (Cited by 8)
9
Author: LI Donglin. Journal of Sichuan College of Education, 2007, Issue B10, pp. 3-4, 7 (3 pages)
This paper connects Maslow's peak experience theory with children's play experience, examines children's play from the perspective of experience, and offers suggestions for guiding children's play through the path of attaining peak experience via self-actualization.
Keywords: peak experience; play experience; self-actualization
Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
10
Authors: Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 10, pp. 1969-1983 (15 pages)
Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially given the constraints of hardware resources. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model includes two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) methods for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capturing actions and the lower network handling square-capturing actions. Human-knowledge-based auxiliary agents are introduced to assist SLM and HBM, simulating the entire game and providing reward signals based on square-capturing or victory outcomes. Additionally, within HBM, we propose two human-knowledge-based pruning methods that prune parallel MCTS and capture actions in the lower network. In experiments against a layout model using the AlphaZero method, SLM achieves a 74% win rate, with decision-making time reduced to approximately 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieves a 70% win rate when playing against other Tibetan Jiu chess models. When used together, SLM and HBM in JFA achieve an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
Keywords: games; reinforcement learning; Tibetan Jiu chess; separate two-stage model; self-play; hierarchical neural network; parallel Monte Carlo tree search
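The hierarchical dispatch described for HBM, an upper policy choosing movement or jump-capturing actions and deferring square-capturing actions to a lower policy, might be sketched as follows; both policies are stubs, and the board representation, board size, and action encoding are assumptions rather than the paper's design.

```python
import random

# Minimal sketch of a two-level action hierarchy. Both policies are stubs
# standing in for the upper and lower networks (plus MCTS) described above.

UPPER_ACTIONS = ["move", "jump_capture", "defer_to_lower"]
BOARD_SIZE = 14  # illustrative board size, not necessarily the paper's

def upper_policy(board):
    return random.choice(UPPER_ACTIONS)     # stub: upper network + MCTS

def lower_policy(board):
    # stub: lower network chooses a square-capturing action
    return ("square_capture", random.randrange(BOARD_SIZE * BOARD_SIZE))

def select_action(board):
    choice = upper_policy(board)
    if choice == "defer_to_lower":
        return lower_policy(board)          # square-capturing branch
    return (choice, random.randrange(BOARD_SIZE * BOARD_SIZE))

action = select_action(board=[[0] * BOARD_SIZE for _ in range(BOARD_SIZE)])
```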
AI in Human-computer Gaming: Techniques, Challenges and Opportunities (Cited by 2)
11
Authors: Qi-Yue Yin, Jun Yang, Kai-Qi Huang, Mei-Jing Zhao, Wan-Cheng Ni, Bin Liang, Yan Huang, Shu Wu, Liang Wang. Machine Intelligence Research (EI, CSCD), 2023, Issue 3, pp. 299-317 (19 pages)
With the breakthrough of AlphaGo, human-computer gaming AI has ushered in a big explosion, attracting more and more researchers all over the world. As a recognized standard for testing artificial intelligence, various human-computer gaming AI systems (AIs) have been developed, such as Libratus, OpenAI Five, and AlphaStar, which beat professional human players. The rapid development of human-computer gaming AIs indicates a big step for decision-making intelligence, and it seems that current techniques can handle very complex human-computer games. So, one natural question arises: what are the possible challenges of current techniques in human-computer gaming, and what are the future trends? To answer this question, in this paper we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooting game AIs, and real-time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games and the corresponding techniques utilized for achieving professional human-level AIs; 2) summarize the mainstream frameworks and techniques that can be properly relied on for developing AIs for complex human-computer games; 3) raise the challenges or drawbacks of current techniques in the successful AIs; and 4) try to point out future trends in human-computer gaming AIs. Finally, we hope that this brief review can provide an introduction for beginners and inspire insight for researchers in the field of AI in human-computer gaming.
Keywords: human-computer gaming AI; intelligent decision making; deep reinforcement learning; self-play
Distributed Deep Reinforcement Learning:A Survey and a Multi-player Multi-agent Learning Toolbox
12
Authors: Qiyue Yin, Tongtong Yu, Shengqi Shen, Jun Yang, Meijing Zhao, Wancheng Ni, Kaiqi Huang, Bin Liang, Liang Wang. Machine Intelligence Research (EI, CSCD), 2024, Issue 3, pp. 411-430 (20 pages)
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modelling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we review the state of this exciting field by comparing the classical distributed deep reinforcement learning methods and studying the important components needed to achieve efficient distributed learning, covering settings from single-player, single-agent distributed deep reinforcement learning to the most complex multi-player, multi-agent case. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analysing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, which is further validated on Wargame, a complex environment, showing the usability of the proposed toolbox for multi-player, multi-agent distributed deep reinforcement learning under complex games. Finally, we try to point out challenges and future trends, hoping that this brief review can provide a guide or a spark for researchers interested in distributed deep reinforcement learning.
Keywords: deep reinforcement learning; distributed machine learning; self-play; population-play; toolbox
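A minimal sketch of the actor-learner separation common to the distributed DRL systems surveyed here: actor processes generate experience in parallel and push it through a queue to a single learner. The transition format is a placeholder and the gradient update is elided; this is an illustration of the pattern, not any particular toolbox's API.

```python
import multiprocessing as mp
import random

# Minimal actor-learner pattern: parallel actors feed one learner via a queue.

def actor(actor_id, queue, episodes=5):
    for _ in range(episodes):
        transition = (actor_id, random.random())  # stand-in experience tuple
        queue.put(transition)
    queue.put(None)                               # signal this actor is done

def learner(queue, n_actors):
    finished = 0
    while finished < n_actors:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        # ... gradient update on the received experience would go here ...

if __name__ == "__main__":
    q = mp.Queue()
    actors = [mp.Process(target=actor, args=(i, q)) for i in range(4)]
    for p in actors:
        p.start()
    learner(q, n_actors=4)
    for p in actors:
        p.join()
```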