An Algorithm for Jamming Decision Using Dual Reinforcement Learning (cited by: 13)
Abstract: To address the slow convergence of reinforcement learning algorithms in jamming decision, a jamming decision algorithm using dual reinforcement learning (DRLJD) is proposed. First, equivalent communication parameters are modeled; the model reduces the number of parameters to be learned and lowers the dimension of the search space. Second, the reduced search space guides the selection of jamming parameters, avoiding the poor jamming performance caused by random selection. Finally, jamming is applied with the selected parameters, and the dimension of the search space is further reduced according to environment feedback, so that continual interaction accelerates the algorithm's convergence. In addition, previous jamming experience is incorporated into the learning process as prior information, further shortening the learning time. Experiments on constructed jamming problems show that the DRLJD algorithm learns an excellent jamming strategy within 200 interactions, fewer than the 600 interactions required by existing algorithms, and the use of prior information further reduces the number of interactions required. With the proposed new reward standard as the basis for reward, the algorithm can learn the optimal jamming strategy under unknown communication protocols at the expense of interaction time.
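The dual-layer structure described in the abstract can be illustrated with a toy sketch: an upper layer learns over a small space of "equivalent" parameter classes, and its choice restricts the lower layer's search over concrete jamming parameters, shrinking the effective search space. Everything below (the action names, the candidate parameter lists, and the reward function standing in for environment feedback) is a hypothetical illustration, not the paper's actual model or code.

```python
import random

# Assumed equivalent-parameter classes (upper layer) and the concrete
# jamming parameters (center frequency, bandwidth) each class admits
# (lower layer). All values are invented for illustration.
EQUIV_ACTIONS = ["narrowband", "wideband", "sweep"]
JAM_PARAMS = {
    "narrowband": [(f0, 1.0) for f0 in (900, 910, 920)],
    "wideband":   [(905, bw) for bw in (10.0, 20.0)],
    "sweep":      [(900, 30.0)],
}

def reward(action, param):
    # Stand-in for environment feedback (e.g. observed degradation of the
    # target link); a toy function so the sketch runs end to end.
    f0, bw = param
    return 1.0 if action == "narrowband" and f0 == 910 else 0.1

def dual_layer_q_learning(episodes=200, eps=0.1, alpha=0.5):
    """Two coupled epsilon-greedy Q-learners: the upper layer picks an
    equivalent class, the lower layer only searches parameters consistent
    with that class, and both are updated from the same reward."""
    q_upper = {a: 0.0 for a in EQUIV_ACTIONS}
    q_lower = {(a, p): 0.0 for a in EQUIV_ACTIONS for p in JAM_PARAMS[a]}
    for _ in range(episodes):
        # Upper layer: epsilon-greedy choice in the reduced space.
        a = (random.choice(EQUIV_ACTIONS) if random.random() < eps
             else max(q_upper, key=q_upper.get))
        # Lower layer: search restricted to the chosen class's parameters.
        cands = JAM_PARAMS[a]
        p = (random.choice(cands) if random.random() < eps
             else max(cands, key=lambda c: q_lower[(a, c)]))
        r = reward(a, p)
        q_lower[(a, p)] += alpha * (r - q_lower[(a, p)])
        q_upper[a] += alpha * (r - q_upper[a])
    return q_upper, q_lower

random.seed(0)
q_upper, q_lower = dual_layer_q_learning()
best = max(q_upper, key=q_upper.get)
```

The dimension reduction is visible in the action counts: a flat learner would search all 6 (class, parameter) pairs at once, whereas here the upper layer discriminates among only 3 classes and the lower layer among at most 3 parameters per class, which is the kind of shrinkage the paper credits for its faster convergence.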
Source: Journal of Xi'an Jiaotong University, 2018, Issue 2, pp. 63-69 (7 pages); indexed in EI, CAS, CSCD, Peking University Core.
Funding: Natural Science Foundation of Anhui Province (1308085QF99, 1408085MKL46)
Keywords: reinforcement learning; dual reinforcement learning; jamming decision; prior information; reward standard

