
基于知识迁移的Ant-Q算法 (Cited by: 4)

Ant-Q Algorithm Based on Knowledge Transfer
Abstract: The computational complexity of the conventional Ant-Q algorithm grows factorially with the scale of the problem, which severely limits its convergence speed. Moreover, the conventional Ant-Q algorithm focuses only on a single task, so the solutions it produces are not reusable and it handles a series of related tasks inefficiently. To address this, an Ant-Q algorithm based on knowledge transfer is proposed. First, the similarity between each source task and the target task is computed using Bayesian theory. These similarities then serve as weights that determine the number of samples transferred from each source task. Finally, the source-task samples are sorted in descending order of transfer value and the valid samples are selected, so that they can guide the Agent to make rational decisions quickly. Simulation results on the att532 traveling salesman problem show that knowledge transfer effectively reduces the difficulty of learning the target task, allowing the optimal solution to be found quickly.
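The transfer step summarized above (a Bayesian similarity score per source task, similarity-weighted sample quotas, then a descending sort by transfer value) can be sketched roughly as follows. This is a minimal illustration under assumed data structures: binary task-evidence vectors and samples tagged with a scalar transfer value. The function names and the simple two-hypothesis Bayes model are inventions for the sketch, not the paper's actual formulation.

```python
# Illustrative sketch only: the similarity model, data layout, and all
# names below are assumptions, not the authors' implementation.
import numpy as np

def task_similarity(source_evidence, target_evidence, prior=0.5):
    # Fraction of matching binary features between the two tasks.
    agree = float(np.mean(np.asarray(source_evidence) == np.asarray(target_evidence)))
    # Posterior P(tasks similar | evidence) via Bayes' rule with a
    # symmetric two-hypothesis likelihood model.
    num = agree * prior
    return num / (num + (1.0 - agree) * (1.0 - prior) + 1e-12)

def select_transfer_samples(source_tasks, target_evidence, budget):
    # Similarity of each source task to the target, used as allocation weights.
    sims = np.array([task_similarity(t["evidence"], target_evidence)
                     for t in source_tasks])
    weights = sims / (sims.sum() + 1e-12)
    selected = []
    for w, task in zip(weights, source_tasks):
        n = int(round(w * budget))            # per-task sample quota
        ranked = sorted(task["samples"],      # descending transfer value
                        key=lambda s: s["value"], reverse=True)
        selected.extend(ranked[:n])
    return selected

# Example: two source tasks sharing a 10-sample transfer budget.
tasks = [
    {"evidence": [1, 1, 0, 1], "samples": [{"value": v} for v in (0.9, 0.4, 0.7)]},
    {"evidence": [0, 0, 1, 0], "samples": [{"value": v} for v in (0.2, 0.8)]},
]
print(select_transfer_samples(tasks, target_evidence=[1, 1, 0, 0], budget=10))
```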
Source: Acta Electronica Sinica (《电子学报》), 2011, No. 10: 2359-2365 (7 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (No.60804022, No.60974050, No.61072094); Program for New Century Excellent Talents in University, Ministry of Education (No.NCET-08-0836, No.NCET-10-0765); Fok Ying Tung Education Foundation Fund for Young Teachers (No.121066); Natural Science Foundation of Jiangsu Province (No.BK2008126).
Keywords: knowledge transfer; Ant-Q algorithm; Bayesian theory; sample selection; traveling salesman problem

References (12)

  • 1 L M Gambardella, M Dorigo. Ant-Q: A reinforcement learning approach to the traveling salesman problem[A]. Proceedings of the 12th International Conference on Machine Learning[C]. New York: ACM Press, 1995. 252-260.
  • 2 Y H Cheng, H T Feng, X S Wang. Actor-Critic learning based on adaptive importance sampling[J]. Chinese Journal of Electronics, 2010, 19(4): 583-588.
  • 3 H M Rais, Z A Othman, A R Hamdan. Improved dynamic ant colony system (DACS) on symmetric traveling salesman problem[A]. Proceedings of the International Conference on Intelligent and Advanced Systems[C]. Piscataway: IEEE Inc., 2008. 43-48.
  • 4 N A Vien, N H Viet, S G Lee, T H Chung. Obstacle avoidance path planning for mobile robot based on Ant-Q reinforcement learning algorithm[J]. Lecture Notes in Computer Science, 2007, 4491: 704-713.
  • 5 L Machado, R Schirru. The Ant-Q algorithm applied to the nuclear reload problem[J]. Annals of Nuclear Energy, 2002, 29(12): 1455-1470.
  • 6 C E Mariano, E Morelos. A multiple objective Ant-Q algorithm for the design of water distribution irrigation networks[A]. Proceedings of the Genetic and Evolutionary Computation Conference[C]. San Francisco: Morgan Kaufmann, 1999. 894-901.
  • 7 X J Liu, Z H Ni. Ant-Q algorithm based optimization approach for process planning[A]. Proceedings of the 8th IEEE International Conference on Control and Automation[C]. Piscataway: IEEE Inc., 2010. 620-623.
  • 8 X R Wang, T J Wu. The Ant(λ) ant colony optimization algorithm based on eligibility trace[A]. Proceedings of the International Conference on Systems, Man and Cybernetics[C]. Piscataway: IEEE Inc., 2003. 4065-4070.
  • 9 S G Lee, T C Chung. A reinforcement learning algorithm using temporal difference error in ant model[J]. Lecture Notes in Computer Science, 2005, 3512: 217-224.
  • 10 S J Pan, Q Yang. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359.

Secondary References (18)

  • 1 Anderson J R. Cognitive Psychology and Its Implications (third edition)[M]. New York: Freeman, 1990.
  • 2 Sutton R S, Barto A G. Reinforcement Learning[M]. Cambridge: MIT Press, 1998.
  • 3 Bowling M, Veloso M. Reusing learned policies between similar problems[A]. Proceedings of the AI*IA-98 Workshop on New Trends in Robotics[C]. Berlin: Springer-Verlag, 1998.
  • 4 Fernandez F, Veloso M. Probabilistic policy reuse in a reinforcement learning agent[A]. Proceedings of the Fifth International Conference on Autonomous Agents and Multi-Agent Systems[C]. New York: ACM, 2006.
  • 5 Fernandez F, Veloso M. Policy reuse for transfer learning across tasks with different state and action spaces[A]. Proceedings of the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning[C]. New York: ACM, 2006.
  • 6 Bernstein D S. Reusing old policies to accelerate learning on new MDPs[R]. Amherst: University of Massachusetts, 1999.
  • 7 Pickett M, Barto A G. PolicyBlocks: An algorithm for creating useful macro-actions in reinforcement learning[A]. Proceedings of the Nineteenth International Conference on Machine Learning[C]. San Francisco: Morgan Kaufmann, 2002. 506-513.
  • 8 McGovern A, Barto A G. Automatic discovery of subgoals in reinforcement learning using diverse density[A]. Proceedings of the Eighteenth International Conference on Machine Learning[C]. San Francisco: Morgan Kaufmann, 2001. 361-368.
  • 9 Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 10 Mehta N, Natarajan S, Tadepalli P, Fern A. Transfer in variable-reward hierarchical reinforcement learning[A]. Proceedings of the NIPS-05 Workshop on Inductive Transfer[C]. Cambridge: MIT Press, 2005. 360-366.

Co-citing Literature (26)

Co-cited Literature (32)

  • 1 Bian Z Q, Zhang X G, et al. Pattern Recognition (模式识别)[M]. Beijing: Tsinghua University Press, 2001.
  • 2 Jolliffe I T. Principal Component Analysis[M]. New York: Springer-Verlag, 1986.
  • 3 Fisher R A. The use of multiple measurements in taxonomic problems[J]. Annals of Eugenics, 1936, 7(2): 179-188.
  • 4 He X F, Niyogi P. Locality preserving projections[C/OL]. http://people.cs.uchicago.edu/xiaofei/LPP_NIPS03.pdf, 2003.
  • 5 Pan S J, Yang Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359.
  • 6 Borgwardt K M, Gretton A, Rasch M J, Kriegel H P, Schölkopf B, Smola A J. Integrating structured biological data by kernel maximum mean discrepancy[J]. Bioinformatics, 2006, 22(14): e49-e57.
  • 7 Pan S J, Kwok J T, Yang Q. Transfer learning via dimensionality reduction[C/OL]. http://www.aaai.org/Papers/AAAI/2008/AAAI08-108.pdf.
  • 8 Atkeson C G, Moore A W, Schaal S. Locally weighted learning[J]. Artificial Intelligence Review, 1997, 11(1-5): 11-73.
  • 9 Zhao D L, Lin Z C, Xiao R, Tang X O. Linear Laplacian discrimination for feature extraction[C/OL]. http://research.microsoft.com/en-us/um/people/zhoulin/publications/2007-cvpr-lld.pdf.
  • 10 Li J, Li X L, Tao D C. KPCA for semantic object extraction in images[J]. Pattern Recognition, 2008, 41(10): 3244-3250.

Citing Literature (4)

Secondary Citing Literature (23)
