
基于知识迁移的Ant-Q算法 (Cited by: 4)

Ant-Q Algorithm Based on Knowledge Transfer
Abstract: The computational complexity of the conventional Ant-Q algorithm grows factorially with the scale of the problem, which severely limits its convergence speed. Moreover, the conventional Ant-Q algorithm focuses only on a single task, so the solutions it produces are not reusable and it handles a series of related tasks inefficiently. To address this, an Ant-Q algorithm based on knowledge transfer is proposed. First, the similarity between each source task and the target task is computed using Bayesian theory. These similarities then serve as weights that determine the number of samples transferred from each source task. Finally, the source-task samples are sorted in descending order of transfer value and the valid samples are selected, so that they can guide the Agent to make rational decisions quickly. Simulation results on the att532 traveling salesman problem show that knowledge transfer effectively reduces the difficulty of learning the target task, allowing the optimal solution to be found quickly.
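The transfer step summarized above (a Bayesian similarity score per source task, similarity-weighted sample quotas, then a descending sort by transfer value) can be sketched roughly as follows. This is a minimal illustration under assumed data structures: binary task-evidence vectors and samples tagged with a scalar transfer value. The function names and the simple two-hypothesis Bayes model are inventions for the sketch, not the paper's actual formulation.

```python
# Illustrative sketch only: the similarity model, data layout, and all
# names below are assumptions, not the authors' implementation.
import numpy as np

def task_similarity(source_evidence, target_evidence, prior=0.5):
    # Fraction of matching binary features between the two tasks.
    agree = float(np.mean(np.asarray(source_evidence) == np.asarray(target_evidence)))
    # Posterior P(tasks similar | evidence) via Bayes' rule with a
    # symmetric two-hypothesis likelihood model.
    num = agree * prior
    return num / (num + (1.0 - agree) * (1.0 - prior) + 1e-12)

def select_transfer_samples(source_tasks, target_evidence, budget):
    # Similarity of each source task to the target, used as allocation weights.
    sims = np.array([task_similarity(t["evidence"], target_evidence)
                     for t in source_tasks])
    weights = sims / (sims.sum() + 1e-12)
    selected = []
    for w, task in zip(weights, source_tasks):
        n = int(round(w * budget))            # per-task sample quota
        ranked = sorted(task["samples"],      # descending transfer value
                        key=lambda s: s["value"], reverse=True)
        selected.extend(ranked[:n])
    return selected

# Example: two source tasks sharing a 10-sample transfer budget.
tasks = [
    {"evidence": [1, 1, 0, 1], "samples": [{"value": v} for v in (0.9, 0.4, 0.7)]},
    {"evidence": [0, 0, 1, 0], "samples": [{"value": v} for v in (0.2, 0.8)]},
]
print(select_transfer_samples(tasks, target_evidence=[1, 1, 0, 0], budget=10))
```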
Source: Acta Electronica Sinica (《电子学报》), 2011, No. 10: 2359-2365 (7 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (No.60804022, No.60974050, No.61072094); Program for New Century Excellent Talents in University, Ministry of Education (No.NCET-08-0836, No.NCET-10-0765); Fok Ying Tung Education Foundation Fund for Young Teachers (No.121066); Natural Science Foundation of Jiangsu Province (No.BK2008126).
Keywords: knowledge transfer; Ant-Q algorithm; Bayesian theory; sample selection; traveling salesman problem

References (12)

  • 1 L M Gambardella, M Dorigo. Ant-Q: A reinforcement learning approach to the traveling salesman problem[A]. Proceedings of the 12th International Conference on Machine Learning[C]. New York: ACM Press, 1995. 252-260.
  • 2 Y H Cheng, H T Feng, X S Wang. Actor-Critic learning based on adaptive importance sampling[J]. Chinese Journal of Electronics, 2010, 19(4): 583-588.
  • 3 H M Rais, Z A Othman, A R Hamdan. Improved dynamic ant colony system (DACS) on symmetric traveling salesman problem[A]. Proceedings of the International Conference on Intelligent and Advanced Systems[C]. Piscataway: IEEE Inc., 2008. 43-48.
  • 4 N A Vien, N H Viet, S G Lee, T H Chung. Obstacle avoidance path planning for mobile robot based on Ant-Q reinforcement learning algorithm[J]. Lecture Notes in Computer Science, 2007, 4491: 704-713.
  • 5 L Machado, R Schirru. The Ant-Q algorithm applied to the nuclear reload problem[J]. Annals of Nuclear Energy, 2002, 29(12): 1455-1470.
  • 6 C E Mariano, E Morelos. A multiple objective Ant-Q algorithm for the design of water distribution irrigation networks[A]. Proceedings of the Genetic and Evolutionary Computation Conference[C]. San Francisco: Morgan Kaufmann, 1999. 894-901.
  • 7 X J Liu, Z H Ni. Ant-Q algorithm based optimization approach for process planning[A]. Proceedings of the 8th IEEE International Conference on Control and Automation[C]. Piscataway: IEEE Inc., 2010. 620-623.
  • 8 X R Wang, T J Wu. The Ant(λ) ant colony optimization algorithm based on eligibility trace[A]. Proceedings of the International Conference on Systems, Man and Cybernetics[C]. Piscataway: IEEE Inc., 2003. 4065-4070.
  • 9 S G Lee, T C Chung. A reinforcement learning algorithm using temporal difference error in ant model[J]. Lecture Notes in Computer Science, 2005, 3512: 217-224.
  • 10 S J Pan, Q Yang. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359.

Secondary References (18)

  • 1 Anderson J R. Cognitive Psychology and Its Implications (third edition)[M]. New York: Freeman, 1990.
  • 2 Sutton R S, Barto A G. Reinforcement Learning[M]. Cambridge: MIT Press, 1998.
  • 3 Bowling M, Veloso M. Reusing learned policies between similar problems[A]. Proceedings of the AI*IA-98 Workshop on New Trends in Robotics[C]. Berlin: Springer-Verlag, 1998.
  • 4 Fernandez F, Veloso M. Probabilistic policy reuse in a reinforcement learning agent[A]. Proceedings of the Fifth International Conference on Autonomous Agents and Multi-Agent Systems[C]. New York: ACM, 2006.
  • 5 Fernandez F, Veloso M. Policy reuse for transfer learning across tasks with different state and action spaces[A]. Proceedings of the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning[C]. New York: ACM, 2006.
  • 6 Bernstein D S. Reusing old policies to accelerate learning on new MDPs[R]. Amherst: University of Massachusetts, 1999.
  • 7 Pickett M, Barto A G. PolicyBlocks: An algorithm for creating useful macro-actions in reinforcement learning[A]. Proceedings of the Nineteenth International Conference on Machine Learning[C]. San Francisco: Morgan Kaufmann, 2002. 506-513.
  • 8 McGovern A, Barto A G. Automatic discovery of subgoals in reinforcement learning using diverse density[A]. Proceedings of the Eighteenth International Conference on Machine Learning[C]. San Francisco: Morgan Kaufmann, 2001. 361-368.
  • 9 Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 10 Mehta N, Natarajan S, Tadepalli P, Fern A. Transfer in variable-reward hierarchical reinforcement learning[A]. Proceedings of the NIPS-05 Workshop on Inductive Transfer[C]. Cambridge: MIT Press, 2005. 360-366.

Co-citing Literature (26)

Co-cited Literature (32)

  • 1 Bian Z Q, Zhang X G, et al. Pattern Recognition (模式识别)[M]. Beijing: Tsinghua University Press, 2001.
  • 2 Jolliffe I T. Principal Component Analysis[M]. New York: Springer-Verlag, 1986.
  • 3 Fisher R A. The use of multiple measurements in taxonomic problems[J]. Annals of Eugenics, 1936, 7(2): 179-188.
  • 4 He X F, Niyogi P. Locality preserving projections[C/OL]. http://people.cs.uchicago.edu/xiaofei/LPP_NIPS03.pdf, 2003.
  • 5 Pan S J, Yang Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359.
  • 6 Borgwardt K M, Gretton A, Rasch M J, Kriegel H P, Schölkopf B, Smola A J. Integrating structured biological data by kernel maximum mean discrepancy[J]. Bioinformatics, 2006, 22(14): e49-e57.
  • 7 Pan S J, Kwok J T, Yang Q. Transfer learning via dimensionality reduction[C/OL]. http://www.aaai.org/Papers/AAAI/2008/AAAI08-108.pdf.
  • 8 Atkeson C G, Moore A W, Schaal S. Locally weighted learning[J]. Artificial Intelligence Review, 1997, 11(1-5): 11-73.
  • 9 Zhao D L, Lin Z C, Xiao R, Tang X O. Linear Laplacian discrimination for feature extraction[C/OL]. http://research.microsoft.com/en-us/um/people/zhoulin/publications/2007-cvpr-lld.pdf.
  • 10 Li J, Li X L, Tao D C. KPCA for semantic object extraction in images[J]. Pattern Recognition, 2008, 41(10): 3244-3250.

Citing Literature (4)

Secondary Citing Literature (23)
