Abstract
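As a branch of hybrid intelligent systems, a multi-classifier system integrates a diverse set of classifiers so that the whole achieves better classification performance. Result fusion is an important problem in this field: with the same member classifiers, a good fusion strategy can effectively improve the overall classification accuracy of the system. As model safety receives growing attention, the poor interpretability of traditional fusion strategies has become prominent. Building on the knowledge-line memory theory from psychology and drawing on the human decision-making process, this paper proposes a heuristic multi-classifier ensemble algorithm with good interpretability, called the knowledge-line ensemble algorithm. The algorithm simulates human learning and inference to organize the fusion of multi-classifier results. During training, the model collects different subsets of the given classifier set and constructs different mappings from the feature space to the solution space, which form the knowledge-lines. During inference, the model heuristically activates knowledge-lines and performs selective result ensembling to obtain the inference result. The knowledge-line ensemble follows a sample-driven pattern, which makes intermediate processes and final results easy to analyze. Experiments with decision trees as the classifiers show that, with the same set of decision trees, the classification accuracy of the knowledge-line ensemble algorithm is similar to that of random forest. Beyond this, the algorithm can quantify the difficulty of a problem at different granularities and can provide relevant training samples as evidence at inference time.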
The multi-classifier system, a branch of hybrid intelligent systems, integrates many classifiers to achieve higher accuracy. Because computing resources and classifier quality are limited, classifier fusion is an important problem in multi-classifier systems: with the same well-trained member classifiers, a better fusion strategy yields better overall performance. Traditional methods have tried many fusion strategies, such as plain voting, weighted voting, and fusion functions, and classification accuracy has risen as these models developed. However, these models focus on classification accuracy and pay little attention to interpretability, which becomes an unavoidable problem once model safety is a concern. Taking the perspective of human decision making, this paper presents a new multi-classifier ensemble algorithm, named knowledge-line ensemble, based on the knowledge-line memory theory, which describes how humans make decisions with memory. To obtain interpretability resembling that of human decision making, the knowledge-line ensemble algorithm imitates human learning and inference as described by the psychological theory. In training, the model creates memories, called knowledge-lines, that store how different problems were solved, and it also forgets memories, as humans do, to avoid being trapped by particular bad cases. Knowledge-lines and training samples are in one-to-one correspondence: a knowledge-line is a subset of the given well-trained classifiers that classify the corresponding sample correctly. Different samples give rise to different knowledge-lines, so after training the model stores a varied set of knowledge-lines, which together form a set of mappings from the feature space to the answer space. In inference, the model chooses a subset of the existing knowledge-lines to activate according to heuristic rules, and the activated knowledge-lines vote to produce a result. The knowledge-line ensemble algorithm is a sample-driven method: when inferring on a new case, only the knowledge-lines created from familiar samples are activated, much as human beings recall solutions from memory when facing trouble. The algorithm therefore uses sampled data to make decisions. In particular, because the way the knowledge-line memory theory builds knowledge-lines from computing units resembles adding elements to sets, this paper models the process with matrices to describe the computation more clearly. The connections between knowledge-lines and computing units are represented by an adjacency matrix, the results of the different classifiers are stored in a classification matrix, and the activation of knowledge-lines is computed as the inner product of the results of all knowledge-lines with the activation vector. The final classification result can therefore be expressed as a matrix multiplication. On this basis, the goal and convergence of the algorithm are explained. In the experiments, decision trees are used as the given classifiers. With the same given classifiers, the knowledge-line ensemble algorithm achieves accuracy comparable to that of random forest, which uses plain voting as its coordination strategy. More importantly, the algorithm can distinguish the difficulty of inference cases according to how the knowledge-lines are activated and can provide specific training cases in support of each inference, which makes its results more convincing.
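The matrix view described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal reading of the abstract under several assumptions. The three threshold rules stand in for the given classifiers (the paper uses decision trees), the activation heuristic is taken to be "activate the knowledge-lines of the k nearest training samples" (the abstract does not specify the heuristic rules), and the forgetting step is left out. All function and variable names are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for the "given well-trained classifiers":
# three fixed threshold rules on 2-D inputs (the paper uses decision trees).
classifiers = [
    lambda X: (X[:, 0] > 0.5).astype(int),
    lambda X: (X[:, 1] > 0.5).astype(int),
    lambda X: np.zeros(len(X), dtype=int),
]

def predict_all(X):
    """Stack every classifier's labels: shape (n_classifiers, n_samples)."""
    return np.stack([clf(X) for clf in classifiers])

def build_knowledge_lines(X_train, y_train):
    """Adjacency matrix A with one row (knowledge-line) per training sample.

    A[i, j] = 1 when classifier j labels training sample i correctly.
    """
    preds = predict_all(X_train)
    return (preds.T == y_train[:, None]).astype(int)

def infer(x, X_train, A, n_classes, k=2):
    """Activate k knowledge-lines and let their classifiers vote."""
    # Assumed heuristic: activate the lines of the k nearest training samples.
    dists = np.linalg.norm(X_train - x, axis=1)
    activation = np.zeros(len(X_train), dtype=int)
    activation[np.argsort(dists, kind="stable")[:k]] = 1
    # Classification matrix C: one-hot label of each classifier on x.
    labels = predict_all(x[None, :])[:, 0]
    C = np.eye(n_classes, dtype=int)[labels]
    # A @ C holds each knowledge-line's votes; the inner product with the
    # activation vector sums the votes of the active lines only.
    scores = activation @ A @ C
    return int(np.argmax(scores)), scores, np.flatnonzero(activation)

# Toy data, invented for illustration only.
X_train = np.array([[0.1, 0.1], [0.2, 0.9], [0.9, 0.8], [0.8, 0.2]])
y_train = np.array([0, 0, 1, 1])
A = build_knowledge_lines(X_train, y_train)
label, scores, support = infer(np.array([0.85, 0.3]), X_train, A, n_classes=2)
```

In this sketch `support` lists the training samples whose knowledge-lines were activated, which corresponds to the abstract's idea of citing training cases as evidence, and the gap between the entries of `scores` could serve as a rough difficulty signal. How the paper actually measures difficulty is not stated in the abstract.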
Authors
于思皓
郭嘉丰
范意兴
兰艳艳
程学旗
YU Si-Hao; GUO Jia-Feng; FAN Yi-Xing; LAN Yan-Yan; CHENG Xue-Qi (Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190; Institute of Network Technology, ICT (YANTAI), CAS, Yantai, Shandong 264005)
Source
《计算机学报》
EI
CSCD
Peking University Core Journal
2021, No. 3, pp. 462-475 (14 pages)
Chinese Journal of Computers
Funding
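National Natural Science Foundation of China (61722211, 61872338, 61902381)
Beijing Academy of Artificial Intelligence (BAAI2019ZD0306)
Youth Innovation Promotion Association of the Chinese Academy of Sciences (20144310)
National Key R&D Program of China (2016QY02D0405)
Lenovo-CAS Joint Lab Youth Scientist Project
K. C. Wong Education Foundation
Chongqing Basic Science and Frontier Technology Research Program (Key Project) (cstc2017jcjyBX0059)
Taishan Scholars Program special fund (ts201511082)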
Keywords
multi-classifier
knowledge-line memory theory
heuristics
sample-driven
interpretability