A Multi-class Text Categorization Algorithm Based on Maximal Classification Margin SVDD

Cited by: 2

Abstract: Text categorization is one of the key technologies in information retrieval and text mining. This paper proposes a multi-class text categorization algorithm based on support vector data description (SVDD). In training, SVDD is used to find the minimal hypersphere enclosing the samples of each class, while the classification margin between classes is maximized. In testing, a decision criterion based on the average distance to the k nearest neighbors in kernel space determines the class to which a sample belongs. Experimental results show that the method has good generalization ability and good time performance.
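The test-phase criterion named in the abstract can be illustrated with a small sketch. The abstract does not give the exact rule, so everything here is an assumption: the RBF kernel, the function names (`rbf_kernel`, `kernel_distance`, `knn_average_distance`, `classify`), the toy data, and the choice to assign a sample to the class with the smallest average kernel-space distance to its k nearest training samples. The SVDD training step (hypersphere fitting with margin maximization) is not reproduced.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    # Assumed kernel; the abstract does not say which kernel the paper uses.
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance(x, y, kernel):
    # Distance in the kernel-induced feature space:
    # ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y)
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors

def knn_average_distance(x, samples, k, kernel):
    # Mean kernel-space distance from x to its k nearest samples
    # (samples is assumed to be non-empty).
    dists = sorted(kernel_distance(x, s, kernel) for s in samples)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def classify(x, class_samples, k=3, kernel=rbf_kernel):
    # Hypothetical decision rule: pick the class whose k nearest
    # training samples are, on average, closest to x in kernel space.
    scores = {
        label: knn_average_distance(x, samples, k, kernel)
        for label, samples in class_samples.items()
    }
    return min(scores, key=scores.get)

# Toy two-dimensional "document vectors", invented for illustration only.
train = {
    "sports": [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0)],
    "finance": [(0.1, 0.9), (0.2, 0.8), (0.0, 1.0)],
}
label = classify((0.85, 0.15), train, k=2)
```

In a fuller implementation, a rule of this kind would presumably be combined with the trained hyperspheres, for example to resolve samples that fall inside several spheres or outside all of them; that combination is a guess and is not stated in the abstract.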

Author: Luo Qi (罗琦)

Affiliation: Southwest China Institute of Electronic Technology (中国西南电子技术研究所)

Source: Telecommunication Engineering (《电讯技术》), Peking University Core Journal, 2014, No. 4, pp. 496-499 (4 pages)

Keywords: information retrieval; text mining; text categorization; support vector data description (SVDD); multi-class classifier

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
