
Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification (cited by 3)

Abstract: It is known that latent semantic indexing (LSI) takes advantage of implicit higher-order (or latent) structure in the association of terms and documents. Higher-order relations in LSI capture "latent semantics". These findings inspired a novel Bayesian framework for classification named Higher-Order Naive Bayes (HONB), introduced previously, which can explicitly make use of these higher-order relations. In this paper, we present a novel semantic smoothing method named Higher-Order Smoothing (HOS) for the Naive Bayes algorithm. HOS is built on a graph-based data representation similar to that of HONB, which allows the semantics in higher-order paths to be exploited. HOS takes the concept one step further by exploiting the relationships between instances of different classes. As a result, we move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. This approach improves parameter estimation when labeled data are insufficient. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.
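The idea sketched in the abstract — boosting Naive Bayes pseudo-counts with second-order term-document-term paths counted across all classes — can be illustrated with a minimal example. This is an illustrative sketch only, not the published HOS formula: the function names, the `beta` weight, and the exact combination rule are assumptions introduced here.

```python
import numpy as np

def second_order_counts(X):
    """Term-term second-order path counts via shared documents.

    X: binary document-term matrix (n_docs x n_terms).
    A second-order path links term i to term j through any
    document containing both; the diagonal is zeroed so a
    term does not smooth itself.
    """
    C = X.T @ X               # term co-occurrence counts
    np.fill_diagonal(C, 0)
    return C

def hos_like_estimates(X, y, alpha=1.0, beta=0.1):
    """P(term | class) with a higher-order smoothing flavor.

    Classic Laplace smoothing adds a flat alpha; here each
    term's pseudo-count is additionally boosted by beta times
    its second-order path mass, computed over ALL documents,
    thus crossing class boundaries as the abstract describes.
    """
    paths = second_order_counts(X).sum(axis=0).astype(float)
    paths /= max(paths.sum(), 1)          # normalize path mass
    probs = {}
    for c in np.unique(y):
        counts = X[y == c].sum(axis=0)    # first-order term counts
        smoothed = counts + alpha + beta * paths * X.shape[1]
        probs[c] = smoothed / smoothed.sum()
    return probs
```

A term that never occurs in a class still receives probability mass proportional to how strongly it is connected to the rest of the vocabulary, which is the intuition behind moving beyond instance and class boundaries.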
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, No. 3, pp. 376-391 (16 pages)
Funding: supported in part by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under Grant No. 111E239
Keywords: Naive Bayes, semantic smoothing, higher-order Naive Bayes, higher-order smoothing, text classification