
Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification (cited by 3)

Abstract: It is known that latent semantic indexing (LSI) takes advantage of implicit higher-order (or latent) structure in the association of terms and documents. Higher-order relations in LSI capture "latent semantics". These findings inspired a novel Bayesian framework for classification named Higher-Order Naive Bayes (HONB), introduced previously, which can explicitly make use of these higher-order relations. In this paper, we present a novel semantic smoothing method named Higher-Order Smoothing (HOS) for the Naive Bayes algorithm. HOS is built on a graph-based data representation similar to that of HONB, which allows the semantics in higher-order paths to be exploited. HOS takes the concept one step further by exploiting the relationships between instances of different classes. As a result, we move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. This approach improves parameter estimation when labeled data are insufficient. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.
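The idea sketched in the abstract — boosting Naive Bayes pseudo-counts with second-order term-document-term paths counted across all classes — can be illustrated with a minimal example. This is an illustrative sketch only, not the published HOS formula: the function names, the `beta` weight, and the exact combination rule are assumptions introduced here.

```python
import numpy as np

def second_order_counts(X):
    """Term-term second-order path counts via shared documents.

    X: binary document-term matrix (n_docs x n_terms).
    A second-order path links term i to term j through any
    document containing both; the diagonal is zeroed so a
    term does not smooth itself.
    """
    C = X.T @ X               # term co-occurrence counts
    np.fill_diagonal(C, 0)
    return C

def hos_like_estimates(X, y, alpha=1.0, beta=0.1):
    """P(term | class) with a higher-order smoothing flavor.

    Classic Laplace smoothing adds a flat alpha; here each
    term's pseudo-count is additionally boosted by beta times
    its second-order path mass, computed over ALL documents,
    thus crossing class boundaries as the abstract describes.
    """
    paths = second_order_counts(X).sum(axis=0).astype(float)
    paths /= max(paths.sum(), 1)          # normalize path mass
    probs = {}
    for c in np.unique(y):
        counts = X[y == c].sum(axis=0)    # first-order term counts
        smoothed = counts + alpha + beta * paths * X.shape[1]
        probs[c] = smoothed / smoothed.sum()
    return probs
```

A term that never occurs in a class still receives probability mass proportional to how strongly it is connected to the rest of the vocabulary, which is the intuition behind moving beyond instance and class boundaries.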
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, No. 3, pp. 376-391 (16 pages)
Funding: supported in part by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under Grant No. 111E239
Keywords: Naive Bayes, semantic smoothing, higher-order Naive Bayes, higher-order smoothing, text classification