An Authorship Identification Method Based on a Complex Network Model (cited by: 9)

An Authorship Attribution Algorithm Based on Complex Network
Abstract: [Purpose/significance] Authorship identification is an important research direction in linguistic stylistics, and identifying authors from textual features is also an important task in text mining. Two problems remain open: identifying the authors or publishers of massive volumes of information in open, virtual network environments, and the limited efficiency and high cost of traditional authorship identification methods. [Method/process] This paper introduces complex network theory into the field. Building on traditional methods that identify authors from stylometric features, it incorporates a word co-occurrence network model of the text and the metrics of that network to improve the relevant algorithms. Stylometric features and network metrics together form the author's style feature set, and authors are identified by computing the style similarity between texts. [Result/conclusion] The authorship identification method based on a complex network model makes effective use of author style features and improves identification precision. Comparative experiments against other algorithms show that its identification results are more accurate.
Source: Library and Information Service (《图书情报工作》), CSSCI, Peking University Core Journal, 2015, No. 18, pp. 102-107 (6 pages)
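Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Chinese Text Semantic Similarity Based on Complex Networks" (grant No. 71373200)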
Keywords: authorship attribution; text classification; complex network; feature extraction; word co-occurrence; stylistics
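The pipeline outlined in the abstract (a word co-occurrence network, network metrics combined with stylometric features, and similarity between texts) can be illustrated with a minimal sketch. The abstract does not say which features, window size, or similarity measure the authors used, so everything concrete here is an assumption chosen for illustration: the regex tokenizer (Chinese text would need a word segmenter instead), the window of 2, the three stylometric and three network features, cosine similarity, and all function names.

```python
import math
import re
from itertools import combinations


def tokenize(text):
    """Lowercase word tokens (assumed stand-in for a real word segmenter)."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrence_network(tokens, window=2):
    """Undirected graph: nodes are words, edges join words at most
    `window` positions apart. The window size is an assumption."""
    adj = {t: set() for t in tokens}
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                adj[w].add(v)
                adj[v].add(w)
    return adj


def network_features(adj):
    """Average degree, density, and average clustering coefficient."""
    n = len(adj)
    if n < 2:
        return [0.0, 0.0, 0.0]
    total_degree = sum(len(nb) for nb in adj.values())
    clustering = 0.0
    for nb in adj.values():
        k = len(nb)
        if k >= 2:
            # Count edges that exist among this node's neighbours.
            links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
            clustering += 2 * links / (k * (k - 1))
    return [total_degree / n, total_degree / (n * (n - 1)), clustering / n]


def stylometric_features(text, tokens):
    """Average word length, type-token ratio, average sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tokens = max(len(tokens), 1)
    return [
        sum(len(t) for t in tokens) / n_tokens,
        len(set(tokens)) / n_tokens,
        len(tokens) / max(len(sentences), 1),
    ]


def style_vector(text):
    """Concatenate stylometric and network features into one vector."""
    tokens = tokenize(text)
    network = cooccurrence_network(tokens)
    return stylometric_features(text, tokens) + network_features(network)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def attribute(unknown_text, known_texts):
    """Return the candidate author whose reference text is most similar."""
    target = style_vector(unknown_text)
    return max(known_texts,
               key=lambda a: cosine(target, style_vector(known_texts[a])))
```

Calling `attribute(unknown_text, {"author_a": text_a, "author_b": text_b})` returns the key of the closest reference text. In practice the features would likely need normalisation before comparison, since raw cosine similarity is dominated by the features with the largest magnitudes.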