Research on Feature Extraction Algorithm of Text Plagiarism Detection Based on Multi-algorithm Fusion

Cited by: 3
Abstract: Feature extraction is a key step in text plagiarism detection, and the quality of the extracted features directly affects the reliability of detection. To address the shortcomings of existing methods, a text feature extraction algorithm based on multi-algorithm fusion is proposed. Because writing topic and writing style both influence the feature extraction result, the method uses an LDA topic model, a synonym thesaurus (同义词林), and GloVe with TF-IDF to extract three sub-feature vectors describing the writing style and topic of a text. A variational autoencoder (VAE) then mixes these vectors and reduces their dimensionality, producing a fused feature vector that is highly representative of the text. Experimental results show that the algorithm selects the text feature set accurately and overcomes the failure of traditional feature extraction algorithms to account for writing style and text topic. Detection precision reaches 97.93%, an improvement over the other algorithms compared.
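The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no dimensions, weights, or training details, so every name and number below is an assumption. The topic and style vectors are hypothetical placeholders for the LDA and synonym-thesaurus outputs, the word vectors are random stand-ins for pretrained GloVe embeddings, and the encoder weights are untrained. The sketch only shows TF-IDF-weighted averaging of word vectors, concatenation of three sub-feature vectors, and the reparameterization step of a VAE encoder.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (smoothed IDF)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        result.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return result


def weighted_embedding(weights, embeddings, dim):
    """TF-IDF-weighted average of word vectors; unknown words are skipped."""
    known = {t: w for t, w in weights.items() if t in embeddings}
    total = sum(known.values())
    vec = [0.0] * dim
    if total == 0:
        return vec
    for term, w in known.items():
        for i, value in enumerate(embeddings[term]):
            vec[i] += w * value / total
    return vec


def fuse(topic_vec, style_vec, semantic_vec):
    """Concatenate the three sub-feature vectors into one input vector."""
    return list(topic_vec) + list(style_vec) + list(semantic_vec)


def linear(matrix, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def vae_encode(x, w_mu, w_logvar, rng):
    """Encoder half of a VAE: z = mu + exp(0.5 * logvar) * eps, eps ~ N(0, 1)."""
    mu = linear(w_mu, x)
    logvar = linear(w_logvar, x)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    return mu, logvar, z


rng = random.Random(0)
docs = [["text", "plagiarism", "detection"], ["text", "feature", "extraction"]]
EMB_DIM, LATENT_DIM = 4, 2
# Hypothetical random word vectors standing in for pretrained GloVe embeddings.
embeddings = {t: [rng.uniform(-1, 1) for _ in range(EMB_DIM)]
              for d in docs for t in d}
# Hypothetical placeholders for the LDA topic distribution and style features.
topic_vec = [0.7, 0.3]
style_vec = [0.1, 0.5, 0.4]

semantic_vec = weighted_embedding(tfidf(docs)[0], embeddings, EMB_DIM)
fused = fuse(topic_vec, style_vec, semantic_vec)
# Untrained random weights; a real VAE would learn these during training.
w_mu = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(LATENT_DIM)]
w_logvar = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(LATENT_DIM)]
mu, logvar, z = vae_encode(fused, w_mu, w_logvar, rng)
print(len(fused), len(z))
```

In a trained model, the mean vector `mu` (rather than the sampled `z`) would typically serve as the low-dimensional fused feature, since it is deterministic for a given input.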
Authors: 陈滔 (CHEN Tao); 张庆国 (ZHANG Qingguo); 何金波 (HE Jinbo); 周文竹 (ZHOU Wenzhu). Affiliations: School of Engineering, Anhui Agricultural University, Hefei 230036, China; School of Civil and Commercial Economic Law, Gansu University of Political Science and Law, Lanzhou 730070, China; Clinical College of Anhui Medical University, Hefei 230031, China; College of Resources and Environment, Anhui Agricultural University, Hefei 230036, China.
Source: Journal of Hubei Minzu University: Natural Science Edition (湖北民族大学学报(自然科学版)), CAS, 2022, No. 1, pp. 67-72 (6 pages)
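Funding: Anhui Agricultural University "Talent Program" (优才计划) Research Development Fund (xszz202006); Anhui Province Academic Research Activity Fund for Academic and Technical Leaders and Reserve Candidates (2016H072).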
Keywords: feature extraction; plagiarism detection; multi-algorithm fusion; writing style; text topic