
Comparative Analysis and Identification Between Abstracts Generated by Large Language Models and Written by Scholars: A Case Study in the Field of Information Science
Abstract:
[Purpose/Significance] This study explores the differences between abstracts generated by large language models and those written by scholars, to provide a reference for AIGC detection in academic papers.
[Method/Process] Taking highly cited papers in the field of information science from the past three years as examples, the study first used prompts to generate an abstract from each paper's title and constructed a research dataset. It then analyzed the similarities and differences between the two types of text through Turing tests, lexical features, part-of-speech features, perplexity, and topic consistency. Finally, it proposed a BERT-CNN-based classifier to identify the two types of abstracts.
[Result/Conclusion] Humans currently cannot reliably identify abstracts generated by large language models; their precision is even lower than the random-guessing probability of 0.5. Abstracts generated by large language models tend to be longer and use a larger vocabulary, and the two types of abstracts differ markedly in the proportion of nouns. Abstracts written by scholars have higher perplexity. Some topic distributions of the two types are consistent, but their focus and research perspectives differ considerably. The BERT-CNN classifier achieves the best classification performance, surpassing five mainstream machine learning models and three deep learning models.
Authors: Wang Weizheng; Qiao Hong; Li Xiaojun; Wang Jingjing (Shandong Normal University Library, Jinan 250358; Shandong Normal University Business School, Jinan 250358; Digital Humanities Research Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014; Institute of Information Science, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014; School of Journalism and Communication, Shandong University, Jinan 250100)
Source: Library and Information Service (图书情报工作), Peking University Core Journal, 2025, No. 10, pp. 84-96 (13 pages)
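Funding: This work is one of the research outputs of the National Natural Science Foundation of China Youth Project "Research on Key Science and Technology Nodes and Information Diffusion Mechanisms Based on Multi-Source Heterogeneous Data" (Grant No. 72304169).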
Keywords: large language model; textual features; text classification; academic abstract; ChatGPT
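
The abstract compares the two kinds of text on lexical features (length, vocabulary size) and on perplexity. The record does not say how the authors computed either measure, so the following is only a minimal illustration of the two ideas. The function names `lexical_features` and `unigram_perplexity` are hypothetical. The add-one-smoothed unigram model is a toy stand-in chosen to keep the example self-contained. It is not the language model used in the paper.

```python
import math
from collections import Counter


def lexical_features(tokens):
    """Return length, vocabulary size, and type-token ratio for a token list."""
    vocab = set(tokens)
    return {
        "length": len(tokens),
        "vocab_size": len(vocab),
        "type_token_ratio": len(vocab) / len(tokens) if tokens else 0.0,
    }


def unigram_perplexity(tokens, reference_tokens):
    """Perplexity of `tokens` under an add-one-smoothed unigram model.

    The model is estimated from `reference_tokens`. This is an illustrative
    assumption, not the model used in the paper.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    counts = Counter(reference_tokens)
    # Smooth over the joint vocabulary so unseen tokens get non-zero mass.
    vocab_size = len(set(reference_tokens) | set(tokens))
    total = len(reference_tokens)
    log_prob = 0.0
    for tok in tokens:
        log_prob += math.log((counts[tok] + 1) / (total + vocab_size))
    # Perplexity is the exponential of the average negative log-likelihood.
    return math.exp(-log_prob / len(tokens))
```

Under this definition, text that the reference model finds less predictable receives a higher perplexity. That is the sense in which the abstract reports higher perplexity for scholar-written abstracts.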