Abstract
Based on a collection of seven years of graduation thesis titles from the computer science program, this paper analyzes the similarity of thesis titles from a semantic perspective, providing a basis for duplicate checking and classification of thesis titles. First, the titles are segmented into words to obtain feature words, and the weights of these feature words are computed to form a vector representation of each title. Second, the PLSA (probabilistic latent semantic analysis) method is applied to the title vectors to extract their semantics. Finally, the resulting semantic vectors are compared to determine the similarity between thesis titles. Experimental results show that comparing thesis titles by semantic similarity is more reasonable and effective than the traditional VSM (vector space model) method.
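The abstract outlines a three-step pipeline: weighted term vectors, PLSA topic extraction, and similarity comparison against a VSM baseline. The following is a minimal Python sketch of that pipeline, not the authors' implementation. It rests on several assumptions. Titles are taken as already segmented, since a Chinese word segmenter is not shown. Weights are a smoothed TF-IDF, because the abstract does not name the weighting scheme. PLSA is fitted by plain EM with those weights used as pseudo-counts. Similarity is measured by cosine. The example titles and all function names (`tfidf_vectors`, `plsa`, `topic_similarity`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter


def _normalize(xs):
    """Scale non-negative numbers to sum to 1 (uniform if they sum to 0)."""
    s = sum(xs)
    return [x / s for x in xs] if s > 0 else [1.0 / len(xs)] * len(xs)


def tfidf_vectors(docs):
    """Turn pre-segmented titles (lists of tokens) into sparse {term: weight}
    dicts. Assumed weighting: tf * log(1 + N / df), a smoothed TF-IDF that
    stays positive even for terms found in every title."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(1 + n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (VSM baseline)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def plsa(vectors, k, iters=50, seed=0):
    """Fit PLSA with EM, using the TF-IDF weights as pseudo-counts n(d, w).
    Returns P(z | d) for each title: its k-dimensional semantic vector."""
    rng = random.Random(seed)
    vocab = sorted({t for vec in vectors for t in vec})
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(k)]) for _ in vectors]
    p_w_z = [dict(zip(vocab, _normalize([rng.random() + 0.1 for _ in vocab])))
             for _ in range(k)]
    for _ in range(iters):
        w_z_counts = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        new_p_z_d = []
        for d, vec in enumerate(vectors):
            z_counts = [0.0] * k
            for w, n_dw in vec.items():
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(k)]
                total = sum(post)
                if total == 0:
                    continue
                for z in range(k):
                    share = n_dw * post[z] / total
                    z_counts[z] += share
                    w_z_counts[z][w] += share
            new_p_z_d.append(_normalize(z_counts))
        # M-step: turn the expected counts back into distributions.
        p_z_d = new_p_z_d
        p_w_z = []
        for counts in w_z_counts:
            s = sum(counts.values()) or 1.0
            p_w_z.append({w: c / s for w, c in counts.items()})
    return p_z_d


def topic_similarity(a, b):
    """Cosine similarity of two dense semantic vectors P(z | d)."""
    return cosine(dict(enumerate(a)), dict(enumerate(b)))


# Hypothetical, already-segmented titles (English tokens stand in for
# the output of a Chinese word segmenter).
titles = [
    ["student", "information", "management", "system", "design"],
    ["student", "grade", "management", "system", "implementation"],
    ["image", "edge", "detection", "algorithm", "research"],
]
vecs = tfidf_vectors(titles)
theta = plsa(vecs, k=2)
print("VSM 0-1:", round(cosine(vecs[0], vecs[1]), 3))
print("VSM 0-2:", round(cosine(vecs[0], vecs[2]), 3))  # no shared terms -> 0.0
print("PLSA 0-1:", round(topic_similarity(theta[0], theta[1]), 3))
print("PLSA 0-2:", round(topic_similarity(theta[0], theta[2]), 3))
```

In this sketch, the VSM baseline gives titles 0 and 2 a score of exactly 0.0 because they share no terms. The PLSA vectors instead compare titles through latent topics, so titles with different but co-occurring wording can still score as related. With only three toy titles, the fitted topics depend on the random seed. Applying the sketch to a real corpus would require the full set of titles and a deliberately chosen number of topics `k`.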
Source
Journal of Yanbian University (Natural Science Edition)
CAS
2013, Issue 2, pp. 129-133 (5 pages)
Funding
Yanbian University Science and Technology Development Program (Yanda Kehe Zi [2011] No. 43)