Abstract
This paper analyzes the knowledge unit, the basic unit of knowledge research, and discusses in detail its concept, characteristics, carriers, and extraction process. It proposes a definition of the knowledge unit for knowledge metrics research and describes six of its properties: independence, combinability, linkability, multidimensionality, explicitness, and measurability. Based on these properties and on the characteristics of Chinese-language documents, it proposes an improved multifactor TF/IDF algorithm that takes word length and word position into account. Using 1999-2006 data from the journal Semiconductor Optoelectronics (《半导体光电》) as a case study, it compares the traditional TF/IDF feature-word extraction method with the improved algorithm. The results show that the multifactor TF/IDF algorithm based on word length and position significantly improves the efficiency and accuracy of knowledge unit extraction.
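The abstract does not give the weighting formula, so the following is only a minimal sketch of how word length and word position might be folded into a TF/IDF score. The multiplicative form, the logarithmic length factor, the title-only notion of position, and the parameter names `length_weight` and `title_boost` are all illustrative assumptions, not the authors' algorithm. Documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Classic TF-IDF. `docs` is a list of token lists; returns one {term: score} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        term_freq = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on an empty document
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


def weighted_scores(docs, title_terms, length_weight=0.5, title_boost=2.0):
    """Hypothetical multifactor variant: scale TF-IDF by a length factor and a position factor.

    `title_terms` holds, for each document, the set of terms that occur in its title.
    Both factors below are assumptions made for illustration only.
    """
    result = []
    for base, in_title in zip(tfidf_scores(docs), title_terms):
        adjusted = {}
        for term, score in base.items():
            length_factor = 1.0 + length_weight * math.log(len(term))  # longer terms score higher
            position_factor = title_boost if term in in_title else 1.0  # title terms score higher
            adjusted[term] = score * length_factor * position_factor
        result.append(adjusted)
    return result


# Toy corpus of pre-segmented documents with their title terms.
docs = [
    ["激光器", "激光器", "的", "研究"],
    ["探测器", "的", "研究"],
    ["光纤", "的", "应用"],
]
titles = [{"激光器"}, {"探测器"}, {"光纤"}]
base = tfidf_scores(docs)
improved = weighted_scores(docs, titles)
top_term = max(improved[0], key=improved[0].get)
```

In this toy corpus the single-character function word "的" appears in every document, so its IDF is zero and neither factor can raise it. Multi-character terms that also appear in a title receive the largest boost, which is the general behaviour that weighting by length and position is meant to produce.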
Source
《情报学报》
CSSCI
PKU Core Journal
2011, Issue 10, pp. 1037-1043 (7 pages)
Journal of the China Society for Scientific and Technical Information
Funding
This work was supported by the National Social Science Foundation of China (08BTQ025).