
Research on Term Weighting Based on MapReduce (cited by: 1)
Abstract: Term recognition is widely used in ontology construction, dictionary construction, and related fields, and term weighting is a key step in term recognition. This paper improves the TF-IDF formula by treating the length of the words that make up a term as one of the weighting factors, while also taking into account the term's domain relevance within the document collection. The whole process is implemented on the MapReduce programming model, and the weights of candidate domain terms are computed in a distributed manner on a Hadoop cloud platform. Experimental results show that the method both simplifies the implementation steps of term weighting and improves the execution efficiency of the algorithm.
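Source: Telecommunications Science (《电信科学》), PKU Core Journal, 2011, No. 11, pp. 62-65 (4 pages)
Funding: National Natural Science Foundation of China (No. 60872133); Beijing Natural Science Foundation (No. 4092015); Science and Technology Development Program of the Beijing Municipal Education Commission (No. KM201010772023)
Keywords: term weight; TF-IDF; MapReduce; distributed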
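
The abstract describes a TF-IDF weight extended with a term-length factor and computed with map and reduce steps. The record does not give the paper's formula, so the following is only a minimal sketch under stated assumptions: the weight is taken to be `tf * log(N / df) * length`, the length factor is assumed to be the number of whitespace-separated words in the term, the domain-relevance factor is left out because it is not defined here, and the MapReduce flow is simulated in-process rather than run on Hadoop. All function names are hypothetical.

```python
import math
from collections import defaultdict


def map_phase(doc_id, terms):
    """Map: emit (term, (doc_id, 1)) for each candidate-term occurrence."""
    for term in terms:
        yield term, (doc_id, 1)


def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, values, n_docs):
    """Reduce: combine all occurrences of one term into a single weight."""
    tf = sum(count for _, count in values)          # total occurrences
    df = len({doc_id for doc_id, _ in values})      # documents containing the term
    idf = math.log(n_docs / df)
    length = len(term.split())  # ASSUMPTION: length factor = word count
    return term, tf * idf * length


def term_weights(docs):
    """Run the simulated map -> shuffle -> reduce pipeline over docs."""
    pairs = [kv for doc_id, terms in docs.items()
             for kv in map_phase(doc_id, terms)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(t, vs, len(docs)) for t, vs in grouped.items())
```

With this assumed formula, a multi-word term that appears as often as a single-word term in the same documents receives a proportionally higher weight, and a term that occurs in every document gets weight zero.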

