
A Method for Acquiring Corpus Rich in Part-Whole Relation from the Web
Cited by: 4

Abstract: Acquiring part-whole relations is an important part of knowledge acquisition, and the Web is becoming one of its major resources. Search engines are an effective means of mining part-whole knowledge from the Web. In this paper, the collection of Web search results that contain part-whole relations is called a corpus rich in part-whole relations. Because mainstream search engines do not yet support semantic search, constructing queries that retrieve such a corpus, from which part-whole relations can then be extracted, is an important problem. This paper proposes a new query construction method for acquiring corpora rich in part-whole relations from the Web. The method builds queries around context words associated with part-whole relations and submits them to existing search engines. The retrieved corpus is compared with corpora obtained from manually constructed queries and from corpus-based query construction on two criteria: (1) the number of sentences in the corpus that contain part-whole relations, and (2) how easily part-whole relations can subsequently be extracted from the corpus. Experimental results show that the proposed method far outperforms both alternatives.
Source: Journal of Chinese Information Processing (中文信息学报; CSCD, Peking University core journal), 2011, no. 5, pp. 17-23 (7 pages).
Funding: National Natural Science Foundation of China (grant no. 60773059)
Keywords: part-whole relation acquisition; corpus acquisition; query formulation
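The two steps described in the abstract are (a) pairing a target "whole" concept with context words to form search-engine queries, and (b) judging a retrieved corpus by how many of its sentences express part-whole relations. Both can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' method. The abstract does not list the paper's context words or say how part-whole sentences are recognized, so `CONTEXT_WORDS`, `PART_WHOLE_PATTERNS`, and every function name here are hypothetical. The search-engine call itself is omitted, and a sample string stands in for retrieved text.

```python
import re

# Hypothetical context words assumed to signal part-whole statements in Chinese
# ("compose", "include", "component", "a part of"). The paper's actual
# context-word list is not given in the abstract.
CONTEXT_WORDS = ["组成", "包括", "部件", "一部分"]

# Hypothetical lexical patterns for spotting part-whole sentences:
#   "X由Y组成"     (X is composed of Y)
#   "Y是X的一部分" (Y is a part of X)
PART_WHOLE_PATTERNS = [
    re.compile(r"[^，。！？]+?由[^，。！？]+?组成"),
    re.compile(r"[^，。！？]+?是[^，。！？]+?的一部分"),
]


def build_queries(whole: str, context_words: list[str]) -> list[str]:
    """Pair a whole concept with each context word as a quoted two-term query."""
    return [f'"{whole}" "{word}"' for word in context_words]


def split_sentences(text: str) -> list[str]:
    """Split Chinese text on sentence-final punctuation, dropping empty pieces."""
    return [s for s in re.split(r"[。！？!?]", text) if s.strip()]


def count_part_whole_sentences(text: str) -> int:
    """Count sentences that match at least one part-whole pattern."""
    return sum(
        1
        for s in split_sentences(text)
        if any(p.search(s) for p in PART_WHOLE_PATTERNS)
    )


def part_whole_density(text: str) -> float:
    """Fraction of sentences containing a part-whole pattern (0.0 if none)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return count_part_whole_sentences(text) / len(sentences)


if __name__ == "__main__":
    # Step (a): queries for the whole concept "汽车" (car).
    for query in build_queries("汽车", CONTEXT_WORDS):
        print(query)
    # Step (b): a stand-in for text retrieved by a search engine.
    sample = "汽车由发动机、车身和底盘组成。今天天气很好。车轮是汽车的一部分。"
    print(count_part_whole_sentences(sample))  # → 2
    print(round(part_whole_density(sample), 3))  # → 0.667
```

`count_part_whole_sentences` corresponds to the first comparison criterion in the abstract, the number of sentences containing part-whole relations. `part_whole_density` is an added, assumed normalization that makes corpora of different sizes comparable. Fixed surface patterns like these miss many phrasings, so a real evaluation would need a stronger recognizer or manual annotation.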