
Dual Distributional Semantic Knowledge Acquisition with Small Corpora and Machine-Readable Dictionaries (cited by: 2)
Abstract: This paper presents an unsupervised method for acquiring verb semantic knowledge from a small corpus and a machine-readable dictionary (MRD). The method does not require a sense-tagged corpus. Instead, it uses verb-noun (V+N) collocations extracted from the corpus, together with knowledge learned from the usage examples that the MRD lists under each sense of a polysemous verb. Data sparseness is addressed in two ways. First, the word similarity measure is extended from direct co-occurrence to the co-occurrences of co-occurring words, so that similarity is computed over co-occurrence clusters rather than over the co-occurring words themselves. Second, IS-A relations between nouns are acquired from the MRD definitions, which makes it possible to cluster the nouns roughly. With these two techniques, two words can be judged similar even if they share no co-occurring word. Experiments show that the method can learn from a very small training corpus and reaches a correct disambiguation rate of 85.7% without restricting the senses of the words.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 6, pp. 23-29 (7 pages)
Funding: Shanxi Province Youth Fund project (No. 20001017)
Keywords: artificial intelligence; natural language processing; machine-readable dictionary (MRD); dual distribution; semantics; knowledge acquisition
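
The abstract's first remedy for data sparseness, comparing words by the clusters of their co-occurring words rather than by the words themselves, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy counts, the cluster labels in `cluster_of`, the helper names `cosine` and `to_clusters`, and the choice of cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def to_clusters(cooc: Counter, cluster_of: dict) -> Counter:
    """Re-express co-occurrence counts over cluster labels.

    Words without a known cluster keep their own identity.
    """
    clustered = Counter()
    for word, count in cooc.items():
        clustered[cluster_of.get(word, word)] += count
    return clustered


# Hypothetical verb co-occurrence counts for three nouns (toy data).
cooc = {
    "tea": Counter({"drink": 3, "brew": 1}),
    "juice": Counter({"sip": 2, "pour": 1}),
    "novel": Counter({"read": 4, "write": 1}),
}

# Hypothetical clusters of the co-occurring words (assumed, not from the paper).
cluster_of = {
    "drink": "INGEST", "sip": "INGEST",
    "brew": "PREPARE", "pour": "PREPARE",
    "read": "TEXT", "write": "TEXT",
}

# "tea" and "juice" share no co-occurring word, so direct similarity is zero.
direct = cosine(cooc["tea"], cooc["juice"])

# Over clusters the two nouns overlap, while "tea" and "novel" still do not.
clustered = cosine(to_clusters(cooc["tea"], cluster_of),
                   to_clusters(cooc["juice"], cluster_of))
unrelated = cosine(to_clusters(cooc["tea"], cluster_of),
                   to_clusters(cooc["novel"], cluster_of))

print(direct, clustered, unrelated)
```

In this toy setting the direct similarity of "tea" and "juice" is zero, while the cluster-level similarity is high, which is the effect the abstract describes: two words may be considered similar even if they share no co-occurring word.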

References (4)

  • 1 Gale, W. A., K. W. Church, and D. Yarowsky. A Method for Disambiguating Word Senses in a Large Corpus[J]. Computers and the Humanities, 1993, 26: 415-439.
  • 2 Jeong-Mi Cho, Jungyun Seo, and Gil Chang Kim. Dual distributional verb sense disambiguation with small corpora and machine readable dictionaries[C]. ACL'99, University of Maryland.
  • 3 Resnik, Philip Stuart. Selectional preference and sense disambiguation[A]. In: Proceedings of the ANLP Workshop "Tagging Text with Lexical Semantics: Why, What, and How?"[C]. 1997.
  • 4 Pereira, Fernando, Naftali Tishby, and Lillian Lee. Distributional Clustering of English Words[M]. 1993.

