Dual Distributional Semantic Knowledge Acquisition with Small Corpora and Machine Readable Dictionaries (基于小规模语料库和机器可读词典的二元分布语义获取)

Cited by: 2

Abstract: This paper presents an unsupervised method for acquiring verb semantic knowledge from a small corpus and a machine-readable dictionary (MRD). The method does not require a sense-tagged corpus. Instead, it uses verb-object (V+N) collocations acquired from the corpus, together with the typical usages listed in the MRD usage examples for each sense of a polysemous verb. Data sparseness is addressed in two ways. First, the word similarity measure is extended from direct co-occurrences to co-occurrences of co-occurring words, so that similarity is computed over co-occurrence clusters rather than over the co-occurring words themselves. Second, IS-A relations between nouns are acquired from the MRD definitions, which makes it possible to cluster the nouns roughly. With these two techniques, two words may be judged similar even if they share no co-occurring word. Experiments show that the method can learn from a very small training corpus and achieves a correct disambiguation rate of 85.7% without restricting the senses of a word.
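The idea that two verbs can be similar without sharing any object noun can be illustrated with a minimal sketch. It is not the authors' implementation: the verb-object pairs and the IS-A table are invented toy data, the names `PAIRS`, `IS_A`, `profile`, and `cosine` are hypothetical, and cosine similarity is an assumed stand-in for whatever similarity measure the paper actually uses.

```python
from collections import Counter
from math import sqrt

# Invented verb-object pairs standing in for V+N collocations from a corpus.
PAIRS = [("drink", "tea"), ("drink", "water"),
         ("sip", "coffee"), ("sip", "juice"),
         ("read", "book"), ("read", "novel")]

# Invented IS-A table standing in for hypernyms taken from MRD definitions.
IS_A = {"tea": "beverage", "water": "beverage",
        "coffee": "beverage", "juice": "beverage",
        "book": "publication", "novel": "publication"}


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def profile(verb, pairs, mapping=None):
    """Count a verb's objects; with a mapping, count noun clusters instead."""
    counts = Counter()
    for v, noun in pairs:
        if v == verb:
            counts[mapping.get(noun, noun) if mapping else noun] += 1
    return counts


# Direct co-occurrence: "drink" and "sip" share no object noun.
direct = cosine(profile("drink", PAIRS), profile("sip", PAIRS))
# Cluster-level co-occurrence: both verbs take objects in the same IS-A cluster.
clustered = cosine(profile("drink", PAIRS, IS_A), profile("sip", PAIRS, IS_A))
# A verb whose objects fall in a different cluster stays dissimilar.
unrelated = cosine(profile("drink", PAIRS, IS_A), profile("read", PAIRS, IS_A))
```

In this toy setting the direct similarity of "drink" and "sip" is zero, while the cluster-level similarity is positive, which is the effect the abstract describes for sparse data.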

Authors: Hao Xiulan (郝秀兰), Yang Erhong (杨尔弘)

Affiliations: Network Center, Taiyuan Normal University; Department of Computer Science, Shanxi University

Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University Core Journals list), 2004, No. 6, pp. 23-29 (7 pages)

Funding: Shanxi Province Youth Fund project (20001017)

Keywords: artificial intelligence; natural language processing; machine-readable dictionary (MRD); dual distribution; semantic knowledge acquisition

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
