Implicit Discourse Relation Identification Based on Combined Features and Self-training Learning (cited by: 4)


Abstract: Implicit discourse relation identification has long been a difficult problem in information extraction (IE). Existing supervised methods require large amounts of manually annotated data. To address this shortcoming, the paper proposes a semi-supervised model that uses a self-training strategy to identify implicit discourse relations automatically. With only a small number of labeled samples, the model aims for identification accuracy comparable to that of supervised methods, which opens new opportunities for real-time discourse relation identification on big data. To further improve accuracy, nine types of discourse relation features, including word-pair, production-rule, and verb features, are analyzed in combination, and the knowledge representation of candidate discourse relation instances is built from combinations of these features to optimize the model. Experiments on the Penn Discourse Treebank (PDTB2.0) corpus show that the model improves accuracy and F-score by 5.2% and 13.5%, respectively, over a traditional supervised method.
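The self-training strategy named in the abstract can be illustrated with a generic loop: train on the labeled seed set, label the unlabeled pool, move confident predictions into the training set, and repeat. The sketch below is not the authors' implementation. This record does not state the paper's classifier, confidence measure, or stopping rule, so the nearest-centroid classifier, the `threshold` and `max_rounds` parameters, the function names, and the toy two-dimensional vectors are all assumptions made for illustration. The label strings are borrowed from the PDTB top-level senses only as placeholders.

```python
import math
from collections import Counter


def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, Counter(y)
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_with_confidence(centroids, vec):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {label: math.exp(-math.dist(vec, c)) for label, c in centroids.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training loop (hypothetical parameters, not from the paper)."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = train_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for vec in pool:
            label, conf = predict_with_confidence(model, vec)
            if conf >= threshold:
                # Confident prediction: treat it as a pseudo-labeled training example.
                X_lab.append(vec)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(vec)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing new was learned, or the pool is exhausted
    # Retrain once more so the returned model includes the last pseudo-labels.
    return train_centroids(X_lab, y_lab), pool


# Toy usage: two labeled seeds plus five unlabeled vectors.
seed_X = [[0, 0], [10, 10]]
seed_y = ["Comparison", "Contingency"]
unlabeled = [[1, 0], [0, 1], [9, 10], [10, 9], [5, 5]]
model, leftover = self_train(seed_X, seed_y, unlabeled)
print(predict_with_confidence(model, [0.5, 0.5])[0], leftover)
```

In a setting like the one the abstract describes, each vector would instead encode the combined features of a candidate relation instance, and the threshold would control how much pseudo-label noise enters the training set.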

Authors: LIU Chu (刘初), CHEN Jinxiu (陈锦秀)

Affiliation: School of Information Science and Technology, Xiamen University

Source: Journal of Xiamen University (Natural Science); indexed in CAS, CSCD, Peking University Core Journals; 2014, No. 2, pp. 182-189 (8 pages)

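Funding: National Natural Science Foundation of China (60803078); Natural Science Foundation of Fujian Province (2010J01351); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education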

Keywords: implicit discourse relation identification; semi-supervised learning; self-training; combined features

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]



