
Implicit Discourse Relation Identification Based on Combined Features and Self-training Learning (cited by: 4)
Abstract: Implicit discourse relation identification is a long-standing difficulty in information extraction (IE). Existing supervised methods require large amounts of manually annotated data. To address this drawback, we propose a semi-supervised model that identifies implicit discourse relations automatically using a self-training strategy. With only a small number of labeled examples, the model attempts to reach accuracy comparable to that of supervised methods, which opens a new opportunity for future real-time, big-data discourse relation identification. To further improve accuracy, we analyze combinations of nine kinds of discourse relation features, including word-pair, production-rule, and verb features, and use several of them to build the knowledge representation of candidate discourse relation instances and optimize the model. Experiments on the Penn Discourse Treebank (PDTB 2.0) corpus show that the model improves accuracy and F-score by 5.2% and 13.5%, respectively, over a traditional supervised method.
Authors: Liu Chu (刘初), Chen Jinxiu (陈锦秀)
Source: Journal of Xiamen University (Natural Science) (《厦门大学学报(自然科学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2014, No. 2, pp. 182-189 (8 pages)
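Funding: National Natural Science Foundation of China (60803078); Natural Science Foundation of Fujian Province (2010J01351); Ministry of Education Scientific Research Foundation for Returned Overseas Scholars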
Keywords: implicit discourse relation identification; semi-supervised learning; self-training; combined features
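
The self-training strategy named in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify the classifier, the confidence measure, or the stopping rule, so the nearest-centroid classifier, the `threshold` parameter, and the two-dimensional toy vectors below are all assumptions made for illustration. In a real system each vector would be a combined feature representation of an argument pair.

```python
import math
from collections import defaultdict


def centroids(points, labels):
    """Toy classifier (an assumption): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(model, x):
    """Return (label, confidence), using a softmax over negative distances."""
    weights = {y: math.exp(-math.dist(x, c)) for y, c in model.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8, max_rounds=10):
    """Generic self-training loop: pseudo-label confident examples, then retrain."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    model = centroids(train_x, train_y)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing new to learn from, so stop early
            break
        for x, label in confident:
            train_x.append(x)
            train_y.append(label)
        pool = remaining
        model = centroids(train_x, train_y)  # retrain on the enlarged set
    return model, len(train_x)


# Toy data: two labeled seeds and four unlabeled points (hypothetical values).
seed_x = [(0.0, 0.0), (4.0, 4.0)]
seed_y = ["Contingency", "Comparison"]
unlabeled = [(0.5, 0.5), (3.5, 3.5), (1.0, 1.0), (3.0, 3.0)]

model, n_train = self_train(seed_x, seed_y, unlabeled)
label, _ = predict_with_confidence(model, (0.2, 0.1))
print(label, n_train)
```

The threshold controls the usual trade-off in self-training: a higher value adds fewer but cleaner pseudo-labels per round, while a lower value grows the training set faster at the risk of reinforcing early mistakes.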
