Research on a Text Concept Annotation Method Based on Learning to Rank (times cited: 2)

Learning to Rank Concept Annotation for Text

Abstract: This paper proposes CRM (concept ranking model), a learning-to-rank method for automatically annotating documents with Wikipedia concepts. First, a sizable set of documents is manually annotated with concepts to build a training set. Then a learning-to-rank algorithm (Ranking SVM) is trained on multiple features to obtain a model that ranks concepts. Finally, this ranking model is used to annotate arbitrary documents with concepts. Experiments show that, compared with traditional document concept annotation methods, the proposed method achieves considerable improvements on all evaluation metrics, and its annotations are closer to those produced by humans.
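The three steps in the abstract (manually labelled training documents, a Ranking SVM trained on concept features, ranking of candidate concepts for a new document) can be illustrated with a minimal sketch. It assumes that each training document comes with a list of candidate Wikipedia concepts, each described by a numeric feature vector and a graded relevance label. The feature values, the helper names `pairwise_transform`, `train_ranking_svm` and `rank_concepts`, and the Pegasos-style stochastic subgradient solver are all hypothetical, illustrative choices, not the authors' implementation. The Ranking SVM idea itself is standard: within one document, every pair of concepts with different labels becomes a binary example on the difference of their feature vectors, and a linear weight vector learned on those differences is then used to score and sort concepts.

```python
import random
from itertools import combinations


def pairwise_transform(documents):
    """Turn graded concept lists into Ranking SVM training pairs.

    documents: list of training documents; each is a list of
    (feature_vector, relevance) tuples for that document's candidate concepts.
    Only concepts from the same document with different relevance are paired.
    """
    diffs, signs = [], []
    for candidates in documents:
        for (xa, ya), (xb, yb) in combinations(candidates, 2):
            if ya == yb:
                continue  # tied labels carry no ordering information
            diffs.append([a - b for a, b in zip(xa, xb)])
            signs.append(1 if ya > yb else -1)
    return diffs, signs


def train_ranking_svm(documents, lam=0.01, epochs=200, seed=0):
    """Learn linear weights w with an L2-regularised hinge loss on pair differences.

    Solver (assumption): Pegasos-style stochastic subgradient descent with
    step size 1 / (lam * t), standing in for whatever SVM solver the authors used.
    """
    diffs, signs = pairwise_transform(documents)
    w = [0.0] * len(diffs[0])
    rng = random.Random(seed)
    order = list(range(len(diffs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = signs[i] * sum(wj * dj for wj, dj in zip(w, diffs[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink step
            if margin < 1.0:  # pair ordered wrongly or inside the margin
                w = [wj + eta * signs[i] * dj for wj, dj in zip(w, diffs[i])]
    return w


def rank_concepts(w, candidates, k=None):
    """Sort (concept_name, feature_vector) pairs by score w . x, best first."""
    def score(c):
        return sum(wj * xj for wj, xj in zip(w, c[1]))
    names = [name for name, _ in sorted(candidates, key=score, reverse=True)]
    return names if k is None else names[:k]


if __name__ == "__main__":
    # Toy data with two hypothetical features per concept (e.g. a keyword-match
    # score and an unrelated second feature); labels 2 > 1 > 0 mean more relevant.
    train = [
        [([0.9, 0.1], 2), ([0.5, 0.4], 1), ([0.1, 0.8], 0)],
        [([0.8, 0.3], 2), ([0.2, 0.2], 0), ([0.6, 0.9], 1)],
    ]
    w = train_ranking_svm(train)
    new_doc = [("C", [0.05, 0.5]), ("A", [0.95, 0.5]), ("B", [0.5, 0.5])]
    print(rank_concepts(w, new_doc))       # -> ['A', 'B', 'C']
    print(rank_concepts(w, new_doc, k=2))  # -> ['A', 'B']
```

In the toy data every correctly ordered pair difference has a positive first component, so the learned weight on that feature stays positive and A, B, C come out in that order. Candidate generation (deciding which Wikipedia concepts to score for a given document) is not shown and would have to be supplied separately.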

Authors: Tu Xinhui, He Tingting, Li Fang, Wang Jianwen (涂新辉, 何婷婷, 李芳, 王建文)

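Affiliation: School of Computer Science, Central China Normal University; Network Media Language Branch, National Language Resources Monitoring and Research Center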

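Source: Acta Scientiarum Naturalium Universitatis Pekinensis (《北京大学学报（自然科学版）》, Journal of Peking University, Natural Science Edition), 2013, No. 1, pp. 153-158 (6 pages). Indexed in EI, CAS and CSCD; Peking University core journal.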

Funding: National Natural Science Foundation of China (grants 90920005 and 61003192)

Keywords: concept annotation; learning to rank; Wikipedia; explicit semantic analysis

Classification number (CLC): TP391.1 [Automation and Computer Technology: Computer Application Technology]
