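Abstract
This paper outlines the background and basic concepts of information extraction. It describes in detail four key technologies studied in the field: named entity recognition, entity relation extraction, coreference resolution, and event detection. Information extraction methods are then classified according to the model they employ, and the advantages, disadvantages, and research difficulties of each class are identified. Finally, the paper analyzes the current state of research and applications in information extraction in China and abroad, and discusses the development trends of the technology.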
Source
Journal of Fujian Computer (《福建电脑》)
2010, No. 4, pp. 55, 65 (2 pages)