Design of a Commercial Analysis System Based on Web Information Mining (cited by: 2)

Abstract: With the rapid growth of information on the Web, effectively extracting the commercial information hidden in unstructured documents and analyzing it in support of business management has become an active research topic. Building on existing Web information mining techniques, this paper proposes a multi-topic segmentation method for information blocks to cope with the mixed, heterogeneous nature of the raw data. Guided by a purpose-built dictionary of business-domain entity names, commercial information is then extracted and classified, and a credit-rating mechanism is introduced to safeguard precision. These components are combined into a commercial information analysis system based on Web information mining (CABWIM). Experimental results show that the system can effectively extract the latent commercial information scattered across the Web and organize it into commercial information of practical value.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, PKU Core Journal; 2006, No. 1, pp. 62-65 (4 pages)
Funding: Natural Science Foundation of Jiangsu Higher Education Institutions (02KJB520013)
Keywords: web information mining; wrapper; DOM tree
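
The dictionary-guided extraction and classification step summarized in the abstract can be illustrated with a minimal sketch. It is not the CABWIM implementation: the record gives no algorithmic detail, so the block splitting below simply cuts on block-level HTML tags (a stand-in for the paper's multi-topic segmentation), and the dictionary contents, category names, and function names are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical entity-name dictionary: category -> known names (assumption).
ENTITY_DICT = {
    "company": ["Acme Corp", "Globex"],
    "product": ["widget", "gadget"],
}

# Tags treated as information-block boundaries (a simplifying assumption).
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3"}


class BlockSplitter(HTMLParser):
    """Collects the text between block-level tag boundaries as one block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def flush(self):
        # Normalize whitespace and keep only non-empty blocks.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buf.append(data)


def split_blocks(html):
    """Split an HTML string into plain-text information blocks."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text after the last tag
    return parser.blocks


def classify_block(block):
    """Return the sorted categories whose dictionary names occur in the block."""
    lowered = block.lower()
    return sorted(
        category
        for category, names in ENTITY_DICT.items()
        if any(name.lower() in lowered for name in names)
    )


def extract(html):
    """Pair every information block with its matched categories."""
    return [(block, classify_block(block)) for block in split_blocks(html)]
```

Calling `extract` on a page fragment returns each block together with the dictionary categories it matched; blocks that match nothing get an empty list and could be discarded by a later filtering stage.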

References (8)

  • 1. Laender A, Ribeiro-Neto B, Silva A. A brief survey of web data extraction tools[J]. SIGMOD Record, 2002, 31(2): 84-93.
  • 2. Ning Gu, Guowen Wu. Extracting Web table information in cooperative learning activities based on abstract semantic model[C]. Proceedings of the Sixth International Conference on Computer Supported Cooperative Work, 2001. 492-497.
  • 3. Itai K, Takasu A, Adachi J. Information extraction from HTML pages and its integration[C]. Proceedings of the Applications and the Internet Workshops, 2003. 276-281.
  • 4. Jiying Wang, Lochovsky F H. Data-rich section extraction from HTML pages[C]. Proceedings of the Third International Conference on Web Information Systems Engineering (WISE 2002), 2002. 313-322.
  • 5. Muslea I, Minton S, Knoblock C. Hierarchical wrapper induction for semistructured information sources[J]. Autonomous Agents and Multi-Agent Systems, 2001, 4(1/2): 93-114.
  • 6. Sahuguet A, Azavant F. Building intelligent web applications using lightweight wrappers[J]. Data and Knowledge Engineering, 2001, 36(3): 286-316.
  • 7. Huang Yuqing, Qi Guangzhi, Zhang Fuyan. Constructing an extractor for semi-structured information from Web documents[J]. Journal of Software (软件学报), 2000, 11(1): 73-78. Cited by: 47
  • 8. Li Xiaodong, Gu Yuqing. DOM-based Web information extraction[J]. Chinese Journal of Computers (计算机学报), 2002, 25(5): 526-533. Cited by: 102

Secondary references (18)

  • 1. Ham mar J. SIGMOD Record, 1997, 26(2): 18.
  • 2. Florescu D, Levy A Y, Mendelzon A. Database techniques for the World-Wide Web: A Survey. In: ACM SIGMOD Record, 1998. 59-74.
  • 3. Atzeni P, Mecca G, Merialdo P. To weave the Web. In: Proc the 23rd International Conference on Very Large Data Bases. Athens, Greece, 1997. 206-215.
  • 4. Pemberton S et al. XHTML 1.0: The Extensible HyperText Markup Language. In: http://www.w3.org/MarkUp/
  • 5. Cattell R G G. The Object Database Standard ODMG-93. San Mateo, California: Morgan Kaufmann Publishers, 1994.
  • 6. Mitchell T. Machine Learning. New York: McGraw Hill, 1997.
  • 7. Wall L et al. Programming Perl (3rd Edition). O'Reilly & Associates, 2000.
  • 8. Birbeck M et al. Professional XML. Wrox Press Inc, 2000.
  • 9. Liu L, Pu C, Han W. XWRAP: An XML-enabled wrapper construction system for web information sources. In: Proc International Conference on Data Engineering (ICDE), San Diego, California, 2000. 611-621.
  • 10. Chamberlin D, Robie J, Florescu D. Quilt: An XML query language for heterogeneous data sources. In: Proc International Workshop on the Web and Databases (WebDB'2000), Dallas, Texas, 2000. 53-62.

Co-citing literature: 145

Co-cited literature: 11

Citing literature: 2

Secondary citing literature: 1
