Abstract
Research on Web data mining is becoming increasingly widespread, and Web data extraction is a prerequisite and necessary step for that research. However, most current Web information is in HTML format, which has many shortcomings. In light of current research, this paper briefly introduces XML and its characteristics and compares HTML and XML in several respects, such as openness and operability, to show the advantages of XML. Finally, an example is used to outline the XML-based data extraction process.
Source
《科学技术与工程》
2008, Issue 9, pp. 2473-2476 (4 pages)
Science Technology and Engineering