Abstract
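1. Introduction
As society becomes ever more information-driven, semi-structured information has spread into every field. For example, the World Wide Web (WWW) has become a huge information source, yet the information on it cannot be queried or manipulated in a general way. Much of it is stored as static HTML text and can only be viewed through a browser, so making effective use of this kind of information is especially important.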
It is well known that the World Wide Web has become a huge information resource. However, the information on the WWW cannot be queried and manipulated in a general way. A large amount of it is stored in static HTML format and can only be viewed through a browser, so using this kind of information effectively is very important. This paper proposes a semi-structured data extraction method that obtains the useful information embedded in a group of related web pages and stores it using OEM (Object Exchange Model). We then apply data mining methods to discover the schema knowledge implicit in the semi-structured data.
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
1999, No. 10, pp. 49-52 (4 pages)
Computer Science
Funding
National Natural Science Foundation of China