Abstract
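The unstructured, heterogeneous, and massive information resources on the World Wide Web make it difficult for people to find suitable information. Search engines, personalized search tools, and other related technologies have alleviated this problem to some extent, but they are of relatively limited use for long-term, systematic knowledge acquisition. This paper analyzes the general process by which people acquire knowledge through the World Wide Web, discusses, in the context of the design and implementation of one system, the specific technologies that can effectively assist users in acquiring knowledge from the Web, and describes the overall structure and characteristics of this multi-user collaborative Web knowledge acquisition system.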
The mass of unstructured, heterogeneous information resources on the World Wide Web (WWW) makes it hard for people to find the information they need. Search engines, personalized search tools, and other related technologies alleviate the problem to some extent, but they offer limited help when people have long-term, systematic knowledge acquisition needs. This paper analyzes the general process of knowledge acquisition via the WWW and discusses the technologies that support it. First, information extraction is used to obtain the necessary materials from the web. Ontology-related technology is then used to annotate these materials so that they carry more semantic information. Because manual annotation is time-consuming, a method derived from document classification is applied to automate the annotation process. On top of the enriched materials, a more sophisticated user query interface is designed that supports structured information retrieval and returns more precise query results. Because some kinds of information retrieved from the web (such as scientific papers and technical reports) are lengthy and difficult to understand, a user collaboration model is built to save users' time. Lastly, a framework is presented to facilitate WWW knowledge acquisition.
Source
《模式识别与人工智能》
EI
CSCD
PKU Core (Peking University Core Journals)
2002, No. 3, pp. 285-289 (5 pages)
Pattern Recognition and Artificial Intelligence
Keywords
World Wide Web (WWW), Information Extraction, Semantics, Knowledge Acquisition, Multi-User Collaboration