Abstract
Attribute values are important information for describing the classes in an Ontology, yet little research has addressed their automatic extraction. This paper proposes a method for automatically extracting Ontology attribute values from the WWW. First, starting from a small seed set of attribute values, we describe an interactive method in which the selection of sentences that contain attribute values and the extraction of attribute values reinforce each other. The method exploits the redundancy of information on the Web to extract and expand the target attribute value set automatically. Second, to avoid constructing the seed set by hand, we present a method for generating it automatically. Experiments measure the precision and recall of the extraction results. In addition, the automatically enriched Ontology is applied to a webpage main-content extraction task to demonstrate the usefulness of the approach.
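The interaction between sentence selection and value extraction can be illustrated with a minimal bootstrapping sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes single-token values, a hypothetical one-word left/right context pattern, and a hypothetical `min_support` threshold standing in for Web redundancy. All function names and the toy sentences are illustrative.

```python
from collections import defaultdict


def tokenize(sentence):
    # Naive whitespace tokenizer (assumption: English toy data).
    return sentence.lower().replace(".", " ").replace(",", " ").split()


def learn_patterns(sentences, values):
    """Collect (left word, right word) contexts around known values.

    Values at the very start or end of a sentence are skipped because
    they lack one of the two context words.
    """
    patterns = set()
    for sent in sentences:
        toks = tokenize(sent)
        for i in range(1, len(toks) - 1):
            if toks[i] in values:
                patterns.add((toks[i - 1], toks[i + 1]))
    return patterns


def apply_patterns(sentences, patterns):
    """Map each candidate value to the set of sentence indices supporting it."""
    support = defaultdict(set)
    for idx, sent in enumerate(sentences):
        toks = tokenize(sent)
        for i in range(1, len(toks) - 1):
            if (toks[i - 1], toks[i + 1]) in patterns:
                support[toks[i]].add(idx)
    return support


def bootstrap(sentences, seeds, min_support=2, max_rounds=5):
    """Alternate sentence selection and value extraction until no growth."""
    values = {s.lower() for s in seeds}
    for _ in range(max_rounds):
        # Step 1: select sentences that mention a known value.
        selected = [s for s in sentences if values & set(tokenize(s))]
        # Step 2: learn contexts from them, then extract new candidates.
        patterns = learn_patterns(selected, values)
        support = apply_patterns(sentences, patterns)
        # Redundancy filter: keep candidates seen in enough sentences.
        new = {v for v, idxs in support.items() if len(idxs) >= min_support}
        new -= values
        if not new:
            break
        values |= new
    return values


sentences = [
    "The phone comes in black color.",
    "This model has a red color option.",
    "It ships in black color only.",
    "You can buy it in silver color.",
    "Available in silver color today.",
    "Sold in blue color.",
]
print(sorted(bootstrap(sentences, {"black"})))  # ['black', 'silver']
```

In this toy run "blue" is rejected because only one sentence supports it, which shows how a redundancy threshold trades recall for precision. Lowering `min_support` to 1 would admit it.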
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2008, No. 6, pp. 69-74 (6 pages)
Funding
National Natural Science Foundation of China (60503071)
National Basic Research Program of China (973 Program) (2004CB318102)
Keywords
computer application
Chinese information processing
WWW
interactive method
attribute value extraction