Abstract
In the CNC machine tool domain, knowledge lacks a structured description, which makes knowledge acquisition inefficient and hinders knowledge reuse and sharing. To address this, an ontology-based knowledge extraction method for CNC machine tools is proposed, based on a study of machine tool knowledge in the domain's Web text and an analysis of its structural features. Documents collected by a crawler are preprocessed, and sentences expressing hyponymy (hypernym-hyponym relations) are extracted from the Web text by pattern matching. These sentences are segmented with the Chinese word segmentation system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), concepts are extracted from the segmented text, and a concept set and a concept tree are built. Finally, the domain ontology is constructed and stored in OWL. In the experiment, knowledge is extracted from randomly selected Web pages, and a comparison experiment shows that the method can effectively acquire semi-structured and structured Web text information in the CNC machine tool domain.
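The pattern-matching and concept-tree steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two regular-expression patterns, the function names `extract_pairs` and `build_concept_tree`, and the sample sentences are all assumptions made for illustration, and a simple split on list punctuation stands in for the ICTCLAS segmentation step.

```python
import re
from collections import defaultdict

# Hypothetical hyponymy patterns (assumptions, not taken from the paper).
# "X包括A、B和C" / "X分为A和B": X is the hypernym, A, B, C are hyponyms.
INCLUDES = re.compile(r"^(?P<hyper>[^,,。]+?)(?:包括|分为)(?P<hypos>[^,,。]+)")
# "A是一种X": A is the hyponym, X is the hypernym.
IS_A = re.compile(r"^(?P<hypo>[^,,。]+?)是一种(?P<hyper>[^,,。]+)")


def extract_pairs(sentence):
    """Return (hypernym, hyponym) pairs matched in one sentence."""
    sentence = sentence.strip()
    m = INCLUDES.search(sentence)
    if m:
        # Crude stand-in for word segmentation: split the list on 、 和 及.
        # This would wrongly split a term that itself contains 和 or 及.
        hypos = re.split(r"[、和及]", m.group("hypos"))
        return [(m.group("hyper"), h) for h in hypos if h]
    m = IS_A.search(sentence)
    if m:
        return [(m.group("hyper"), m.group("hypo"))]
    return []


def build_concept_tree(sentences):
    """Map each hypernym to its list of direct hyponyms, without duplicates."""
    tree = defaultdict(list)
    for s in sentences:
        for hyper, hypo in extract_pairs(s):
            if hypo not in tree[hyper]:
                tree[hyper].append(hypo)
    return dict(tree)


# Made-up sample sentences, for illustration only.
SENTENCES = [
    "数控机床包括数控车床、数控铣床和加工中心。",
    "数控车床分为卧式数控车床和立式数控车床。",
    "加工中心是一种数控机床。",
]

if __name__ == "__main__":
    for parent, children in build_concept_tree(SENTENCES).items():
        print(parent, "->", children)
```

Each key of the resulting dictionary is a parent concept and each value lists its direct child concepts, so the structure could be serialized as subclass relations in an OWL ontology. A real pipeline would need a richer pattern set and a proper segmenter to delimit the concept terms.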
Source
《电子测量与仪器学报》
CSCD
Peking University Core Journal
2017, Issue 4, pp. 651-656 (6 pages)
Journal of Electronic Measurement and Instrumentation
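Funding
National Natural Science Foundation of China (51575055)
Scientific Research Program of the Beijing Municipal Education Commission (KM201611232020)
Open Project of Beijing Key Laboratory (KF20161123203)
Open Project of Beijing Key Laboratory (KF20161123201)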
Keywords
CNC machine tools
knowledge acquisition
ontology
Chinese word segmentation