Abstract
Natural language understanding and semantic parsing of unstructured power dispatching texts written in Chinese remain one of the difficulties in making dispatching more intelligent, and this limits further application of artificial intelligence in the power dispatching area. Human-machine collaboration in dispatching is only possible once the machine correctly understands dispatchers' natural-language instructions. Taking outage plans as the object of study, this paper proposes a semantic parsing technique for power system texts that learns semantic structure knowledge autonomously. First, the words in the dictionary are vectorized with the Skip-gram model, a hierarchical clustering algorithm is applied to induce the sentence patterns used in dispatching, and the patterns are rewritten as regular expressions to build a syntactic knowledge base. Phrase structure grammar is then used to identify the subordination relations among word combinations. Finally, a dedicated semantic framework is defined to extract important device information, and dependency syntax analysis is applied to handle subject dropout. According to the authors, the proposed method understands sentence meaning more accurately than other methods.
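As a rough illustration of the step in which induced sentence patterns are rewritten as regular expressions and a semantic frame is filled with device information, the following Python sketch may help. It is not the authors' implementation: the template syntax, the slot names (`voltage`, `device`, `action`, `start`, `end`), the slot vocabularies, and the example sentence (including the device name "Xinhua line") are all hypothetical, and English is used in place of Chinese dispatching text for readability.

```python
import re

# Hypothetical slot vocabularies; a real system would derive these from
# a domain dictionary rather than hard-code them.
SLOT_REGEX = {
    "voltage": r"\d+ ?kV",
    "device": r".+?(?:line|transformer|bus)",
    "action": r"outage|maintenance",
    "start": r"\d{2}:\d{2}",
    "end": r"\d{2}:\d{2}",
}


def pattern_to_regex(template):
    """Turn a sentence pattern with {slot} placeholders into a compiled regex."""
    parts = []
    # re.split with a capturing group keeps the placeholders in the result.
    for token in re.split(r"(\{\w+\})", template):
        if token.startswith("{") and token.endswith("}"):
            name = token[1:-1]
            parts.append(f"(?P<{name}>{SLOT_REGEX[name]})")
        else:
            # Literal text between slots is matched exactly.
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


# A one-entry "syntactic knowledge base" (assumed pattern, for illustration).
KNOWLEDGE_BASE = [
    pattern_to_regex(
        "{voltage} {device} is scheduled for {action} from {start} to {end}"
    ),
]


def parse(sentence):
    """Return a semantic frame (slot -> value) or None if no pattern matches."""
    for regex in KNOWLEDGE_BASE:
        match = regex.match(sentence)
        if match:
            return match.groupdict()
    return None


frame = parse("110kV Xinhua line is scheduled for outage from 08:00 to 17:00")
print(frame)
```

A sentence that matches none of the stored patterns yields `None`. In the pipeline the abstract describes, such cases would presumably fall through to the grammar-based and dependency-based analysis rather than be discarded.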
Authors
佟佳弘
武志刚
管霖
刘奇
杜亮
徐良德
TONG Jiahong; WU Zhigang; GUAN Lin; LIU Qi; DU Liang; XU Liangde (School of Electrical Engineering, South China University of Technology, Guangzhou 510640, Guangdong Province, China; System Operation Department, Guangzhou Power Supply Co., Ltd., Guangzhou 510620, Guangdong Province, China)
Source
《电网技术》
EI
CSCD
Peking University Core Journals (北大核心)
2020, No. 11, pp. 4148-4155 (8 pages)
Power System Technology
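Funding
Science and Technology Project of Guangzhou Power Supply Bureau, Guangdong Power Grid Co., Ltd. (Key Science and Technology Project of China Southern Power Grid) (GZHKJXM20170059).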
Keywords
semantic parsing (语义解析)
hierarchical clustering (层次聚类)
semantic framework (语义框架)
phrase structure grammar (短句结构文法)