Research on Hierarchical Multi-Label Classification Model of Patent Data Based on Hybrid Embedding
Abstract: Existing deep learning classification models perform poorly on hierarchical multi-label classification of patent data because the labels are closely related and the texts are long and rich in contextual semantic information. This paper proposes a hierarchical multi-label classification model for patent data based on hybrid embedding, aiming to exploit the strength of hybrid embeddings in representing label correlations and thereby improve the efficiency of automatic patent classification. First, text word embeddings are combined with positional encoding to capture contextual information in the sequence data. Second, a hybrid embedding is constructed from graph embeddings of the categories in the hierarchy and word embeddings of the category labels: a graph neural network autoencoder encodes the hierarchical structure of the categories so that they are structurally distinguishable, and word embedding encodes the label information so that they are semantically distinguishable. Finally, a hybrid embedding method based on a bidirectional gated recurrent unit (Bi-GRU) network is proposed to learn text representations layer by layer. Experiments on a dataset drawn from the Derwent Innovations Index show that the proposed model performs well on the evaluation metrics, improving overall accuracy by at least 1.1% over the other models compared.
Authors: JIN Jing; TAO Wan; HUANG Subin; LI Junjun (School of Computer and Information, Anhui Polytechnic University, Wuhu 241000)
Source: Journal of Changchun University of Science and Technology (Natural Science Edition), 2025, Issue 2, pp. 91-101 (11 pages)
Funding: Key Project of the Natural Science Foundation of Anhui Higher Education Institutions (2022AH050972); Natural Science Project of the Anhui Provincial Department of Science and Technology (2308085MF220); Major Social Science Project of the Anhui Provincial Department of Education (2024AH040282); Anhui Polytechnic University Undergraduate Research Project (2023DZ32).
Keywords: patent classification; multi-label hierarchical text classification; hybrid embedding; positional encoding; graph neural network
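
The first step described in the abstract, combining word embeddings with positional encoding, might look roughly like the sketch below. The abstract does not say which positional encoding the authors use; the sinusoidal scheme here is an assumption chosen for illustration, and the function names and the toy 4-dimensional embeddings are hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len, dim):
    """Return a seq_len x dim table of sinusoidal position codes.

    Assumption: a sinusoidal scheme; the paper's actual encoding is not
    specified in the abstract.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            # Dimensions 2k and 2k+1 share one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(word_embeddings):
    """Add position codes element-wise to a list of token embedding vectors."""
    seq_len = len(word_embeddings)
    dim = len(word_embeddings[0])
    pe = sinusoidal_positional_encoding(seq_len, dim)
    return [
        [w + p for w, p in zip(vec, pos_row)]
        for vec, pos_row in zip(word_embeddings, pe)
    ]


# Toy input: three tokens with hypothetical all-zero 4-dimensional embeddings,
# so the output equals the position codes themselves.
tokens = [[0.0] * 4 for _ in range(3)]
encoded = add_positions(tokens)
```

Because the position codes are added rather than concatenated, the embedding dimension is unchanged, so a downstream sequence encoder such as the Bi-GRU mentioned in the abstract could consume the result without any change to its input size.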