Abstract
The standardization of process text data is important for data integration and reuse in manufacturing. To address the non-standard and inconsistent descriptions of process text data within manufacturing enterprises, a method combining unsupervised data matching with supervised-learning data matching is proposed, which integrates the BM25 algorithm and the DSSM algorithm to achieve low-cost standardization of process text data. First, process text data are obtained from the enterprise's process data management system and preprocessed, and an enterprise data dictionary is constructed according to the enterprise's actual situation. Next, the unsupervised BM25 algorithm is used to coarsely match small batches of process text data against the enterprise data dictionary at the text-similarity level, and experts verify the coarse matching results to generate a training dataset. Finally, the training dataset is used to train the supervised DSSM algorithm, which performs fine matching of process text data at the semantic-similarity level. The method was validated on a process-name standardization task at a home appliance manufacturer, demonstrating its effectiveness. The method can effectively reduce the labor cost of standardizing process text data in manufacturing enterprises while ensuring the accuracy of the standardization process to the greatest extent possible.
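The coarse-matching step described in the abstract can be illustrated with a minimal sketch of BM25 scoring between a raw process name and the entries of a data dictionary. This is not the authors' implementation: the character-bigram tokenizer, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, the class and function names, and the English dictionary entries are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split a string into overlapping character bigrams (an assumed tokenizer)."""
    text = text.strip().lower()
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BM25:
    """Plain BM25 over a small list of dictionary entries (illustrative only)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [char_bigrams(d) for d in documents]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of entries containing each token.
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF variant that stays positive for every token.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5))
                    for t, f in df.items()}

    def score(self, query, idx):
        """BM25 score of the query against dictionary entry idx."""
        tf = self.tfs[idx]
        dl = len(self.docs[idx])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for t in char_bigrams(query):
            f = tf.get(t, 0)
            if f:
                total += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k=3):
        """Return the k best (entry index, score) pairs, highest score first."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda p: (-p[0], p[1]))
        return [(i, s) for s, i in scored[:k]]


# Hypothetical dictionary entries and raw process name.
dictionary = ["spot welding", "spray painting", "final assembly", "leak testing"]
bm25 = BM25(dictionary)
for idx, s in bm25.top_k("spot weld of frame", k=2):
    print(dictionary[idx], round(s, 3))
```

In a workflow like the one the abstract outlines, the top-ranked candidates for each raw name would be shown to an expert, and the confirmed pairs would become training data for the supervised semantic model.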
Authors
张金龙
高琦
吴春阳
翟健丰
李文琪
ZHANG Jinlong; GAO Qi; WU Chunyang; ZHAI Jianfeng; LI Wenqi (School of Mechanical Engineering, Shandong University, Jinan 250061, China; Key Laboratory of High Efficiency and Clean Mechanical Manufacture (Shandong University), Ministry of Education, Jinan 250061, China; Rizhao Institute of Shandong University, Rizhao 276827, China)
Source
Modern Manufacturing Engineering (《现代制造工程》)
Peking University Core Journal (北大核心)
2025, Issue 8, pp. 93-99 (7 pages)
Funding
National Key Research and Development Program of China (2018YFB1702601).
Keywords
computer integrated manufacturing
manufacturing
process text data
standardization
text matching
deep learning