
Collaborative mBERT and multi-source domain adaptation for industrial control protocol reverse engineering
Abstract [Objective] In industrial control systems (ICS), communication between devices relies heavily on industrial control protocols, so protocol security is essential to stable ICS operation. Vulnerability discovery and intrusion detection, as core components of the ICS defense framework, are only effective when protocol structure and semantic functions are parsed accurately. Protocol reverse engineering is the key technique for this parsing, and the precision of its central step, semantic inference, directly determines how accurately a protocol is understood. However, because protocol documentation is often missing and formats are highly heterogeneous, existing semantic inference methods generally rely on expert knowledge. They are therefore insufficiently automated and generalize poorly across protocols, and they cannot meet the need for high-precision parsing of multi-source heterogeneous protocols in real industrial environments. [Methods] To address these problems, this study proposes a semantic inference method that combines mBERT with multi-source domain adaptation and a structured masking strategy. The mBERT model provides a general cross-protocol semantic representation. A structured masking strategy, designed from attention weights and positional encoding, strengthens the model's representation of the intrinsic links between protocol structure and semantics and improves the automation and efficiency of semantic inference. A progressive multi-source domain adaptation fine-tuning strategy with adversarial training improves the model's general semantic representation across multiple source protocols, broadens its applicability to different industrial control protocols, and enables effective inference of keyword semantics. [Results] Experiments were conducted in an attack-defense drill range for typical energy enterprises at the Key Laboratory of Information Security for the Petrochemical Industry in Liaoning Province. Traffic from three industrial control protocols, S7comm, Modbus/TCP, and EtherNet/IP, was collected, and the training dataset was assembled using a protocol-complexity scoring mechanism. The results show that the progressive multi-source domain adaptation fine-tuning strategy significantly improves model performance, and that combining it with the structured masking strategy further improves semantic inference accuracy. The proposed method significantly outperforms existing baseline methods in precision, recall, and F1-score. [Conclusions] This study proposes a semantic inference method that combines mBERT with multi-source domain adaptation and structured masking. The high-dimensional spherical mapping and multi-task loss function used in semantic inference increase the separation between semantic categories and strengthen the model's ability to discriminate protocol semantics at a deeper level. The method significantly reduces reliance on manual prior knowledge while improving inference efficiency and cross-protocol applicability, and it offers a theoretically grounded new approach to industrial control protocol reverse engineering and ICS security protection.
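The abstract reports precision, recall, and F1-score but does not say how they are averaged across semantic categories. As a minimal sketch, assuming macro averaging over per-field semantic labels, the three metrics could be computed as follows. The function name `macro_prf` and the label names in the example are hypothetical and do not come from the paper.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all labels seen.

    Assumption: macro averaging; the paper's averaging scheme is not stated.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical semantic labels for four protocol fields.
y_true = ["function_code", "length", "address", "length"]
y_pred = ["function_code", "length", "length", "length"]
precision, recall, f1 = macro_prf(y_true, y_pred)
```

Macro averaging weights every semantic category equally, so a rarely occurring field type that is always misclassified (here `address`) pulls the scores down as much as a frequent one would.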
Authors ZONG Xuejun; YI Rongguang; LIU Yuxuan; HE Kan; SHI Hongyan; SUN Yifei; NING Bowei (College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, Liaoning, China; Key Laboratory of Information Security for Petrochemical Industry in Liaoning Province, Shenyang University of Chemical Technology, Shenyang 110142, Liaoning, China; School of Artificial Intelligence, Shenyang University of Technology, Shenyang 110870, Liaoning, China; School of Electrical Engineering, Shenyang Institute of Science and Technology, Shenyang 110167, Liaoning, China)
Source Journal of Shenyang University of Technology, 2026, No. 1, pp. 63-73 (11 pages)
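Funding Liaoning Provincial Science and Technology Major Special Project (2025JH1/11700021, 2024JH1/11700049); Liaoning Provincial Applied Basic Research Program (2025JH2/101300012); Natural Science Foundation of Liaoning Province (2023-MSLH-273).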
Keywords industrial control protocol; structured mask; semantic inference; attention weight; multi-source domain adaptation; mBERT model; word vector; adversarial training
