Abstract
Modern information systems keep growing in scale, and analyzing multi-source logs with heterogeneous structures makes it possible to understand system behavior quickly. The semantics of log parameters represent entity information within the system and are crucial for the joint analysis of multi-source logs. However, existing parsing methods capture the semantic features of log parameters inadequately, which leads to missing semantics, narrow semantic coverage, and insufficient accuracy in semantic recognition. To address this, this paper proposes a log parsing method based on parameter semantics (PS-Parser). The method builds a BERT model to capture the contextual semantic features of logs and extract the semantics of log parameters, then uses a feature library of conventional parameter semantics to complete the parameter semantics at different levels. Finally, it represents system entities by their parameter semantics to enable joint analysis of multi-source logs. Experiments on six real-world multi-source datasets show an average accuracy of 94.7% for log parameter parsing, an average semantic coverage of 81.7%, and an average F1 score of 0.991 for semantic parsing, a significant improvement over existing methods that validates the effectiveness of the proposed approach. Finally, for log analysis scenarios in big data systems, the paper verifies that the parameter semantics-based log parsing method supports the joint analysis of multi-source logs.
Authors
邢瀚韬
阮树骅
陈良国
曾雪梅
XING Hantao; RUAN Shuhua; CHEN Liangguo; ZENG Xuemei (School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China; Key Laboratory of Data Protection and Intelligent Management, Chengdu 610065, China; Cyber Science Research Institute, Sichuan University, Chengdu 610065, China)
Source
《信息网络安全》
PKU Core Journal (北大核心)
2025, Issue 4, pp. 610-618 (9 pages)
Netinfo Security
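Funding
Fundamental Research Funds for the Central Universities [SCU2024D012]
Sichuan University Science and Engineering Discipline Connotation Development Project (四川大学理工学科内涵发展项目) [2020SCUNG129]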
Keywords
log parsing
parameter semantics extraction
multi-source log analysis