Funding: This work was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. (GPIP: 1074-612-2024).
Abstract: Log anomaly detection is essential for maintaining the reliability and security of large-scale networked systems. Most traditional techniques rely on log parsing in the preprocessing stage and use handcrafted features, which limits their adaptability across systems. In this study, we propose a hybrid model, BertGCN, that integrates BERT-based contextual embeddings with Graph Convolutional Networks (GCNs) to identify anomalies in raw system logs, thereby eliminating the need for log parsing. The BERT module captures semantic representations of log messages, while the GCN models the structural relationships among log entries through a text-based graph. This combination enables BertGCN to capture both the contextual and semantic characteristics of log data. BertGCN performed strongly on the HDFS and BGL datasets, demonstrating its effectiveness and resilience in detecting anomalies. Compared with multiple baselines, the proposed BertGCN achieved higher precision, recall, and F1 scores.
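To make the embedding-plus-graph idea concrete, the following is a minimal sketch of a single graph convolution step in the standard form ReLU(D^(-1/2) (A + I) D^(-1/2) H W), written in plain Python. It is not the authors' implementation: the three-node graph, the two-dimensional vectors standing in for BERT embeddings, and the identity weight matrix are all hypothetical placeholders, since the abstract does not specify how the text-based graph is built or which dimensions and weights BertGCN uses.

```python
import math


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical toy graph: three log-message nodes, with edges 0-1 and 1-2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]

# Placeholder 2-d vectors standing in for BERT embeddings (assumption).
embeddings = [[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]]

# Identity weights for readability; a trained model would learn these.
weights = [[1.0, 0.0],
           [0.0, 1.0]]

adj_norm = normalize_adjacency(adjacency)
hidden = gcn_layer(adj_norm, embeddings, weights)

for node, row in enumerate(hidden):
    print(node, [round(v, 3) for v in row])
```

Each output row mixes a node's own vector with those of its neighbours, which is how a graph layer lets structural context between log entries influence the representation that a classifier would later score.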