Abstract: The rapid proliferation of Internet of Things (IoT) devices necessitates the deployment of lightweight Network Intrusion Detection Systems (NIDS) at the network edge. Knowledge Distillation (KD) has emerged as a prevailing paradigm for compressing cumbersome deep learning models for resource-constrained environments. However, existing KD approaches, whether employing full-representation transfer or static gradient masking, indiscriminately distill both generalized attack signatures and dataset-specific environment noise. This rigid feature entanglement leads to severe negative transfer, exacerbates catastrophic forgetting of long-tail attack categories, and sharply degrades cross-environment generalization. To overcome these limitations, we propose Disentangled Representation Distillation (DRD), a framework that fundamentally alters the nature of the transferred knowledge. DRD compels the high-capacity teacher model to decouple high-dimensional network-traffic representations into two orthogonal latent sub-spaces: pure Attack Semantics and Environment Background. During distillation, the dataset-specific background sub-space is explicitly discarded, so the student model inherits only the purified attack logic. This mechanism avoids the computational overhead of dynamic masking while equipping the student with inherent robustness to domain shift. Comprehensive experiments on benchmark datasets (e.g., UNSW-NB15 and CICIDS2017) demonstrate that DRD significantly improves the detection of rare, long-tail attacks, substantially strengthens cross-dataset generalization, and maintains an ultra-lightweight footprint suitable for edge deployment.
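To make the disentangle-then-distill mechanism concrete, below is a minimal PyTorch sketch of the idea as described in the abstract. It is illustrative only: the module and loss names (DisentangledTeacherHead, attack_proj, env_proj, orthogonality_loss, drd_distill_loss), the squared cross-correlation orthogonality penalty, and the MSE distillation objective are assumptions made for the sketch, not the paper's exact architecture or loss functions.

```python
# A minimal, illustrative sketch of the DRD idea described above.
# All names and loss forms here are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledTeacherHead(nn.Module):
    """Projects a teacher embedding into two latent sub-spaces:
    attack semantics (z_a, transferable) and environment background
    (z_e, dataset-specific)."""
    def __init__(self, feat_dim: int, sub_dim: int):
        super().__init__()
        self.attack_proj = nn.Linear(feat_dim, sub_dim)
        self.env_proj = nn.Linear(feat_dim, sub_dim)

    def forward(self, h: torch.Tensor):
        return self.attack_proj(h), self.env_proj(h)

def orthogonality_loss(z_a: torch.Tensor, z_e: torch.Tensor) -> torch.Tensor:
    """Pushes the two sub-spaces toward orthogonality by penalizing the
    squared cross-correlation between their normalized dimensions."""
    z_a = F.normalize(z_a, dim=1)
    z_e = F.normalize(z_e, dim=1)
    cross = z_a.t() @ z_e / z_a.size(0)  # (sub_dim, sub_dim) cross-correlation
    return (cross ** 2).sum()

def drd_distill_loss(student_z: torch.Tensor, teacher_z_a: torch.Tensor) -> torch.Tensor:
    """Distills only the purified attack-semantics sub-space; the environment
    background z_e is discarded and never transferred to the student."""
    return F.mse_loss(student_z, teacher_z_a.detach())

# Sketch of the two training objectives (lambda weights are hypothetical):
#   teacher: L = task_ce(logits, y) + lambda_orth * orthogonality_loss(z_a, z_e)
#   student: L = task_ce(logits, y) + lambda_kd * drd_distill_loss(student_z, z_a)
```

Under this sketch, the orthogonality penalty keeps environment-specific signal out of z_a during teacher training, and the student matches only z_a, which is how the dataset background gets dropped at transfer time without any per-sample dynamic masking.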