Abstract: Background: Video anomaly detection has long been a hot topic and continues to attract increasing attention. Many existing methods depend on processing the entire video rather than considering only the significant context. Method: This paper proposes a novel video anomaly detection method called COVAD that focuses on the regions of interest in a video instead of the entire video. The proposed COVAD method is based on an autoencoded convolutional neural network and a coordinate attention mechanism, which can effectively capture meaningful objects in the video and the dependencies among them. Building on an existing memory-guided video frame prediction network, our algorithm can predict the future motion and appearance of objects in a video more effectively. Result: The proposed algorithm achieved strong experimental results on multiple datasets and outperformed the baseline models considered in our analysis. In addition, we provide an improved visualization that offers pixel-level anomaly explanations.
Funding: Supported in part by the National Natural Science Foundation of China (62171347, 62101405, 62371373, 6227137).
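As a concrete illustration of the attention mechanism named above, the following is a minimal PyTorch sketch of a coordinate attention block in the spirit of Hou et al. (2021); the channel count, reduction ratio, and pooling details are illustrative assumptions, not COVAD's exact configuration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """A minimal sketch of coordinate attention (after Hou et al., 2021).

    Sizes here are illustrative assumptions, not the COVAD paper's setup.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Factorized pooling: one descriptor per row and one per column.
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)                        # (n, c, h+w, 1)
        y = self.act(self.bn(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        # Position-aware reweighting along both spatial axes.
        return x * a_h * a_w
```

Because attention is factorized along the two spatial axes, the block retains positional information that global average pooling would discard, which is what allows it to attend to object locations rather than only channels.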
Abstract: In weakly supervised video anomaly detection (WSVAD), the temporal relationships within a video are crucial for modeling event patterns. The Transformer is a common choice for modeling these relationships; however, owing to the large amount of redundancy in videos and the Transformer's quadratic complexity, it cannot effectively model long-range information. In addition, most WSVAD methods select key snippets based on predicted scores to represent event patterns, a paradigm that is susceptible to noise. To address these issues, a novel temporal context and representative feature learning (TCRFL) method for WSVAD is proposed. Specifically, a temporal context learning (TCL) module utilizes both Mamba, with its linear complexity, and a Transformer to capture the short-range and long-range dependencies of events. In addition, a representative feature learning (RFL) module mines representative snippets that capture important information about events and propagates them to the video features to strengthen the influence of representative features. The RFL module not only suppresses noise but also guides the model to select key snippets more accurately. Experimental results on the UCF-Crime, XD-Violence, and ShanghaiTech datasets demonstrate the effectiveness and superiority of our method.
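To make the two-branch idea of the TCL module concrete, here is a minimal sketch assuming a windowed Transformer layer for short-range dependencies and a linear-complexity recurrent branch standing in for Mamba (mamba_ssm.Mamba could be swapped in where available); the window size, fusion by summation, and feature dimension are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TemporalContextLearning(nn.Module):
    """A minimal two-branch temporal sketch in the spirit of TCL.

    Assumptions: a windowed Transformer handles short-range context; a GRU
    stands in for Mamba as a self-contained linear-complexity branch; the
    branches are fused by summation and a LayerNorm.
    """
    def __init__(self, dim: int = 512, heads: int = 4, window: int = 8):
        super().__init__()
        self.window = window
        self.local = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.global_branch = nn.GRU(dim, dim, batch_first=True)  # Mamba stand-in
        self.fuse = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, snippets, dim) snippet features, e.g. from I3D/CLIP.
        b, t, d = x.shape
        # Short-range branch: attention restricted to non-overlapping windows.
        pad = (-t) % self.window
        xp = nn.functional.pad(x, (0, 0, 0, pad))
        xw = xp.reshape(b * (xp.shape[1] // self.window), self.window, d)
        short = self.local(xw).reshape(b, -1, d)[:, :t]
        # Long-range branch: linear-complexity scan over the whole sequence.
        long_range, _ = self.global_branch(x)
        return self.fuse(short + long_range)
```

The design intuition is that the quadratic cost of attention is paid only inside fixed-size windows, while the recurrent (or state-space) branch covers the full sequence at linear cost.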
Abstract: In the field of intelligent surveillance, weakly supervised video anomaly detection (WSVAD) has garnered widespread attention as a key technology that identifies anomalous events using only video-level labels. Although multiple instance learning (MIL) has long dominated WSVAD, its reliance on video-level labels alone, without semantic grounding, hinders a fine-grained understanding of visually similar yet semantically distinct events. In addition, insufficient temporal modeling obscures the causal relationships between events, making anomaly decisions reactive rather than reasoning-based. To overcome these limitations, this paper proposes an adaptive knowledge-based guidance method that integrates external structured knowledge. The approach combines hierarchical category information with learnable prompt vectors and constructs continuously updated contextual references in the feature space, enabling fine-grained, meaning-based guidance over video content. Building on this, the work introduces an event relation analysis module that explicitly models temporal dependencies and causal correlations between video snippets. The module constructs an evolving logic chain of anomalous events, revealing how isolated anomalous snippets develop into a complete event. Experiments on multiple benchmark datasets show that the proposed method achieves highly competitive performance, with an AUC of 88.19% on UCF-Crime and an AP of 86.49% on XD-Violence. More importantly, the method accompanies its detection results with temporal and causal explanations derived from event relationships, advancing WSVAD from simple binary classification to interpretable behavior analysis.
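The combination of "hierarchical category information with learnable prompt vectors" suggests a CoOp-style design. Below is a minimal, hypothetical sketch: frozen category text embeddings (e.g., from CLIP's text encoder) are refined by learnable context vectors to form contextual references, and snippets are scored by cosine similarity against them. All names and sizes here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptGuidance(nn.Module):
    """A minimal sketch of learnable-prompt category guidance (CoOp-style).

    Assumptions: each anomaly category is a frozen text embedding plus a few
    learnable context vectors; snippets are scored by cosine similarity
    against the resulting contextual references.
    """
    def __init__(self, class_embeds: torch.Tensor, n_ctx: int = 4):
        super().__init__()
        num_classes, dim = class_embeds.shape
        self.register_buffer("class_embeds", class_embeds)  # frozen semantics
        self.ctx = nn.Parameter(torch.randn(num_classes, n_ctx, dim) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, snippets: torch.Tensor) -> torch.Tensor:
        # snippets: (batch, t, dim) visual snippet features.
        # Contextual reference = class embedding refined by learned context.
        ref = self.class_embeds.unsqueeze(1) + self.ctx       # (c, n_ctx, dim)
        ref = self.proj(ref.mean(dim=1))                      # (c, dim)
        sims = F.cosine_similarity(
            snippets.unsqueeze(2), ref.view(1, 1, *ref.shape), dim=-1)
        return sims  # (batch, t, c) per-snippet category affinities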