Industrial anomaly detection by combining visual Mamba and patch feature distribution


Abstract

Objective: Industrial image anomaly detection plays a crucial role in modern industrial production: detecting product defects in time improves the product qualification rate, raises productivity, and reduces production costs. Traditional anomaly detection algorithms often show limitations when dealing with new types of anomalies, especially complex ones such as logical anomalies, and therefore struggle to meet the demand for high-precision, efficient detection. Reconstruction-based methods are considered capable of handling logical anomalies caused by factors such as object quantity, structure, position, and arrangement order, because a model trained only on normal images produces large reconstruction errors on images that contain such anomalies. Existing industrial anomaly detection methods, including the reconstruction-based ones, are mainly built on convolutional neural networks (CNNs) or vision Transformer (ViT) networks. However, CNNs have difficulty handling long-range dependencies, while ViTs have high time complexity. Recent research shows that state space models represented by Mamba can model long-range dependencies effectively while maintaining linear complexity. This study therefore explores the application of visual state space models to anomaly detection, aiming at a more precise and efficient image anomaly detection technique that meets the strict quality control requirements of industrial production and supports the move toward intelligent, automated production.

Method: A novel unsupervised industrial anomaly detection model combining visual Mamba and patch feature distribution is proposed. The model consists of two complementary branch networks: a patch feature distribution estimation network and an autoencoder reconstruction network based on visual Mamba. The patch feature distribution estimation network relies primarily on local patch features. It fuses the patch features of normal samples extracted by an efficient pretrained patch description network (PDN) and by the visual Mamba encoder, and learns a Gaussian mixture density network to estimate the distribution of these features. In the testing phase, the Gaussian mixture density network estimates an anomaly score at each position of the test image, which produces a local anomaly map (LAM). The autoencoder reconstruction network uses the visual Mamba encoder to capture long-range associated features, which strengthens global modeling of complex anomalous images across different categories and forms. In the testing phase, its reconstruction error is used to estimate a global anomaly map (GAM) for the test image. Finally, LAM and GAM are combined to obtain the final detection result. The datasets were preprocessed and the images were cropped to the sizes required by the different models; for example, the input image size was 256×256 pixels. The number of encoding blocks in the visual state space encoder of the reconstruction branch was tuned to obtain the best anomaly detection performance. The experiments were conducted on a desktop computer with an Intel Core i5 2.5 GHz CPU, a GeForce GTX 3060Ti GPU with 12 GB memory, 32 GB RAM, and Ubuntu 18.04. Based on experimental observations, the learning rate was set to 0.001, the number of training epochs to 200, and the batch size to 48. For the PDN branch, the patch size was set to 32.

Result: The model was compared with other advanced algorithms on the public datasets MVTec AD (MVTec anomaly detection dataset), VisA, and BTAD (beanTech anomaly detection) and achieved competitive results. On MVTec AD, compared with the second-best model, it improved the pixel-level AU-ROC (area under the receiver operating characteristic curve) by 0.9% to reach 93.9%, and the image-level AU-ROC by 2.4% to reach 93.8%. On BTAD, it improved the image-level AU-ROC by 0.4% over the second-best model, reaching 92.6%. On VisA, it improved the pixel-level AU-ROC by 0.6% over the second-best model, reaching 96.6%. Visualizations of anomaly localization on the MVTec AD and VisA datasets show that the proposed model localizes anomalies more accurately than the other models.

Conclusion: Applying visual state space models to image reconstruction for detecting image anomalies is feasible and effective, and the resulting anomaly localization is competitive. The study suggests that aggregating features from the intermediate layers of the feature extraction model is more helpful for adapting to anomaly detection tasks, and that the number of patch descriptor vectors may affect anomaly localization and detection, because more descriptor vectors can represent more detailed information. Both points are worth further research. The study combines two popular approaches in industrial anomaly detection, patch feature distribution estimation and reconstruction, and integrates a visual state space model into them, which supports the use of such models in anomaly detection.
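The two-branch scoring described in the Method part of the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the mixture parameters are supplied by hand instead of being predicted by a density network, that the local score is the negative log-likelihood of a patch feature under a diagonal-covariance Gaussian mixture, that the global score is the per-pixel squared reconstruction error, and that the two maps are merged by min-max normalization followed by addition. The abstract does not state the actual fusion rule, and all function names here are hypothetical.

```python
import math

def gmm_nll(x, weights, means, variances):
    """Negative log-likelihood of vector x under a diagonal-covariance Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # log-sum-exp over components for numerical stability
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))

def local_anomaly_map(features, weights, means, variances):
    """features: H x W grid of patch feature vectors -> H x W grid of NLL scores (LAM)."""
    return [[gmm_nll(f, weights, means, variances) for f in row] for row in features]

def global_anomaly_map(image, reconstruction):
    """Per-pixel squared reconstruction error between two H x W grids (GAM)."""
    return [[(a - b) ** 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(image, reconstruction)]

def min_max(grid):
    """Rescale a grid to [0, 1]; a constant grid maps to all zeros."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / span for v in row] for row in grid]

def fuse(lam, gam):
    """Assumed fusion rule: sum of the min-max normalized maps."""
    lam_n, gam_n = min_max(lam), min_max(gam)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lam_n, gam_n)]

if __name__ == "__main__":
    # Toy 2 x 2 example: the bottom-right patch is far from both mixture components
    # and is also poorly reconstructed.
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [1.0, 1.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    features = [[[0.0, 0.0], [0.0, 0.0]],
                [[0.0, 0.0], [4.0, 4.0]]]
    image = [[0.5, 0.5], [0.5, 0.5]]
    reconstruction = [[0.5, 0.5], [0.5, 0.9]]

    lam = local_anomaly_map(features, weights, means, variances)
    gam = global_anomaly_map(image, reconstruction)
    for row in fuse(lam, gam):
        print(row)
```

In the toy input the bottom-right position receives the highest fused score, because it is unlikely under the mixture and also has the largest reconstruction error. In practice the two maps would typically be brought to a common resolution before fusion, a step this sketch skips by using grids of equal size.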

Authors: Liu Jianming; Zhuang Weikuan (School of Digital Industry, Jiangxi Normal University, Shangrao 334000, China; School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330000, China)

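Affiliations: School of Digital Industry, Jiangxi Normal University; School of Computer and Information Engineering, Jiangxi Normal University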

Source: Journal of Image and Graphics (PKU Core journal), 2025, Issue 10, pp. 3215-3229 (15 pages)

Funding: National Natural Science Foundation of China (62266022); Natural Science Foundation of Jiangxi Province (20242BAB25110).

Keywords: anomaly detection; anomaly segmentation; vision state space model (SSM); Gaussian density approximation network; anomaly dataset

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

