Industrial anomaly detection by combining visual Mamba and patch feature distribution
Abstract
Objective Industrial image anomaly detection plays a crucial role in modern industrial production: detecting product defects in time improves the product qualification rate, raises productivity, and reduces production costs. Traditional anomaly detection algorithms show clear limitations when dealing with new types of anomalies, especially complex ones such as logical anomalies, and therefore struggle to meet the demand for high-precision, efficient detection in industrial production. Reconstruction-based methods are considered capable of handling logical anomalies caused by factors such as object quantity, structure, position, and arrangement order, because a model trained only on normal images produces a reconstruction that differs significantly from an input containing a logical anomaly. Existing reconstruction-based anomaly detection methods are mainly built on convolutional neural networks (CNNs) or vision Transformer (ViT) networks. However, CNNs have difficulty handling long-range dependencies, while ViTs suffer from high time complexity. Recent research shows that state space models, represented by Mamba, can model long-range dependencies effectively while maintaining linear complexity. This study therefore explores the application of visual state space models to anomaly detection, with the aim of developing a more precise and efficient image anomaly detection technique that meets the strict quality control requirements of industrial production.
Method A novel unsupervised industrial anomaly detection model combining visual Mamba and patch feature distribution is proposed. The model consists of two complementary branch networks: a patch feature distribution estimation network and an autoencoder reconstruction network based on visual Mamba. The patch feature distribution estimation network relies mainly on local patch features for anomaly detection. It fuses the patch features of normal samples extracted by an efficient pretrained patch description network and by the visual Mamba encoder, and learns a Gaussian mixture density network to estimate the distribution of these features. During the testing phase, the Gaussian mixture density network estimates the anomaly score at each position of a test image, which produces a local anomaly map (LAM). The autoencoder reconstruction network based on visual Mamba uses a visual Mamba encoder to capture long-range associated features, which strengthens global modeling of complex anomalous images across different categories and forms. In the testing phase, the reconstruction error is used to estimate a global anomaly map (GAM) for the test image. Finally, LAM and GAM are combined to obtain the final detection result. The datasets were preprocessed and the images were cropped to the sizes required by the different models; for example, the input image size was 256×256 pixels. The number of encoding blocks in the visual state space encoder of the reconstruction branch was tuned to obtain the best anomaly detection performance. The experiments were conducted on a desktop computer equipped with an Intel Core i5 2.5 GHz CPU, a GeForce GTX 3060Ti GPU with 12 GB memory, 32 GB RAM, and Ubuntu 18.04 as the operating system. Based on experimental observations, the learning rate was set to 0.001, the model was trained for 200 epochs, and the batch size was 48. For the patch description network (PDN), the patch size was set to 32.
Result We compared our model with other advanced algorithms on the public datasets MVTec AD (MVTec anomaly detection dataset), VisA, and BTAD (BeanTech anomaly detection), and it achieved competitive results. On the MVTec AD dataset, compared with the second-best model, our model improved the pixel-level AU-ROC (area under the receiver operating characteristic curve) by 0.9% to reach 93.9%, and the image-level AU-ROC by 2.4% to reach 93.8%. On the BTAD dataset, our model improved the image-level AU-ROC by 0.4% (reaching 92.6%) compared with the second-best model. On the VisA dataset, our model improved the pixel-level AU-ROC by 0.6% (reaching 96.6%) compared with the second-best model. Visualizations of anomaly localization on the MVTec AD and VisA datasets show that our model localizes anomalies more accurately than the other models.
Conclusion Applying visual state space models to image reconstruction for detecting image anomalies is feasible and effective, and the resulting anomaly localization is competitive. We believe that aggregating features from the intermediate layers of the feature extraction model is more helpful for adapting to anomaly detection tasks. The number of patch descriptor vectors may also matter for anomaly localization and detection, because more patch descriptor vectors can represent more detailed information. Both points are worth further research. This study combines two popular approaches in industrial anomaly detection and integrates visual state space models into the model, which supports their application in the field of anomaly detection.
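The two-branch scoring described in the Method part can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a fixed diagonal Gaussian mixture in place of the learned Gaussian mixture density network, a mean squared reconstruction error for the global branch, and a min-max normalized weighted average for the fusion step, which the abstract does not specify. All function names and the `alpha` parameter are hypothetical.

```python
import numpy as np


def gmm_nll(features, weights, means, variances):
    """Negative log-likelihood of each feature under a diagonal Gaussian mixture.

    features: (N, D); weights: (K,); means and variances: (K, D).
    Assumption: fixed mixture parameters stand in for the learned network.
    """
    diff = features[:, None, :] - means[None, :, :]  # (N, K, D)
    log_comp = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )  # (N, K) log-density of each component
    log_mix = np.log(weights)[None, :] + log_comp
    # Log-sum-exp over components, stabilized by subtracting the row maximum.
    m = log_mix.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_mix - m).sum(axis=1))
    return -log_lik


def local_anomaly_map(patch_features, weights, means, variances):
    """LAM sketch: one score per patch position, from an (h, w, D) feature grid."""
    h, w, d = patch_features.shape
    nll = gmm_nll(patch_features.reshape(-1, d), weights, means, variances)
    return nll.reshape(h, w)


def global_anomaly_map(image, reconstruction):
    """GAM sketch: per-pixel squared reconstruction error averaged over channels."""
    return np.mean((image - reconstruction) ** 2, axis=2)


def min_max(x):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def combine_maps(lam, gam, alpha=0.5):
    """Hypothetical fusion: upsample LAM to GAM's size, then take a weighted average.

    Assumes GAM's height and width are integer multiples of LAM's.
    """
    fy = gam.shape[0] // lam.shape[0]
    fx = gam.shape[1] // lam.shape[1]
    lam_up = np.kron(lam, np.ones((fy, fx)))  # nearest-neighbour upsampling
    return alpha * min_max(lam_up) + (1.0 - alpha) * min_max(gam)
```

Min-max normalization before averaging is one simple way to put a likelihood-based map and an error-based map on a common scale; other fusion rules would fit the description equally well.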
Authors: Liu Jianming (刘建明); Zhuang Weikuan (庄维宽) (School of Digital Industry, Jiangxi Normal University, Shangrao 334000, China; School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330000, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), PKU Core Journal, 2025, No. 10, pp. 3215-3229 (15 pages)
Funding: National Natural Science Foundation of China (62266022); Natural Science Foundation of Jiangxi Province (20242BAB25110)
Keywords: anomaly detection; anomaly segmentation; vision state space model (SSM); Gaussian density approximation network; anomaly dataset
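The AU-ROC figures quoted in the Result part are reported at image level and at pixel level. The sketch below shows one common way to compute both from anomaly score maps and ground-truth masks. It is an illustration, not the evaluation code of the paper. It assumes the image-level score is the maximum of each score map (a widespread convention that the abstract does not confirm) and uses the rank-based (Mann-Whitney) form of AU-ROC. The function names are hypothetical.

```python
import numpy as np


def au_roc(labels, scores):
    """Rank-based AU-ROC: probability that a positive outranks a negative.

    Tied scores share their average rank, so each tie counts as one half.
    """
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AU-ROC needs at least one positive and one negative")
    # 1-based average rank of every score.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)  # rank of the last member of each tie group
    avg_rank = last_rank - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    # Mann-Whitney U statistic of the positive class.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def image_and_pixel_au_roc(score_maps, masks):
    """Return (image-level, pixel-level) AU-ROC for arrays of shape (N, H, W).

    Assumption: an image's score is the maximum of its anomaly map, and an
    image is anomalous if its mask contains any anomalous pixel.
    """
    score_maps = np.asarray(score_maps, dtype=float)
    masks = np.asarray(masks).astype(bool)
    n = len(score_maps)
    image_scores = score_maps.reshape(n, -1).max(axis=1)
    image_labels = masks.reshape(n, -1).any(axis=1)
    return au_roc(image_labels, image_scores), au_roc(masks, score_maps)
```

Pixel-level AU-ROC pools every pixel of every test image, so large defect-free regions dominate the negative class; this is one reason the two levels can move differently for the same model.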