Foreign body classification on coal conveyor belts is a critical component of intelligent coal mining systems.Previous approaches have primarily utilized convolutional neural networks(CNNs)to effectively integrate spa...Foreign body classification on coal conveyor belts is a critical component of intelligent coal mining systems.Previous approaches have primarily utilized convolutional neural networks(CNNs)to effectively integrate spatial and semantic information.However,the performance of CNN-based methods remains limited in classification accuracy,primarily due to insufficient exploration of local image characteristics.Unlike CNNs,Vision Transformer(ViT)captures discriminative features by modeling relationships between local image patches.However,such methods typically require a large number of training samples to perform effectively.In the context of foreign body classification on coal conveyor belts,the limited availability of training samples hinders the full exploitation of Vision Transformer’s(ViT)capabilities.To address this issue,we propose an efficient approach,termed Key Part-level Attention Vision Transformer(KPA-ViT),which incorporates key local information into the transformer architecture to enrich the training information.It comprises three main components:a key-point detection module,a key local mining module,and an attention module.To extract key local regions,a key-point detection strategy is first employed to identify the positions of key points.Subsequently,the key local mining module extracts the relevant local features based on these detected points.Finally,an attention module composed of self-attention and cross-attention blocks is introduced to integrate global and key part-level information,thereby enhancing the model’s ability to learn discriminative features.Compared to recent transformer-based frameworks—such as ViT,Swin-Transformer,and EfficientViT—the proposed KPA-ViT achieves performance improvements of 9.3%,6.6%,and 2.8%,respectively,on the CUMT-BelT dataset,demonstrating its effectiveness.展开更多
基金funded by the National Key Research and Development Program of China(grant number 2023YFC2907600)the National Natural Science Foundation of China(grant number 52504132)Tiandi Science and Technology Co.,Ltd.Science and Technology Innovation Venture Capital Special Project(grant number 2023-TD-ZD011-004).
文摘Foreign body classification on coal conveyor belts is a critical component of intelligent coal mining systems.Previous approaches have primarily utilized convolutional neural networks(CNNs)to effectively integrate spatial and semantic information.However,the performance of CNN-based methods remains limited in classification accuracy,primarily due to insufficient exploration of local image characteristics.Unlike CNNs,Vision Transformer(ViT)captures discriminative features by modeling relationships between local image patches.However,such methods typically require a large number of training samples to perform effectively.In the context of foreign body classification on coal conveyor belts,the limited availability of training samples hinders the full exploitation of Vision Transformer’s(ViT)capabilities.To address this issue,we propose an efficient approach,termed Key Part-level Attention Vision Transformer(KPA-ViT),which incorporates key local information into the transformer architecture to enrich the training information.It comprises three main components:a key-point detection module,a key local mining module,and an attention module.To extract key local regions,a key-point detection strategy is first employed to identify the positions of key points.Subsequently,the key local mining module extracts the relevant local features based on these detected points.Finally,an attention module composed of self-attention and cross-attention blocks is introduced to integrate global and key part-level information,thereby enhancing the model’s ability to learn discriminative features.Compared to recent transformer-based frameworks—such as ViT,Swin-Transformer,and EfficientViT—the proposed KPA-ViT achieves performance improvements of 9.3%,6.6%,and 2.8%,respectively,on the CUMT-BelT dataset,demonstrating its effectiveness.