Funding: Supported by the Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (No. USCAST2021-5).
Abstract: To overcome the obstacles of poor feature extraction and scarce prior information on the appearance of infrared dim small targets, we propose a multi-domain attention-guided pyramid network (MAGPNet). Specifically, we design three modules to ensure that salient features of small targets are acquired and retained in the multi-scale feature maps. To improve the adaptability of the network to targets of different sizes, we design a kernel aggregation attention block with a receptive field attention branch, which weights the feature maps under different receptive fields with an attention mechanism. Based on research on the human visual system, we further propose an adaptive local contrast measure module to enhance the local features of infrared small targets. With this parameterized component, we can aggregate the information of multi-scale contrast saliency maps. Finally, to fully utilize the spatial- and channel-domain information in feature maps of different scales, we propose a mixed spatial-channel attention-guided fusion module that achieves high-quality fusion while ensuring that small-target features are preserved at deep layers. Experiments on public datasets demonstrate that our MAGPNet outperforms other state-of-the-art methods in terms of intersection over union (IoU), precision, recall, and F-measure. In addition, we conduct detailed ablation studies to verify the effectiveness of each component in our network.
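The paper's adaptive local contrast measure is a learned, parameterized module; as a rough illustration of the classical human-vision-inspired idea behind it (an assumption, not MAGPNet's actual component — the function name, cell size, and neighbourhood scheme are illustrative), the sketch below computes a single-scale local contrast map that keeps only pixels whose centre cell outshines all eight neighbouring cells:

```python
import numpy as np

def local_contrast_map(img, cell=3):
    """Toy single-scale local contrast measure: for each pixel, compare the
    mean of the centre cell against the means of its 8 neighbouring cells.
    High values mark small bright regions that stand out locally."""
    h, w = img.shape
    pad = cell  # one cell of padding on each side
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    offsets = [(dy, dx) for dy in (-cell, 0, cell)
                        for dx in (-cell, 0, cell) if (dy, dx) != (0, 0)]
    half = cell // 2
    for y in range(h):
        for x in range(w):
            cy, cx = y + pad, x + pad
            centre = padded[cy - half:cy + half + 1,
                            cx - half:cx + half + 1].mean()
            neigh = max(padded[cy + dy - half:cy + dy + half + 1,
                               cx + dx - half:cx + dx + half + 1].mean()
                        for dy, dx in offsets)
            out[y, x] = max(centre - neigh, 0.0)  # suppress non-salient pixels
    return out

# A dim 3x3 target on a flat background is enhanced; the background maps to 0.
img = np.full((15, 15), 10.0)
img[6:9, 6:9] = 30.0
cmap = local_contrast_map(img)
print(cmap[7, 7], cmap[1, 1])  # 20.0 0.0
```

A multi-scale version would run this at several cell sizes and aggregate the resulting saliency maps, which is the information aggregation the abstract refers to.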
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, and by the Jiangxi Provincial Key Laboratory of Virtual Reality under Grant No. 2024SSY03151.
Abstract: Dynamic sign language recognition holds significant importance, particularly with the application of deep learning to address its complexity. However, existing methods face several challenges. First, recognizing dynamic sign language requires identifying the keyframes that best represent the signs, and missing these keyframes reduces accuracy. Second, some methods do not focus enough on hand regions, which are small within the overall frame, leading to information loss. To address these challenges, we propose a novel Video Transformer Attention-based Network (VTAN) for dynamic sign language recognition that effectively prioritizes informative frames and hand regions. To tackle the first issue, we design a keyframe extraction module enhanced by a convolutional autoencoder, which selects information-rich frames and eliminates redundant ones from the video sequences. For the second issue, we develop a soft-attention-based transformer module that emphasizes extracting features from hand regions, ensuring that the network pays more attention to hand information within sequences. This dual-focus approach improves dynamic sign language recognition by addressing the key challenges of identifying critical frames and emphasizing hand regions. Experimental results on two public benchmark datasets demonstrate the effectiveness of our network, which outperforms most typical methods on sign language recognition tasks.
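The keyframe-extraction idea — keep information-rich frames, drop redundant ones — can be illustrated with a greedy redundancy filter. This is a hedged sketch, not VTAN's autoencoder-based module: `select_keyframes`, the 0.95 threshold, and the per-frame feature vectors (which the paper would obtain from the autoencoder bottleneck) are all illustrative assumptions:

```python
import numpy as np

def select_keyframes(frames, threshold=0.95):
    """Greedy keyframe selection sketch: keep a frame only if its cosine
    similarity to the last kept frame falls below `threshold`, so runs of
    near-duplicate frames collapse to a single representative.
    `frames` is an (N, D) array of per-frame feature vectors."""
    keep = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        a, b = frames[keep[-1]], frames[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if cos < threshold:  # sufficiently novel -> new keyframe
            keep.append(i)
    return keep

# Three distinct "scenes", each repeated over 4 frames, yield 3 keyframes.
base = np.eye(3, 16)                 # three mutually orthogonal scene codes
frames = np.repeat(base, 4, axis=0)  # 12 frames total
print(select_keyframes(frames))      # [0, 4, 8]
```

In practice the score would come from reconstruction error or feature distance rather than raw cosine similarity, but the selection logic is the same.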
Funding: Jointly supported by the National Natural Science Foundation of China (62072414, U1504608, 61975187) and the Foundation and Cutting-Edge Technologies Research Program of Henan Province (212102210540, 192102210294, 212102210280).
Abstract: Current retrieval methods based on deep convolutional features cannot fully exploit the characteristics of salient image regions, nor can they effectively suppress background noise, so retrieving objects in cluttered scenarios remains a challenging task. To solve this problem, we propose a new image retrieval method that employs a novel attention-based feature aggregation approach and utilizes a combination of local and global features. The method first extracts global and local features of the input image and then selects keypoints from the local features using the attention mechanism. After that, the feature aggregation mechanism aggregates the keypoints into a compact vector representation according to the scores evaluated by the attention mechanism. The core of the aggregation mechanism is to allow features with high scores to participate in the residual operations of all cluster centers. Finally, we obtain the improved image representation by fusing the aggregated feature descriptor with the global feature of the input image. To evaluate the proposed method, we carried out a series of experiments on large-scale image datasets and compared it with other state-of-the-art methods. The experiments show that this method greatly improves both the precision of image retrieval and its computational efficiency.
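The aggregation step — high-scoring features contributing residuals to all cluster centers — resembles score-weighted VLAD pooling. A minimal NumPy sketch, assuming the attention scores and cluster centers are already computed (the function name and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def attention_vlad(features, scores, centers):
    """Score-weighted residual aggregation sketch (VLAD-style): every local
    feature contributes its residual to *all* cluster centres, scaled by its
    attention score, so high-scoring keypoints dominate the descriptor.
    features: (N, D), scores: (N,), centers: (K, D) -> (K*D,) L2-normalised."""
    residuals = features[:, None, :] - centers[None, :, :]  # (N, K, D)
    agg = (scores[:, None, None] * residuals).sum(axis=0)   # (K, D)
    vec = agg.ravel()                                       # flatten to K*D
    return vec / (np.linalg.norm(vec) + 1e-12)              # L2-normalise

rng = np.random.default_rng(1)
feats = rng.normal(size=(50, 8))    # 50 local descriptors of dimension 8
scores = rng.uniform(size=50)       # attention scores in [0, 1)
centers = rng.normal(size=(4, 8))   # 4 cluster centres
desc = attention_vlad(feats, scores, centers)
print(desc.shape)  # (32,)
```

Setting a score to zero removes that keypoint's contribution entirely, which is how attention-based selection suppresses background clutter in this scheme.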
Abstract: Accurate identification of fungal species is essential for effective diagnosis and treatment. Traditional microscopy-based methods are often subjective and time-consuming. Deep learning has emerged as a promising tool in this domain. However, existing deep learning models often struggle to generalise in the presence of class imbalance and subtle morphological differences, which are common in fungal image datasets. This study proposes MASA-Net, a deep learning framework that combines a fine-tuned DenseNet201 backbone with a multi-aspect channel-spatial attention (MASA) module. The attention mechanism refines spatial and channel-wise features by capturing multi-scale spatial patterns and adaptively emphasising informative channels. This enhances the network's ability to focus on diagnostically relevant fungal structures while suppressing irrelevant features. MASA-Net is evaluated on the DeFungi dataset and demonstrates superior performance in terms of accuracy, precision, recall and F1-score. It also outperforms established attention mechanisms such as squeeze-and-excitation (SE) networks and the convolutional block attention module (CBAM) under identical conditions. These results highlight MASA-Net's robustness and effectiveness in addressing class imbalance and structural variability, offering a reliable solution for automated fungal species identification.
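The SE baseline that MASA-Net is compared against can be sketched in a few lines. This is a generic squeeze-and-excitation channel gate, not the MASA module itself; the weight shapes and reduction ratio are illustrative assumptions:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention sketch:
    global-average-pool each channel ("squeeze"), pass the statistics
    through a two-layer bottleneck ("excitation"), and rescale the
    channels by the resulting sigmoid gates.
    feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeeze = feat.mean(axis=(1, 2))              # (C,) per-channel statistics
    hidden = np.maximum(w1 @ squeeze, 0.0)        # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (C,) gates in (0, 1)
    return feat * gates[:, None, None]            # channel-wise rescale

rng = np.random.default_rng(2)
feat = rng.normal(size=(8, 4, 4))  # 8 channels, 4x4 spatial map
w1 = rng.normal(size=(2, 8))       # reduction ratio r = 4
w2 = rng.normal(size=(8, 2))
out = channel_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

A channel-spatial module such as CBAM (or, presumably, MASA) would follow this channel gate with a spatial gate computed from pooled channel statistics; the multi-aspect design in the abstract additionally pools at several spatial scales.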