Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure p...Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.展开更多
Deep Learning has been widely used to model soft sensors in modern industrial processes with nonlinear variables and uncertainty.Due to the outstanding ability for high-level feature extraction,stacked autoencoder(SAE...Deep Learning has been widely used to model soft sensors in modern industrial processes with nonlinear variables and uncertainty.Due to the outstanding ability for high-level feature extraction,stacked autoencoder(SAE)has been widely used to improve the model accuracy of soft sensors.However,with the increase of network layers,SAE may encounter serious information loss issues,which affect the modeling performance of soft sensors.Besides,there are typically very few labeled samples in the data set,which brings challenges to traditional neural networks to solve.In this paper,a multi-scale feature fused stacked autoencoder(MFF-SAE)is suggested for feature representation related to hierarchical output,where stacked autoencoder,mutual information(MI)and multi-scale feature fusion(MFF)strategies are integrated.Based on correlation analysis between output and input variables,critical hidden variables are extracted from the original variables in each autoencoder's input layer,which are correspondingly given varying weights.Besides,an integration strategy based on multi-scale feature fusion is adopted to mitigate the impact of information loss with the deepening of the network layers.Then,the MFF-SAE method is designed and stacked to form deep networks.Two practical industrial processes are utilized to evaluate the performance of MFF-SAE.Results from simulations indicate that in comparison to other cutting-edge techniques,the proposed method may considerably enhance the accuracy of soft sensor modeling,where the suggested method reduces the root mean square error(RMSE)by 71.8%,17.1%and 64.7%,15.1%,respectively.展开更多
To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network mo...To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network model is proposed by fusing multi-scale feature information.Firstly,a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using different scales of convolution kernels.Meanwhile,the channel attention mechanism is used to increase the global information acquisition of the network.Secondly,the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network,thus reducing the loss of texture details of the deep network feature images,and improving network generalization ability and recognition accuracy.Finally,the validity of the MSFResNet model is verified using public datasets and applied to wild mushroom identification.Experimental results show that compared with ResNeXt50 network model,the accuracy of the MSFResNet model is improved by 6.01%on the FGVC-Aircraft common dataset.It achieves 99.13%classification accuracy on the wild mushroom dataset,which is 0.47%higher than ResNeXt50.Furthermore,the experimental results of the thermal map show that the MSFResNet model significantly reduces the interference of background information,making the network focus on the location of the main body of wild mushroom,which can effectively improve the accuracy of wild mushroom identification.展开更多
Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells an...Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size.Pathologists often refer to surrounding cells to identify abnormalities.To emulate this slide examination behavior,this study proposes a Multi-Scale Feature Fusion Network(MSFF-Net)for detecting cervical abnormal cells.MSFF-Net employs a Cross-Scale Pooling Model(CSPM)to effectively capture diverse features and contextual information,ranging from local details to the overall structure.Additionally,a Multi-Scale Fusion Attention(MSFA)module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales.To handle the complex environment of cervical cell images,such as cell adhesion and overlapping,the Inner-CIoU loss function is utilized to more precisely measure the overlap between bounding boxes,thereby improving detection accuracy in such scenarios.Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision(mAP)of 63.2%,outperforming state-of-the-art methods while maintaining a relatively small number of parameters(26.8 M).This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of cervical abnormal cells,contributing to more accurate and efficient cervical cancer screening.展开更多
With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of...With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of multimodal approaches for fake news detection has gained significant attention.To solve the problems existing in previous multi-modal fake news detection algorithms,such as insufficient feature extraction and insufficient use of semantic relations between modes,this paper proposes the MFFFND-Co(Multimodal Feature Fusion Fake News Detection with Co-Attention Block)model.First,the model deeply explores the textual content,image content,and frequency domain features.Then,it employs a Co-Attention mechanism for cross-modal fusion.Additionally,a semantic consistency detectionmodule is designed to quantify semantic deviations,thereby enhancing the performance of fake news detection.Experimentally verified on two commonly used datasets,Twitter and Weibo,the model achieved F1 scores of 90.0% and 94.0%,respectively,significantly outperforming the pre-modified MFFFND(Multimodal Feature Fusion Fake News Detection with Attention Block)model and surpassing other baseline models.This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.展开更多
An improved model based on you only look once version 8(YOLOv8)is proposed to solve the problem of low detection accuracy due to the diversity of object sizes in optical remote sensing images.Firstly,the feature pyram...An improved model based on you only look once version 8(YOLOv8)is proposed to solve the problem of low detection accuracy due to the diversity of object sizes in optical remote sensing images.Firstly,the feature pyramid network(FPN)structure of the original YOLOv8 mode is replaced by the generalized-FPN(GFPN)structure in GiraffeDet to realize the"cross-layer"and"cross-scale"adaptive feature fusion,to enrich the semantic information and spatial information on the feature map to improve the target detection ability of the model.Secondly,a pyramid-pool module of multi atrous spatial pyramid pooling(MASPP)is designed by using the idea of atrous convolution and feature pyramid structure to extract multi-scale features,so as to improve the processing ability of the model for multi-scale objects.The experimental results show that the detection accuracy of the improved YOLOv8 model on DIOR dataset is 92%and mean average precision(mAP)is 87.9%,respectively 3.5%and 1.7%higher than those of the original model.It is proved the detection and classification ability of the proposed model on multi-dimensional optical remote sensing target has been improved.展开更多
Segmentation of the retinal vessels in the fundus is crucial for diagnosing ocular diseases.Retinal vessel images often suffer from category imbalance and large scale variations.This ultimately results in incomplete v...Segmentation of the retinal vessels in the fundus is crucial for diagnosing ocular diseases.Retinal vessel images often suffer from category imbalance and large scale variations.This ultimately results in incomplete vessel segmentation and poor continuity.In this study,we propose CT-MFENet to address the aforementioned issues.First,the use of context transformer(CT)allows for the integration of contextual feature information,which helps establish the connection between pixels and solve the problem of incomplete vessel continuity.Second,multi-scale dense residual networks are used instead of traditional CNN to address the issue of inadequate local feature extraction when the model encounters vessels at multiple scales.In the decoding stage,we introduce a local-global fusion module.It enhances the localization of vascular information and reduces the semantic gap between high-and low-level features.To address the class imbalance in retinal images,we propose a hybrid loss function that enhances the segmentation ability of the model for topological structures.We conducted experiments on the publicly available DRIVE,CHASEDB1,STARE,and IOSTAR datasets.The experimental results show that our CT-MFENet performs better than most existing methods,including the baseline U-Net.展开更多
Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feat...Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results validate the reliability and superiority of our approach for multi-label image classification.展开更多
Remote sensing plays a pivotal role in environmental monitoring,disaster relief,and urban planning,where accurate scene classification of aerial images is essential.However,conventional convolutional neural networks(C...Remote sensing plays a pivotal role in environmental monitoring,disaster relief,and urban planning,where accurate scene classification of aerial images is essential.However,conventional convolutional neural networks(CNNs)struggle with long-range dependencies and preserving high-resolution features,limiting their effectiveness in complex aerial image analysis.To address these challenges,we propose a Hybrid HRNet-Swin Transformer model that synergizes the strengths of HRNet-W48 for high-resolution segmentation and the Swin Transformer for global feature extraction.This hybrid architecture ensures robust multi-scale feature fusion,capturing fine-grained details and broader contextual relationships in aerial imagery.Our methodology begins with preprocessing steps,including normalization,histogram equalization,and noise reduction,to enhance input data quality.The HRNet-W48 backbone maintains high-resolution feature maps throughout the network,enabling precise segmentation,while the Swin Transformer leverages hierarchical self-attention to model long-range dependencies efficiently.By integrating these components,our model achieves superior performance in segmentation and classification tasks compared to traditional CNNs and standalone transformer models.We evaluate our approach on two benchmark datasets:UC Merced and WHU-RS19.Experimental results demonstrate that the proposed hybrid model outperforms existing methods,achieving state-of-the-art accuracy while maintaining computational efficiency.Specifically,it excels in preserving fine spatial details and contextual understanding,critical for applications like land-use classification and disaster assessment.展开更多
Classroom behavior recognition is a hot research topic,which plays a vital role in assessing and improving the quality of classroom teaching.However,existing classroom behavior recognition methods have challenges for ...Classroom behavior recognition is a hot research topic,which plays a vital role in assessing and improving the quality of classroom teaching.However,existing classroom behavior recognition methods have challenges for high recognition accuracy with datasets with problems such as scenes with blurred pictures,and inconsistent objects.To address this challenge,we proposed an effective,lightweight object detector method called the RFNet model(YOLO-FR).The YOLO-FR is a lightweight and effective model.Specifically,for efficient multi-scale feature extraction,effective feature pyramid shared convolutional(FPSC)was designed to improve the feature extract performance by leveraging convolutional layers with varying dilation rates from the input image in the backbone.Secondly,to address the problem of multi-scale variability in the scene,we design the Rep Ghost fusion Cross Stage Partial and Efficient Layer Aggregation Network(RGCSPELAN)to improve the network performance further and reduce the amount of computation and the number of parameters.In addition,by conducting experimental valuation on the SCB dataset3 and STBD-08 dataset.Experimental results indicate that,compared to the baseline model,the RFNet model has increased mean accuracy precision(mAP@50)from 69.6%to 71.0%on the SCB dataset3 and from 91.8%to 93.1%on the STBD-08 dataset.The RFNet approach has effectiveness precision at 68.6%,surpassing the baseline method(YOLOv11)at 3.3%and archieve the minimal size(4.9 M)on the SCB dataset3.Finally,comparing it with other algorithms,it accurately detects student behavior in complex classroom environments results confirmed that RFNet is well-suited for real-time and efficiently recognizing classroom behaviors.展开更多
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract i...In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.展开更多
Gliomas have the highest mortality rate of all brain tumors.Correctly classifying the glioma risk period can help doctors make reasonable treatment plans and improve patients’survival rates.This paper proposes a hier...Gliomas have the highest mortality rate of all brain tumors.Correctly classifying the glioma risk period can help doctors make reasonable treatment plans and improve patients’survival rates.This paper proposes a hierarchical multi-scale attention feature fusion medical image classification network(HMAC-Net),which effectively combines global features and local features.The network framework consists of three parallel layers:The global feature extraction layer,the local feature extraction layer,and the multi-scale feature fusion layer.A linear sparse attention mechanism is designed in the global feature extraction layer to reduce information redundancy.In the local feature extraction layer,a bilateral local attention mechanism is introduced to improve the extraction of relevant information between adjacent slices.In the multi-scale feature fusion layer,a channel fusion block combining convolutional attention mechanism and residual inverse multi-layer perceptron is proposed to prevent gradient disappearance and network degradation and improve feature representation capability.The double-branch iterative multi-scale classification block is used to improve the classification performance.On the brain glioma risk grading dataset,the results of the ablation experiment and comparison experiment show that the proposed HMAC-Net has the best performance in both qualitative analysis of heat maps and quantitative analysis of evaluation indicators.On the dataset of skin cancer classification,the generalization experiment results show that the proposed HMAC-Net has a good generalization effect.展开更多
The widespread availability of digital multimedia data has led to a new challenge in digital forensics.Traditional source camera identification algorithms usually rely on various traces in the capturing process.Howeve...The widespread availability of digital multimedia data has led to a new challenge in digital forensics.Traditional source camera identification algorithms usually rely on various traces in the capturing process.However,these traces have become increasingly difficult to extract due to wide availability of various image processing algorithms.Convolutional Neural Networks(CNN)-based algorithms have demonstrated good discriminative capabilities for different brands and even different models of camera devices.However,their performances is not ideal in case of distinguishing between individual devices of the same model,because cameras of the same model typically use the same optical lens,image sensor,and image processing algorithms,that result in minimal overall differences.In this paper,we propose a camera forensics algorithm based on multi-scale feature fusion to address these issues.The proposed algorithm extracts different local features from feature maps of different scales and then fuses them to obtain a comprehensive feature representation.This representation is then fed into a subsequent camera fingerprint classification network.Building upon the Swin-T network,we utilize Transformer Blocks and Graph Convolutional Network(GCN)modules to fuse multi-scale features from different stages of the backbone network.Furthermore,we conduct experiments on established datasets to demonstrate the feasibility and effectiveness of the proposed approach.展开更多
Nuclearmagnetic resonance imaging of breasts often presents complex backgrounds.Breast tumors exhibit varying sizes,uneven intensity,and indistinct boundaries.These characteristics can lead to challenges such as low a...Nuclearmagnetic resonance imaging of breasts often presents complex backgrounds.Breast tumors exhibit varying sizes,uneven intensity,and indistinct boundaries.These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation.Thus,we propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms.Initially,the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs.Subsequently,we devise a fusion network incorporatingmulti-scale features and boundary attentionmechanisms for breast tumor segmentation.We incorporate multi-scale parallel dilated convolution modules into the network,enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques.Additionally,attention and boundary detection modules are included to augment the network’s capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains.Furthermore,a hybrid loss function with boundary weight is employed to address sample class imbalance issues and enhance the network’s boundary maintenance capability through additional loss.Themethod was evaluated using breast data from 207 patients at RuijinHospital,resulting in a 6.64%increase in Dice similarity coefficient compared to the benchmarkU-Net.Experimental results demonstrate the superiority of the method over other segmentation techniques,with fewer model parameters.展开更多
With the development of sensors,the application of multi-source remote sensing data has been widely concerned.Since hyperspectral image(HSI)contains rich spectral information while light detection and ranging(LiDAR)da...With the development of sensors,the application of multi-source remote sensing data has been widely concerned.Since hyperspectral image(HSI)contains rich spectral information while light detection and ranging(LiDAR)data contains elevation information,joint use of them for ground object classification can yield positive results,especially by building deep networks.Fortu-nately,multi-scale deep networks allow to expand the receptive fields of convolution without causing the computational and training problems associated with simply adding more network layers.In this work,a multi-scale feature fusion network is proposed for the joint classification of HSI and LiDAR data.First,we design a multi-scale spatial feature extraction module with cross-channel connections,by which spatial information of HSI data and elevation information of LiDAR data are extracted and fused.In addition,a multi-scale spectral feature extraction module is employed to extract the multi-scale spectral features of HSI data.Finally,joint multi-scale features are obtained by weighting and concatenation operations and then fed into the classifier.To verify the effective-ness of the proposed network,experiments are carried out on the MUUFL Gulfport and Trento datasets.The experimental results demonstrate that the classification performance of the proposed method is superior to that of other state-of-the-art methods.展开更多
Feature extraction is essential to the classification of surface defect images. The defects of hot-rolled steels distribute in different directions. Therefore, the methods of multi-scale geometric analysis (MGA) wer...Feature extraction is essential to the classification of surface defect images. The defects of hot-rolled steels distribute in different directions. Therefore, the methods of multi-scale geometric analysis (MGA) were employed to decompose the image into several directional subba^ds at several scales. Then, the statistical features of each subband were calculated to produce a high-dimensional feature vector, which was reduced to a lower-dimensional vector by graph embedding algorithms. Finally, support vector machine (SVM) was used for defect classification. The multi-scale feature extraction method was implemented via curvelet transform and kernel locality preserving projections (KLPP). Experiment results show that the proposed method is effective for classifying the surface defects of hot-rolled steels and the total classification rate is up to 97.33%.展开更多
Effective bearing fault diagnosis is vital for the safe and reliable operation of rotating machinery.In practical applications,bearings often work at various rotational speeds as well as load conditions.Yet,the bearin...Effective bearing fault diagnosis is vital for the safe and reliable operation of rotating machinery.In practical applications,bearings often work at various rotational speeds as well as load conditions.Yet,the bearing fault diagnosis under multiple conditions is a new subject,which needs to be further explored.Therefore,a multi-scale deep belief network(DBN)method integrated with attention mechanism is proposed for the purpose of extracting the multi-scale core features from vibration signals,containing four primary steps:preprocessing of multi-scale data,feature extraction,feature fusion,and fault classification.The key novelties include multi-scale feature extraction using multi-scale DBN algorithm,and feature fusion using attention mecha-nism.The benchmark dataset from University of Ottawa is applied to validate the effectiveness as well as advantages of this method.Furthermore,the aforementioned method is compared with four classical fault diagnosis methods reported in the literature,and the comparison results show that our pro-posed method has higher diagnostic accuracy and better robustness.展开更多
Regular inspection of bridge cracks is crucial to bridge maintenance and repair.The traditional manual crack detection methods are timeconsuming,dangerous and subjective.At the same time,for the existing mainstream vi...Regular inspection of bridge cracks is crucial to bridge maintenance and repair.The traditional manual crack detection methods are timeconsuming,dangerous and subjective.At the same time,for the existing mainstream vision-based automatic crack detection algorithms,it is challenging to detect fine cracks and balance the detection accuracy and speed.Therefore,this paper proposes a new bridge crack segmentationmethod based on parallel attention mechanism and multi-scale features fusion on top of the DeeplabV3+network framework.First,the improved lightweight MobileNetv2 network and dilated separable convolution are integrated into the original DeeplabV3+network to improve the original backbone network Xception and atrous spatial pyramid pooling(ASPP)module,respectively,dramatically reducing the number of parameters in the network and accelerates the training and prediction speed of the model.Moreover,we introduce the parallel attention mechanism into the encoding and decoding stages.The attention to the crack regions can be enhanced from the aspects of both channel and spatial parts and significantly suppress the interference of various noises.Finally,we further improve the detection performance of the model for fine cracks by introducing a multi-scale features fusion module.Our research results are validated on the self-made dataset.The experiments show that our method is more accurate than other methods.Its intersection of union(IoU)and F1-score(F1)are increased to 77.96%and 87.57%,respectively.In addition,the number of parameters is only 4.10M,which is much smaller than the original network;also,the frames per second(FPS)is increased to 15 frames/s.The results prove that the proposed method fits well the requirements of rapid and accurate detection of bridge cracks and is superior to other methods.展开更多
The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method f...The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method for infrared and visible image fusion is proposed.The encoder designed according to the optimization objective consists of a base encoder and a detail encoder,which is used to extract low-frequency and high-frequency information from the image.This extraction may lead to some information not being captured,so a compensation encoder is proposed to supplement the missing information.Multi-scale decomposition is also employed to extract image features more comprehensively.The decoder combines low-frequency,high-frequency and supplementary information to obtain multi-scale features.Subsequently,the attention strategy and fusion module are introduced to perform multi-scale fusion for image reconstruction.Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.展开更多
基金supported by the National Natural Science Foundation of China(No.62376287)the International Science and Technology Innovation Joint Base of Machine Vision and Medical Image Processing in Hunan Province(2021CB1013)the Natural Science Foundation of Hunan Province(Nos.2022JJ30762,2023JJ70016).
文摘Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.
基金supported by the National Key Research and Development Program of China(2023YFB3307800)National Natural Science Foundation of China(62394343,62373155)+2 种基金Major Science and Technology Project of Xinjiang(No.2022A01006-4)State Key Laboratory of Industrial Control Technology,China(Grant No.ICT2024A26)Fundamental Research Funds for the Central Universities.
文摘Deep Learning has been widely used to model soft sensors in modern industrial processes with nonlinear variables and uncertainty.Due to the outstanding ability for high-level feature extraction,stacked autoencoder(SAE)has been widely used to improve the model accuracy of soft sensors.However,with the increase of network layers,SAE may encounter serious information loss issues,which affect the modeling performance of soft sensors.Besides,there are typically very few labeled samples in the data set,which brings challenges to traditional neural networks to solve.In this paper,a multi-scale feature fused stacked autoencoder(MFF-SAE)is suggested for feature representation related to hierarchical output,where stacked autoencoder,mutual information(MI)and multi-scale feature fusion(MFF)strategies are integrated.Based on correlation analysis between output and input variables,critical hidden variables are extracted from the original variables in each autoencoder's input layer,which are correspondingly given varying weights.Besides,an integration strategy based on multi-scale feature fusion is adopted to mitigate the impact of information loss with the deepening of the network layers.Then,the MFF-SAE method is designed and stacked to form deep networks.Two practical industrial processes are utilized to evaluate the performance of MFF-SAE.Results from simulations indicate that in comparison to other cutting-edge techniques,the proposed method may considerably enhance the accuracy of soft sensor modeling,where the suggested method reduces the root mean square error(RMSE)by 71.8%,17.1%and 64.7%,15.1%,respectively.
基金supported by National Natural Science Foundation of China(No.61862037)Lanzhou Jiaotong University Tianyou Innovation Team Project(No.TY202002)。
文摘To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network model is proposed by fusing multi-scale feature information.Firstly,a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using different scales of convolution kernels.Meanwhile,the channel attention mechanism is used to increase the global information acquisition of the network.Secondly,the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network,thus reducing the loss of texture details of the deep network feature images,and improving network generalization ability and recognition accuracy.Finally,the validity of the MSFResNet model is verified using public datasets and applied to wild mushroom identification.Experimental results show that compared with ResNeXt50 network model,the accuracy of the MSFResNet model is improved by 6.01%on the FGVC-Aircraft common dataset.It achieves 99.13%classification accuracy on the wild mushroom dataset,which is 0.47%higher than ResNeXt50.Furthermore,the experimental results of the thermal map show that the MSFResNet model significantly reduces the interference of background information,making the network focus on the location of the main body of wild mushroom,which can effectively improve the accuracy of wild mushroom identification.
基金funded by the China Chongqing Municipal Science and Technology Bureau,grant numbers 2024TIAD-CYKJCXX0121,2024NSCQ-LZX0135Chongqing Municipal Commission of Housing and Urban-Rural Development,grant number CKZ2024-87+3 种基金the Chongqing University of Technology graduate education high-quality development project,grant number gzlsz202401the Chongqing University of Technology-Chongqing LINGLUE Technology Co.,Ltd.,Electronic Information(Artificial Intelligence)graduate joint training basethe Postgraduate Education and Teaching Reform Research Project in Chongqing,grant number yjg213116the Chongqing University of Technology-CISDI Chongqing Information Technology Co.,Ltd.,Computer Technology graduate joint training base.
文摘Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size.Pathologists often refer to surrounding cells to identify abnormalities.To emulate this slide examination behavior,this study proposes a Multi-Scale Feature Fusion Network(MSFF-Net)for detecting cervical abnormal cells.MSFF-Net employs a Cross-Scale Pooling Model(CSPM)to effectively capture diverse features and contextual information,ranging from local details to the overall structure.Additionally,a Multi-Scale Fusion Attention(MSFA)module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales.To handle the complex environment of cervical cell images,such as cell adhesion and overlapping,the Inner-CIoU loss function is utilized to more precisely measure the overlap between bounding boxes,thereby improving detection accuracy in such scenarios.Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision(mAP)of 63.2%,outperforming state-of-the-art methods while maintaining a relatively small number of parameters(26.8 M).This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of cervical abnormal cells,contributing to more accurate and efficient cervical cancer screening.
基金supported by Communication University of China(HG23035)partly supported by the Fundamental Research Funds for the Central Universities(CUC230A013).
文摘With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of multimodal approaches for fake news detection has gained significant attention.To solve the problems existing in previous multi-modal fake news detection algorithms,such as insufficient feature extraction and insufficient use of semantic relations between modes,this paper proposes the MFFFND-Co(Multimodal Feature Fusion Fake News Detection with Co-Attention Block)model.First,the model deeply explores the textual content,image content,and frequency domain features.Then,it employs a Co-Attention mechanism for cross-modal fusion.Additionally,a semantic consistency detectionmodule is designed to quantify semantic deviations,thereby enhancing the performance of fake news detection.Experimentally verified on two commonly used datasets,Twitter and Weibo,the model achieved F1 scores of 90.0% and 94.0%,respectively,significantly outperforming the pre-modified MFFFND(Multimodal Feature Fusion Fake News Detection with Attention Block)model and surpassing other baseline models.This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.
基金supported by the National Natural Science Foundation of China(No.62241109)the Tianjin Science and Technology Commissioner Project(No.20YDTPJC01110)。
文摘An improved model based on you only look once version 8(YOLOv8)is proposed to solve the problem of low detection accuracy due to the diversity of object sizes in optical remote sensing images.Firstly,the feature pyramid network(FPN)structure of the original YOLOv8 mode is replaced by the generalized-FPN(GFPN)structure in GiraffeDet to realize the"cross-layer"and"cross-scale"adaptive feature fusion,to enrich the semantic information and spatial information on the feature map to improve the target detection ability of the model.Secondly,a pyramid-pool module of multi atrous spatial pyramid pooling(MASPP)is designed by using the idea of atrous convolution and feature pyramid structure to extract multi-scale features,so as to improve the processing ability of the model for multi-scale objects.The experimental results show that the detection accuracy of the improved YOLOv8 model on DIOR dataset is 92%and mean average precision(mAP)is 87.9%,respectively 3.5%and 1.7%higher than those of the original model.It is proved the detection and classification ability of the proposed model on multi-dimensional optical remote sensing target has been improved.
基金the National Natural Science Foundation of China(No.62266025)。
文摘Segmentation of the retinal vessels in the fundus is crucial for diagnosing ocular diseases.Retinal vessel images often suffer from category imbalance and large scale variations.This ultimately results in incomplete vessel segmentation and poor continuity.In this study,we propose CT-MFENet to address the aforementioned issues.First,the use of context transformer(CT)allows for the integration of contextual feature information,which helps establish the connection between pixels and solve the problem of incomplete vessel continuity.Second,multi-scale dense residual networks are used instead of traditional CNN to address the issue of inadequate local feature extraction when the model encounters vessels at multiple scales.In the decoding stage,we introduce a local-global fusion module.It enhances the localization of vascular information and reduces the semantic gap between high-and low-level features.To address the class imbalance in retinal images,we propose a hybrid loss function that enhances the segmentation ability of the model for topological structures.We conducted experiments on the publicly available DRIVE,CHASEDB1,STARE,and IOSTAR datasets.The experimental results show that our CT-MFENet performs better than most existing methods,including the baseline U-Net.
基金supported by the National Natural Science Foundation of China(62302167,62477013)Natural Science Foundation of Shanghai(No.24ZR1456100)+1 种基金Science and Technology Commission of Shanghai Municipality(No.24DZ2305900)the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries(2211106).
文摘Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results validate the reliability and superiority of our approach for multi-label image classification.
基金supported by the ITP(Institute of Information&Communications Technology Planning&Evaluation)-ICAN(ICT Challenge and Advanced Network of HRD)(ITP-2025-RS-2022-00156326,33)grant funded by the Korea government(Ministry of Science and ICT)the Deanship of Research and Graduate Studies at King Khalid University for funding this work through the Large Group Project under grant number(RGP2/568/45)the Deanship of Scientific Research at Northern Border University,Arar,Saudi Arabia,for funding this research work through the Project Number"NBU-FFR-2025-231-03".
文摘Remote sensing plays a pivotal role in environmental monitoring,disaster relief,and urban planning,where accurate scene classification of aerial images is essential.However,conventional convolutional neural networks(CNNs)struggle with long-range dependencies and preserving high-resolution features,limiting their effectiveness in complex aerial image analysis.To address these challenges,we propose a Hybrid HRNet-Swin Transformer model that synergizes the strengths of HRNet-W48 for high-resolution segmentation and the Swin Transformer for global feature extraction.This hybrid architecture ensures robust multi-scale feature fusion,capturing fine-grained details and broader contextual relationships in aerial imagery.Our methodology begins with preprocessing steps,including normalization,histogram equalization,and noise reduction,to enhance input data quality.The HRNet-W48 backbone maintains high-resolution feature maps throughout the network,enabling precise segmentation,while the Swin Transformer leverages hierarchical self-attention to model long-range dependencies efficiently.By integrating these components,our model achieves superior performance in segmentation and classification tasks compared to traditional CNNs and standalone transformer models.We evaluate our approach on two benchmark datasets:UC Merced and WHU-RS19.Experimental results demonstrate that the proposed hybrid model outperforms existing methods,achieving state-of-the-art accuracy while maintaining computational efficiency.Specifically,it excels in preserving fine spatial details and contextual understanding,critical for applications like land-use classification and disaster assessment.
基金suported by the Fundamental Research Grant Scheme(FRGS)of Universiti Sains Malaysia,Research Number:FRGS/1/2024/ICT02/USM/02/1.
文摘Classroom behavior recognition is a hot research topic,which plays a vital role in assessing and improving the quality of classroom teaching.However,existing classroom behavior recognition methods have challenges for high recognition accuracy with datasets with problems such as scenes with blurred pictures,and inconsistent objects.To address this challenge,we proposed an effective,lightweight object detector method called the RFNet model(YOLO-FR).The YOLO-FR is a lightweight and effective model.Specifically,for efficient multi-scale feature extraction,effective feature pyramid shared convolutional(FPSC)was designed to improve the feature extract performance by leveraging convolutional layers with varying dilation rates from the input image in the backbone.Secondly,to address the problem of multi-scale variability in the scene,we design the Rep Ghost fusion Cross Stage Partial and Efficient Layer Aggregation Network(RGCSPELAN)to improve the network performance further and reduce the amount of computation and the number of parameters.In addition,by conducting experimental valuation on the SCB dataset3 and STBD-08 dataset.Experimental results indicate that,compared to the baseline model,the RFNet model has increased mean accuracy precision(mAP@50)from 69.6%to 71.0%on the SCB dataset3 and from 91.8%to 93.1%on the STBD-08 dataset.The RFNet approach has effectiveness precision at 68.6%,surpassing the baseline method(YOLOv11)at 3.3%and archieve the minimal size(4.9 M)on the SCB dataset3.Finally,comparing it with other algorithms,it accurately detects student behavior in complex classroom environments results confirmed that RFNet is well-suited for real-time and efficiently recognizing classroom behaviors.
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金the Scientific Research Foundation of Liaoning Provincial Department of Education(No.LJKZ0139)the Program for Liaoning Excellent Talents in University(No.LR15045).
文摘In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.
基金Major Program of National Natural Science Foundation of China(NSFC12292980,NSFC12292984)National Key R&D Program of China(2023YFA1009000,2023YFA1009004,2020YFA0712203,2020YFA0712201)+2 种基金Major Program of National Natural Science Foundation of China(NSFC12031016)Beijing Natural Science Foundation(BNSFZ210003)Department of Science,Technology and Information of the Ministry of Education(8091B042240).
文摘Gliomas have the highest mortality rate of all brain tumors.Correctly classifying the glioma risk period can help doctors make reasonable treatment plans and improve patients’survival rates.This paper proposes a hierarchical multi-scale attention feature fusion medical image classification network(HMAC-Net),which effectively combines global features and local features.The network framework consists of three parallel layers:The global feature extraction layer,the local feature extraction layer,and the multi-scale feature fusion layer.A linear sparse attention mechanism is designed in the global feature extraction layer to reduce information redundancy.In the local feature extraction layer,a bilateral local attention mechanism is introduced to improve the extraction of relevant information between adjacent slices.In the multi-scale feature fusion layer,a channel fusion block combining convolutional attention mechanism and residual inverse multi-layer perceptron is proposed to prevent gradient disappearance and network degradation and improve feature representation capability.The double-branch iterative multi-scale classification block is used to improve the classification performance.On the brain glioma risk grading dataset,the results of the ablation experiment and comparison experiment show that the proposed HMAC-Net has the best performance in both qualitative analysis of heat maps and quantitative analysis of evaluation indicators.On the dataset of skin cancer classification,the generalization experiment results show that the proposed HMAC-Net has a good generalization effect.
基金This work was funded by the National Natural Science Foundation of China(Grant No.62172132)Public Welfare Technology Research Project of Zhejiang Province(Grant No.LGF21F020014)the Opening Project of Key Laboratory of Public Security Information Application Based on Big-Data Architecture,Ministry of Public Security of Zhejiang Police College(Grant No.2021DSJSYS002).
文摘The widespread availability of digital multimedia data has led to a new challenge in digital forensics.Traditional source camera identification algorithms usually rely on various traces in the capturing process.However,these traces have become increasingly difficult to extract due to wide availability of various image processing algorithms.Convolutional Neural Networks(CNN)-based algorithms have demonstrated good discriminative capabilities for different brands and even different models of camera devices.However,their performances is not ideal in case of distinguishing between individual devices of the same model,because cameras of the same model typically use the same optical lens,image sensor,and image processing algorithms,that result in minimal overall differences.In this paper,we propose a camera forensics algorithm based on multi-scale feature fusion to address these issues.The proposed algorithm extracts different local features from feature maps of different scales and then fuses them to obtain a comprehensive feature representation.This representation is then fed into a subsequent camera fingerprint classification network.Building upon the Swin-T network,we utilize Transformer Blocks and Graph Convolutional Network(GCN)modules to fuse multi-scale features from different stages of the backbone network.Furthermore,we conduct experiments on established datasets to demonstrate the feasibility and effectiveness of the proposed approach.
基金funded by the National Natural Foundation of China under Grant No.61172167the Science Fund Project of Heilongjiang Province(LH2020F035).
文摘Nuclearmagnetic resonance imaging of breasts often presents complex backgrounds.Breast tumors exhibit varying sizes,uneven intensity,and indistinct boundaries.These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation.Thus,we propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms.Initially,the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs.Subsequently,we devise a fusion network incorporatingmulti-scale features and boundary attentionmechanisms for breast tumor segmentation.We incorporate multi-scale parallel dilated convolution modules into the network,enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques.Additionally,attention and boundary detection modules are included to augment the network’s capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains.Furthermore,a hybrid loss function with boundary weight is employed to address sample class imbalance issues and enhance the network’s boundary maintenance capability through additional loss.Themethod was evaluated using breast data from 207 patients at RuijinHospital,resulting in a 6.64%increase in Dice similarity coefficient compared to the benchmarkU-Net.Experimental results demonstrate the superiority of the method over other segmentation techniques,with fewer model parameters.
基金supported by the National Key Research and Development Project(No.2020YFC1512000)the General Projects of Key R&D Programs in Shaanxi Province(No.2020GY-060)Xi’an Science&Technology Project(No.2020KJRC 0126)。
文摘With the development of sensors,the application of multi-source remote sensing data has been widely concerned.Since hyperspectral image(HSI)contains rich spectral information while light detection and ranging(LiDAR)data contains elevation information,joint use of them for ground object classification can yield positive results,especially by building deep networks.Fortu-nately,multi-scale deep networks allow to expand the receptive fields of convolution without causing the computational and training problems associated with simply adding more network layers.In this work,a multi-scale feature fusion network is proposed for the joint classification of HSI and LiDAR data.First,we design a multi-scale spatial feature extraction module with cross-channel connections,by which spatial information of HSI data and elevation information of LiDAR data are extracted and fused.In addition,a multi-scale spectral feature extraction module is employed to extract the multi-scale spectral features of HSI data.Finally,joint multi-scale features are obtained by weighting and concatenation operations and then fed into the classifier.To verify the effective-ness of the proposed network,experiments are carried out on the MUUFL Gulfport and Trento datasets.The experimental results demonstrate that the classification performance of the proposed method is superior to that of other state-of-the-art methods.
基金supports by the Program for New Century Excellent Talents in Chinese Universities (No.NCET-08-0726)Beijing Nova Program (No. 2007B027)the Fundamental Research Funds for the Central Universities (No. FRF-TP-09-027B)
文摘Feature extraction is essential to the classification of surface defect images. The defects of hot-rolled steels distribute in different directions. Therefore, the methods of multi-scale geometric analysis (MGA) were employed to decompose the image into several directional subba^ds at several scales. Then, the statistical features of each subband were calculated to produce a high-dimensional feature vector, which was reduced to a lower-dimensional vector by graph embedding algorithms. Finally, support vector machine (SVM) was used for defect classification. The multi-scale feature extraction method was implemented via curvelet transform and kernel locality preserving projections (KLPP). Experiment results show that the proposed method is effective for classifying the surface defects of hot-rolled steels and the total classification rate is up to 97.33%.
基金supported by the National Natural Science Foundation of China(62020106003,61873122,62303217)Aero Engine Corporation of China Industry-university-research Cooperation Project(HFZL2020CXY011)the Research Fund of State Key Laboratory of Mechanics and Control of Mechanical Structures(Nanjing University of Aeronautics and Astronautics)(MCMS-I-0121G03).
文摘Effective bearing fault diagnosis is vital for the safe and reliable operation of rotating machinery.In practical applications,bearings often work at various rotational speeds as well as load conditions.Yet,the bearing fault diagnosis under multiple conditions is a new subject,which needs to be further explored.Therefore,a multi-scale deep belief network(DBN)method integrated with attention mechanism is proposed for the purpose of extracting the multi-scale core features from vibration signals,containing four primary steps:preprocessing of multi-scale data,feature extraction,feature fusion,and fault classification.The key novelties include multi-scale feature extraction using multi-scale DBN algorithm,and feature fusion using attention mecha-nism.The benchmark dataset from University of Ottawa is applied to validate the effectiveness as well as advantages of this method.Furthermore,the aforementioned method is compared with four classical fault diagnosis methods reported in the literature,and the comparison results show that our pro-posed method has higher diagnostic accuracy and better robustness.
基金This work was supported by the High-Tech Industry Science and Technology Innovation Leading Plan Project of Hunan Provincial under Grant 2020GK2026,author B.Y,http://kjt.hunan.gov.cn/.
文摘Regular inspection of bridge cracks is crucial to bridge maintenance and repair.The traditional manual crack detection methods are timeconsuming,dangerous and subjective.At the same time,for the existing mainstream vision-based automatic crack detection algorithms,it is challenging to detect fine cracks and balance the detection accuracy and speed.Therefore,this paper proposes a new bridge crack segmentationmethod based on parallel attention mechanism and multi-scale features fusion on top of the DeeplabV3+network framework.First,the improved lightweight MobileNetv2 network and dilated separable convolution are integrated into the original DeeplabV3+network to improve the original backbone network Xception and atrous spatial pyramid pooling(ASPP)module,respectively,dramatically reducing the number of parameters in the network and accelerates the training and prediction speed of the model.Moreover,we introduce the parallel attention mechanism into the encoding and decoding stages.The attention to the crack regions can be enhanced from the aspects of both channel and spatial parts and significantly suppress the interference of various noises.Finally,we further improve the detection performance of the model for fine cracks by introducing a multi-scale features fusion module.Our research results are validated on the self-made dataset.The experiments show that our method is more accurate than other methods.Its intersection of union(IoU)and F1-score(F1)are increased to 77.96%and 87.57%,respectively.In addition,the number of parameters is only 4.10M,which is much smaller than the original network;also,the frames per second(FPS)is increased to 15 frames/s.The results prove that the proposed method fits well the requirements of rapid and accurate detection of bridge cracks and is superior to other methods.
基金Supported by the Henan Province Key Research and Development Project(231111211300)the Central Government of Henan Province Guides Local Science and Technology Development Funds(Z20231811005)+2 种基金Henan Province Key Research and Development Project(231111110100)Henan Provincial Outstanding Foreign Scientist Studio(GZS2024006)Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan(Application and Overcoming Technical Barriers)(242103810028)。
文摘The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method for infrared and visible image fusion is proposed.The encoder designed according to the optimization objective consists of a base encoder and a detail encoder,which is used to extract low-frequency and high-frequency information from the image.This extraction may lead to some information not being captured,so a compensation encoder is proposed to supplement the missing information.Multi-scale decomposition is also employed to extract image features more comprehensively.The decoder combines low-frequency,high-frequency and supplementary information to obtain multi-scale features.Subsequently,the attention strategy and fusion module are introduced to perform multi-scale fusion for image reconstruction.Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.