Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure p...Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.展开更多
Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells an...Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size.Pathologists often refer to surrounding cells to identify abnormalities.To emulate this slide examination behavior,this study proposes a Multi-Scale Feature Fusion Network(MSFF-Net)for detecting cervical abnormal cells.MSFF-Net employs a Cross-Scale Pooling Model(CSPM)to effectively capture diverse features and contextual information,ranging from local details to the overall structure.Additionally,a Multi-Scale Fusion Attention(MSFA)module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales.To handle the complex environment of cervical cell images,such as cell adhesion and overlapping,the Inner-CIoU loss function is utilized to more precisely measure the overlap between bounding boxes,thereby improving detection accuracy in such scenarios.Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision(mAP)of 63.2%,outperforming state-of-the-art methods while maintaining a relatively small number of parameters(26.8 M).This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of cervical abnormal cells,contributing to more accurate and efficient cervical cancer screening.展开更多
The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method f...The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method for infrared and visible image fusion is proposed.The encoder designed according to the optimization objective consists of a base encoder and a detail encoder,which is used to extract low-frequency and high-frequency information from the image.This extraction may lead to some information not being captured,so a compensation encoder is proposed to supplement the missing information.Multi-scale decomposition is also employed to extract image features more comprehensively.The decoder combines low-frequency,high-frequency and supplementary information to obtain multi-scale features.Subsequently,the attention strategy and fusion module are introduced to perform multi-scale fusion for image reconstruction.Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.展开更多
Ground vibration events represent a critical challenge to operational safety and risk management in mining engineering.Reliable estimation of ground shaking intensity serves as a key prerequisite for effective seismic...Ground vibration events represent a critical challenge to operational safety and risk management in mining engineering.Reliable estimation of ground shaking intensity serves as a key prerequisite for effective seismic risk mitigation in mining environments.This study investigates the application of intensity classification techniques to mining scenarios and introduces a deep learning-based Multi-Scale Feature Fusion Classification Network.The model utilizes waveform data within two seconds following P-wave onset to enable rapid and accurate classification of seismic intensity levels.With an intensity threshold of 4.5 used to differentiate low-and high-intensity events,the model achieves a classification accuracy of 90.27%on the test set.Particularly,it surpasses 99%accuracy for samples with intensity levels≤3 or≥6,demonstrating a substantial advantage over traditional approaches that require complete waveform sequences and exhibit limited processing efficiency.Notably,the model maintains over 85%accuracy under-5dB SNR conditions,outperforming state-of-the-art methods in noise resilience,while reducing computational complexity by 78%compared to Transformer-based architectures.The proposed method supports real-time seismic monitoring in mining areas,facilitating rapid assessment of risk levels and providing critical decision support for operational adjustments and emergency response.These capabilities contribute to enhanced seismic resilience and the promotion of safe mining practices.展开更多
To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network mo...To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network model is proposed by fusing multi-scale feature information.Firstly,a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using different scales of convolution kernels.Meanwhile,the channel attention mechanism is used to increase the global information acquisition of the network.Secondly,the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network,thus reducing the loss of texture details of the deep network feature images,and improving network generalization ability and recognition accuracy.Finally,the validity of the MSFResNet model is verified using public datasets and applied to wild mushroom identification.Experimental results show that compared with ResNeXt50 network model,the accuracy of the MSFResNet model is improved by 6.01%on the FGVC-Aircraft common dataset.It achieves 99.13%classification accuracy on the wild mushroom dataset,which is 0.47%higher than ResNeXt50.Furthermore,the experimental results of the thermal map show that the MSFResNet model significantly reduces the interference of background information,making the network focus on the location of the main body of wild mushroom,which can effectively improve the accuracy of wild mushroom identification.展开更多
With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of...With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of multimodal approaches for fake news detection has gained significant attention.To solve the problems existing in previous multi-modal fake news detection algorithms,such as insufficient feature extraction and insufficient use of semantic relations between modes,this paper proposes the MFFFND-Co(Multimodal Feature Fusion Fake News Detection with Co-Attention Block)model.First,the model deeply explores the textual content,image content,and frequency domain features.Then,it employs a Co-Attention mechanism for cross-modal fusion.Additionally,a semantic consistency detectionmodule is designed to quantify semantic deviations,thereby enhancing the performance of fake news detection.Experimentally verified on two commonly used datasets,Twitter and Weibo,the model achieved F1 scores of 90.0% and 94.0%,respectively,significantly outperforming the pre-modified MFFFND(Multimodal Feature Fusion Fake News Detection with Attention Block)model and surpassing other baseline models.This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.展开更多
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it cha...Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it challenging to collect defective samples.Additionally,the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions.This paper proposes a novel Lightweight Multiscale Feature Fusion network(LMFF)to address these challenges.The network comprises a feature extraction network,a multi-scale feature fusion module(MFF),and a segmentation network.Specifically,a feature extraction network is proposed to obtain multi-scale feature outputs,and a multi-scale feature fusion module(MFF)is used to fuse multi-scale feature information effectively.In order to capture finer-grained multi-scale information from the fusion features,we propose a multi-scale attention module(MSA)in the segmentation network to enhance the network’s ability for small target detection.Moreover,depthwise separable convolutions are introduced to construct depthwise separable residual blocks(DSR)to reduce the model’s parameter number.Finally,to validate the proposed method’s defect segmentation and localization performance,we constructed three solar cell defect detection datasets:SolarCells,SolarCells-S,and PVEL-S.SolarCells and SolarCells-S are monocrystalline silicon datasets,and PVEL-S is a polycrystalline silicon dataset.Experimental results show that the IOU of our method on these three datasets can reach 68.5%,51.0%,and 92.7%,respectively,and the F1-Score can reach 81.3%,67.5%,and 96.2%,respectively,which surpasses other commonly usedmethods and verifies the effectiveness of our LMFF network.展开更多
Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely us...Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely used in pulmonary disease diagnosis,such as pneumonia and tuberculosis.However,traditional feature fusion methods often suffer from feature disparity,information loss,redundancy,and increased complexity,hindering the further extension of DL algorithms.To solve this problem,we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment(Self-FAGCFN)to address the limitations of traditional feature fusion methods in deep learning-based medical image classification for respiratory diseases such as pneumonia and tuberculosis.The network integrates Convolutional Neural Networks(CNNs)for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks(GCNs)within a Graph Neural Network branch to capture features based on graph structure,focusing on significant node representations.Additionally,an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs.To ensure effective feature alignment between pre-and post-fusion stages,we introduce a feature alignment loss that minimizes disparities.Moreover,to address the limitations of proposed methods,such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset,we develop a Feature-Centroid Fusion(FCF)strategy and a Multi-Level Feature-Centroid Update(MLFCU)algorithm,respectively.Extensive experiments on public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and tuberculosis,highlighting its potential for practical medical applications.展开更多
Deep Learning has been widely used to model soft sensors in modern industrial processes with nonlinear variables and uncertainty.Due to the outstanding ability for high-level feature extraction,stacked autoencoder(SAE...Deep Learning has been widely used to model soft sensors in modern industrial processes with nonlinear variables and uncertainty.Due to the outstanding ability for high-level feature extraction,stacked autoencoder(SAE)has been widely used to improve the model accuracy of soft sensors.However,with the increase of network layers,SAE may encounter serious information loss issues,which affect the modeling performance of soft sensors.Besides,there are typically very few labeled samples in the data set,which brings challenges to traditional neural networks to solve.In this paper,a multi-scale feature fused stacked autoencoder(MFF-SAE)is suggested for feature representation related to hierarchical output,where stacked autoencoder,mutual information(MI)and multi-scale feature fusion(MFF)strategies are integrated.Based on correlation analysis between output and input variables,critical hidden variables are extracted from the original variables in each autoencoder's input layer,which are correspondingly given varying weights.Besides,an integration strategy based on multi-scale feature fusion is adopted to mitigate the impact of information loss with the deepening of the network layers.Then,the MFF-SAE method is designed and stacked to form deep networks.Two practical industrial processes are utilized to evaluate the performance of MFF-SAE.Results from simulations indicate that in comparison to other cutting-edge techniques,the proposed method may considerably enhance the accuracy of soft sensor modeling,where the suggested method reduces the root mean square error(RMSE)by 71.8%,17.1%and 64.7%,15.1%,respectively.展开更多
The capacity to diagnose faults in rolling bearings is of significant practical importance to ensure the normal operation of the equipment.Frequency-domain features can effectively enhance the identification of fault ...The capacity to diagnose faults in rolling bearings is of significant practical importance to ensure the normal operation of the equipment.Frequency-domain features can effectively enhance the identification of fault modes.However,existing methods often suffer from insufficient frequency-domain representation in practical applications,which greatly affects diagnostic performance.Therefore,this paper proposes a rolling bearing fault diagnosismethod based on aMulti-Scale FusionNetwork(MSFN)using the Time-Division Fourier Transform(TDFT).The method constructs multi-scale channels to extract time-domain and frequency-domain features of the signal in parallel.A multi-level,multi-scale filter-based approach is designed to extract frequency-domain features in a segmented manner.A cross-attention mechanism is introduced to facilitate the fusion of the extracted time-frequency domain features.The performance of the proposed method is validated using the CWRU and Ottawa datasets.The results show that the average accuracy of MSFN under complex noisy signals is 97.75%and 94.41%.The average accuracy under variable load conditions is 98.68%.This demonstrates its significant application potential compared to existing methods.展开更多
The accurate state of health(SOH)estimation of lithium-ion batteries is crucial for efficient,healthy,and safe operation of battery systems.Extracting meaningful aging information from highly stochastic and noisy data...The accurate state of health(SOH)estimation of lithium-ion batteries is crucial for efficient,healthy,and safe operation of battery systems.Extracting meaningful aging information from highly stochastic and noisy data segments while designing SOH estimation algorithms that efficiently handle the large-scale computational demands of cloud-based battery management systems presents a substantial challenge.In this work,we propose a quantum convolutional neural network(QCNN)model designed for accurate,robust,and generalizable SOH estimation with minimal data and parameter requirements and is compatible with quantum computing cloud platforms in the Noisy Intermediate-Scale Quantum.First,we utilize data from 4 datasets comprising 272 cells,covering 5 chemical compositions,4 rated parameters,and 73operating conditions.We design 5 voltage windows as small as 0.3 V for each cell from incremental capacity peaks for stochastic SOH estimation scenarios generation.We extract 3 effective health indicators(HIs)sequences and develop an automated feature fusion method using quantum rotation gate encoding,achieving an R2of 96%.Subsequently,we design a QCNN whose convolutional layer,constructed with variational quantum circuits,comprises merely 39 parameters.Additionally,we explore the impact of training set size,using strategies,and battery materials on the model’s accuracy.Finally,the QCNN with quantum convolutional layers reduces root mean squared error by 28% and achieves an R^(2)exceeding 96% compared to other three commonly used algorithms.This work demonstrates the effectiveness of quantum encoding for automated feature fusion of HIs extracted from limited discharge data.It highlights the potential of QCNN in improving the accuracy,robustness,and generalization of SOH estimation while dealing with stochastic and noisy data with few parameters and simple structure.It also suggests a new paradigm for leveraging quantum computational power in SOH estimation.展开更多
An intelligent diagnosis method based on self-adaptiveWasserstein dual generative adversarial networks and feature fusion is proposed due to problems such as insufficient sample size and incomplete fault feature extra...An intelligent diagnosis method based on self-adaptiveWasserstein dual generative adversarial networks and feature fusion is proposed due to problems such as insufficient sample size and incomplete fault feature extraction,which are commonly faced by rolling bearings and lead to low diagnostic accuracy.Initially,dual models of the Wasserstein deep convolutional generative adversarial network incorporating gradient penalty(1D-2DWDCGAN)are constructed to augment the original dataset.A self-adaptive loss threshold control training strategy is introduced,and establishing a self-adaptive balancing mechanism for stable model training.Subsequently,a diagnostic model based on multidimensional feature fusion is designed,wherein complex features from various dimensions are extracted,merging the original signal waveform features,structured features,and time-frequency features into a deep composite feature representation that encompasses multiple dimensions and scales;thus,efficient and accurate small sample fault diagnosis is facilitated.Finally,an experiment between the bearing fault dataset of CaseWestern ReserveUniversity and the fault simulation experimental platformdataset of this research group shows that this method effectively supplements the dataset and remarkably improves the diagnostic accuracy.The diagnostic accuracy after data augmentation reached 99.94%and 99.87%in two different experimental environments,respectively.In addition,robustness analysis is conducted on the diagnostic accuracy of the proposed method under different noise backgrounds,verifying its good generalization performance.展开更多
The widespread availability of digital multimedia data has led to a new challenge in digital forensics.Traditional source camera identification algorithms usually rely on various traces in the capturing process.Howeve...The widespread availability of digital multimedia data has led to a new challenge in digital forensics.Traditional source camera identification algorithms usually rely on various traces in the capturing process.However,these traces have become increasingly difficult to extract due to wide availability of various image processing algorithms.Convolutional Neural Networks(CNN)-based algorithms have demonstrated good discriminative capabilities for different brands and even different models of camera devices.However,their performances is not ideal in case of distinguishing between individual devices of the same model,because cameras of the same model typically use the same optical lens,image sensor,and image processing algorithms,that result in minimal overall differences.In this paper,we propose a camera forensics algorithm based on multi-scale feature fusion to address these issues.The proposed algorithm extracts different local features from feature maps of different scales and then fuses them to obtain a comprehensive feature representation.This representation is then fed into a subsequent camera fingerprint classification network.Building upon the Swin-T network,we utilize Transformer Blocks and Graph Convolutional Network(GCN)modules to fuse multi-scale features from different stages of the backbone network.Furthermore,we conduct experiments on established datasets to demonstrate the feasibility and effectiveness of the proposed approach.展开更多
In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract i...In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.展开更多
Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to ...Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.展开更多
Ransomware attacks pose a significant threat to critical infrastructures,demanding robust detection mechanisms.This study introduces a hybrid model that combines vision transformer(ViT)and one-dimensional convolutiona...Ransomware attacks pose a significant threat to critical infrastructures,demanding robust detection mechanisms.This study introduces a hybrid model that combines vision transformer(ViT)and one-dimensional convolutional neural network(1DCNN)architectures to enhance ransomware detection capabilities.Addressing common challenges in ransomware detection,particularly dataset class imbalance,the synthetic minority oversampling technique(SMOTE)is employed to generate synthetic samples for minority class,thereby improving detection accuracy.The integration of ViT and 1DCNN through feature fusion enables the model to capture both global contextual and local sequential features,resulting in comprehensive ransomware classification.Tested on the UNSW-NB15 dataset,the proposed ViT-1DCNN model achieved 98%detection accuracy with precision,recall,and F1-score metrics surpassing conventional methods.This approach not only reduces false positives and negatives but also offers scalability and robustness for real-world cybersecurity applications.The results demonstrate the model’s potential as an effective tool for proactive ransomware detection,especially in environments where evolving threats require adaptable and high-accuracy solutions.展开更多
In recent years,audio pattern recognition has emerged as a key area of research,driven by its applications in human-computer interaction,robotics,and healthcare.Traditional methods,which rely heavily on handcrafted fe...In recent years,audio pattern recognition has emerged as a key area of research,driven by its applications in human-computer interaction,robotics,and healthcare.Traditional methods,which rely heavily on handcrafted features such asMel filters,often suffer frominformation loss and limited feature representation capabilities.To address these limitations,this study proposes an innovative end-to-end audio pattern recognition framework that directly processes raw audio signals,preserving original information and extracting effective classification features.The proposed framework utilizes a dual-branch architecture:a global refinement module that retains channel and temporal details and a multi-scale embedding module that captures high-level semantic information.Additionally,a guided fusion module integrates complementary features from both branches,ensuring a comprehensive representation of audio data.Specifically,the multi-scale audio context embedding module is designed to effectively extract spatiotemporal dependencies,while the global refinement module aggregates multi-scale channel and temporal cues for enhanced modeling.The guided fusion module leverages these features to achieve efficient integration of complementary information,resulting in improved classification accuracy.Experimental results demonstrate the model’s superior performance on multiple datasets,including ESC-50,UrbanSound8K,RAVDESS,and CREMA-D,with classification accuracies of 93.25%,90.91%,92.36%,and 70.50%,respectively.These results highlight the robustness and effectiveness of the proposed framework,which significantly outperforms existing approaches.By addressing critical challenges such as information loss and limited feature representation,thiswork provides newinsights and methodologies for advancing audio classification and multimodal interaction systems.展开更多
Currently,most trains are equipped with dedicated cameras for capturing pantograph videos.Pantographs are core to the high-speed-railway pantograph-catenary system,and their failure directly affects the normal operati...Currently,most trains are equipped with dedicated cameras for capturing pantograph videos.Pantographs are core to the high-speed-railway pantograph-catenary system,and their failure directly affects the normal operation of high-speed trains.However,given the complex and variable real-world operational conditions of high-speed railways,there is no real-time and robust pantograph fault-detection method capable of handling large volumes of surveillance video.Hence,it is of paramount importance to maintain real-time monitoring and analysis of pantographs.Our study presents a real-time intelligent detection technology for identifying faults in high-speed railway pantographs,utilizing a fusion of self-attention and convolution features.We delved into lightweight multi-scale feature-extraction and fault-detection models based on deep learning to detect pantograph anomalies.Compared with traditional methods,this approach achieves high recall and accuracy in pantograph recognition,accurately pinpointing issues like discharge sparks,pantograph horns,and carbon pantograph-slide malfunctions.After experimentation and validation with actual surveillance videos of electric multiple-unit train,our algorithmic model demonstrates real-time,high-accuracy performance even under complex operational conditions.展开更多
To improve design accuracy and reliability of structures,this study solves the uncertain natural frequencies with consideration for geometric nonlinearity and structural uncertainty.Frequencies of the laminated plate ...To improve design accuracy and reliability of structures,this study solves the uncertain natural frequencies with consideration for geometric nonlinearity and structural uncertainty.Frequencies of the laminated plate with all four edges clamped(CCCC)are derived based on Navier's method and Galerkin's method.The novelty of the current work is that the number of unknowns in the displacement field model of a CCCC plate with free midsurface(CCCC-2 plate)is only three compared with four or five in cases of other exposed methods.The present analytical method is proved to be accurate and reliable by comparing linear natural frequencies and nonlinear natural frequencies with other models available in the open literature.Furthermore,a novel method for analyzing effects of mean values and tolerance zones of uncertain structural parameters on random frequencies is proposed based on a self-developed Multiscale Feature Extraction and Fusion Network(MFEFN)system.Compared with a direct Monte Carlo Simulation(MCS),the MFEFNbased procedure significantly reduces the calculation burden with a guarantee of accuracy.Our research provides a method to calculate nonlinear natural frequencies under two boundary conditions and presentes a surrogate model to predict frequencies for accuracy analysis and optimization design.展开更多
基金supported by the National Natural Science Foundation of China(No.62376287)the International Science and Technology Innovation Joint Base of Machine Vision and Medical Image Processing in Hunan Province(2021CB1013)the Natural Science Foundation of Hunan Province(Nos.2022JJ30762,2023JJ70016).
文摘Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.
基金funded by the China Chongqing Municipal Science and Technology Bureau,grant numbers 2024TIAD-CYKJCXX0121,2024NSCQ-LZX0135Chongqing Municipal Commission of Housing and Urban-Rural Development,grant number CKZ2024-87+3 种基金the Chongqing University of Technology graduate education high-quality development project,grant number gzlsz202401the Chongqing University of Technology-Chongqing LINGLUE Technology Co.,Ltd.,Electronic Information(Artificial Intelligence)graduate joint training basethe Postgraduate Education and Teaching Reform Research Project in Chongqing,grant number yjg213116the Chongqing University of Technology-CISDI Chongqing Information Technology Co.,Ltd.,Computer Technology graduate joint training base.
文摘Detecting abnormal cervical cells is crucial for early identification and timely treatment of cervical cancer.However,this task is challenging due to the morphological similarities between abnormal and normal cells and the significant variations in cell size.Pathologists often refer to surrounding cells to identify abnormalities.To emulate this slide examination behavior,this study proposes a Multi-Scale Feature Fusion Network(MSFF-Net)for detecting cervical abnormal cells.MSFF-Net employs a Cross-Scale Pooling Model(CSPM)to effectively capture diverse features and contextual information,ranging from local details to the overall structure.Additionally,a Multi-Scale Fusion Attention(MSFA)module is introduced to mitigate the impact of cell size variations by adaptively fusing local and global information at different scales.To handle the complex environment of cervical cell images,such as cell adhesion and overlapping,the Inner-CIoU loss function is utilized to more precisely measure the overlap between bounding boxes,thereby improving detection accuracy in such scenarios.Experimental results on the Comparison detector dataset demonstrate that MSFF-Net achieves a mean average precision(mAP)of 63.2%,outperforming state-of-the-art methods while maintaining a relatively small number of parameters(26.8 M).This study highlights the effectiveness of multi-scale feature fusion in enhancing the detection of cervical abnormal cells,contributing to more accurate and efficient cervical cancer screening.
基金Supported by the Henan Province Key Research and Development Project(231111211300)the Central Government of Henan Province Guides Local Science and Technology Development Funds(Z20231811005)+2 种基金Henan Province Key Research and Development Project(231111110100)Henan Provincial Outstanding Foreign Scientist Studio(GZS2024006)Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan(Application and Overcoming Technical Barriers)(242103810028)。
文摘The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method for infrared and visible image fusion is proposed.The encoder designed according to the optimization objective consists of a base encoder and a detail encoder,which is used to extract low-frequency and high-frequency information from the image.This extraction may lead to some information not being captured,so a compensation encoder is proposed to supplement the missing information.Multi-scale decomposition is also employed to extract image features more comprehensively.The decoder combines low-frequency,high-frequency and supplementary information to obtain multi-scale features.Subsequently,the attention strategy and fusion module are introduced to perform multi-scale fusion for image reconstruction.Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
基金supported by National Natural Science Foundation of China(62276058,41774063)Fundamental Research Funds for the Central Universities(N25GFZ011).
文摘Ground vibration events represent a critical challenge to operational safety and risk management in mining engineering.Reliable estimation of ground shaking intensity serves as a key prerequisite for effective seismic risk mitigation in mining environments.This study investigates the application of intensity classification techniques to mining scenarios and introduces a deep learning-based Multi-Scale Feature Fusion Classification Network.The model utilizes waveform data within two seconds following P-wave onset to enable rapid and accurate classification of seismic intensity levels.With an intensity threshold of 4.5 used to differentiate low-and high-intensity events,the model achieves a classification accuracy of 90.27%on the test set.Particularly,it surpasses 99%accuracy for samples with intensity levels≤3 or≥6,demonstrating a substantial advantage over traditional approaches that require complete waveform sequences and exhibit limited processing efficiency.Notably,the model maintains over 85%accuracy under-5dB SNR conditions,outperforming state-of-the-art methods in noise resilience,while reducing computational complexity by 78%compared to Transformer-based architectures.The proposed method supports real-time seismic monitoring in mining areas,facilitating rapid assessment of risk levels and providing critical decision support for operational adjustments and emergency response.These capabilities contribute to enhanced seismic resilience and the promotion of safe mining practices.
基金supported by National Natural Science Foundation of China(No.61862037)Lanzhou Jiaotong University Tianyou Innovation Team Project(No.TY202002)。
文摘To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network model is proposed by fusing multi-scale feature information.Firstly,a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using different scales of convolution kernels.Meanwhile,the channel attention mechanism is used to increase the global information acquisition of the network.Secondly,the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network,thus reducing the loss of texture details of the deep network feature images,and improving network generalization ability and recognition accuracy.Finally,the validity of the MSFResNet model is verified using public datasets and applied to wild mushroom identification.Experimental results show that compared with ResNeXt50 network model,the accuracy of the MSFResNet model is improved by 6.01%on the FGVC-Aircraft common dataset.It achieves 99.13%classification accuracy on the wild mushroom dataset,which is 0.47%higher than ResNeXt50.Furthermore,the experimental results of the thermal map show that the MSFResNet model significantly reduces the interference of background information,making the network focus on the location of the main body of wild mushroom,which can effectively improve the accuracy of wild mushroom identification.
基金supported by Communication University of China(HG23035)partly supported by the Fundamental Research Funds for the Central Universities(CUC230A013).
文摘With the rapid growth of socialmedia,the spread of fake news has become a growing problem,misleading the public and causing significant harm.As social media content is often composed of both images and text,the use of multimodal approaches for fake news detection has gained significant attention.To solve the problems existing in previous multi-modal fake news detection algorithms,such as insufficient feature extraction and insufficient use of semantic relations between modes,this paper proposes the MFFFND-Co(Multimodal Feature Fusion Fake News Detection with Co-Attention Block)model.First,the model deeply explores the textual content,image content,and frequency domain features.Then,it employs a Co-Attention mechanism for cross-modal fusion.Additionally,a semantic consistency detectionmodule is designed to quantify semantic deviations,thereby enhancing the performance of fake news detection.Experimentally verified on two commonly used datasets,Twitter and Weibo,the model achieved F1 scores of 90.0% and 94.0%,respectively,significantly outperforming the pre-modified MFFFND(Multimodal Feature Fusion Fake News Detection with Attention Block)model and surpassing other baseline models.This improves the accuracy of detecting fake information in artificial intelligence detection and engineering software detection.
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金supported in part by the National Natural Science Foundation of China under Grants 62463002,62062021 and 62473033in part by the Guiyang Scientific Plan Project[2023]48–11,in part by QKHZYD[2023]010 Guizhou Province Science and Technology Innovation Base Construction Project“Key Laboratory Construction of Intelligent Mountain Agricultural Equipment”.
文摘Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it challenging to collect defective samples.Additionally,the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions.This paper proposes a novel Lightweight Multiscale Feature Fusion network(LMFF)to address these challenges.The network comprises a feature extraction network,a multi-scale feature fusion module(MFF),and a segmentation network.Specifically,a feature extraction network is proposed to obtain multi-scale feature outputs,and a multi-scale feature fusion module(MFF)is used to fuse multi-scale feature information effectively.In order to capture finer-grained multi-scale information from the fusion features,we propose a multi-scale attention module(MSA)in the segmentation network to enhance the network’s ability for small target detection.Moreover,depthwise separable convolutions are introduced to construct depthwise separable residual blocks(DSR)to reduce the model’s parameter number.Finally,to validate the proposed method’s defect segmentation and localization performance,we constructed three solar cell defect detection datasets:SolarCells,SolarCells-S,and PVEL-S.SolarCells and SolarCells-S are monocrystalline silicon datasets,and PVEL-S is a polycrystalline silicon dataset.Experimental results show that the IOU of our method on these three datasets can reach 68.5%,51.0%,and 92.7%,respectively,and the F1-Score can reach 81.3%,67.5%,and 96.2%,respectively,which surpasses other commonly usedmethods and verifies the effectiveness of our LMFF network.
基金supported by the National Natural Science Foundation of China(62276092,62303167)the Postdoctoral Fellowship Program(Grade C)of China Postdoctoral Science Foundation(GZC20230707)+3 种基金the Key Science and Technology Program of Henan Province,China(242102211051,242102211042,212102310084)Key Scientiffc Research Projects of Colleges and Universities in Henan Province,China(25A520009)the China Postdoctoral Science Foundation(2024M760808)the Henan Province medical science and technology research plan joint construction project(LHGJ2024069).
文摘Feature fusion is an important technique in medical image classification that can improve diagnostic accuracy by integrating complementary information from multiple sources.Recently,Deep Learning(DL)has been widely used in pulmonary disease diagnosis,such as pneumonia and tuberculosis.However,traditional feature fusion methods often suffer from feature disparity,information loss,redundancy,and increased complexity,hindering the further extension of DL algorithms.To solve this problem,we propose a Graph-Convolution Fusion Network with Self-Supervised Feature Alignment(Self-FAGCFN)to address the limitations of traditional feature fusion methods in deep learning-based medical image classification for respiratory diseases such as pneumonia and tuberculosis.The network integrates Convolutional Neural Networks(CNNs)for robust feature extraction from two-dimensional grid structures and Graph Convolutional Networks(GCNs)within a Graph Neural Network branch to capture features based on graph structure,focusing on significant node representations.Additionally,an Attention-Embedding Ensemble Block is included to capture critical features from GCN outputs.To ensure effective feature alignment between pre-and post-fusion stages,we introduce a feature alignment loss that minimizes disparities.Moreover,to address the limitations of proposed methods,such as inappropriate centroid discrepancies during feature alignment and class imbalance in the dataset,we develop a Feature-Centroid Fusion(FCF)strategy and a Multi-Level Feature-Centroid Update(MLFCU)algorithm,respectively.Extensive experiments on public datasets LungVision and Chest-Xray demonstrate that the Self-FAGCFN model significantly outperforms existing methods in diagnosing pneumonia and tuberculosis,highlighting its potential for practical medical applications.
基金supported by the National Key Research and Development Program of China(2023YFB3307800)National Natural Science Foundation of China(62394343,62373155)+2 种基金Major Science and Technology Project of Xinjiang(No.2022A01006-4)State Key Laboratory of Industrial Control Technology,China(Grant No.ICT2024A26)Fundamental Research Funds for the Central Universities.
文摘Deep Learning has been widely used to model soft sensors in modern industrial processes with nonlinear variables and uncertainty.Due to the outstanding ability for high-level feature extraction,stacked autoencoder(SAE)has been widely used to improve the model accuracy of soft sensors.However,with the increase of network layers,SAE may encounter serious information loss issues,which affect the modeling performance of soft sensors.Besides,there are typically very few labeled samples in the data set,which brings challenges to traditional neural networks to solve.In this paper,a multi-scale feature fused stacked autoencoder(MFF-SAE)is suggested for feature representation related to hierarchical output,where stacked autoencoder,mutual information(MI)and multi-scale feature fusion(MFF)strategies are integrated.Based on correlation analysis between output and input variables,critical hidden variables are extracted from the original variables in each autoencoder's input layer,which are correspondingly given varying weights.Besides,an integration strategy based on multi-scale feature fusion is adopted to mitigate the impact of information loss with the deepening of the network layers.Then,the MFF-SAE method is designed and stacked to form deep networks.Two practical industrial processes are utilized to evaluate the performance of MFF-SAE.Results from simulations indicate that in comparison to other cutting-edge techniques,the proposed method may considerably enhance the accuracy of soft sensor modeling,where the suggested method reduces the root mean square error(RMSE)by 71.8%,17.1%and 64.7%,15.1%,respectively.
基金fully supported by the Frontier Exploration Projects of Longmen Laboratory(No.LMQYTSKT034)Key Research and Development and Promotion of Special(Science and Technology)Project of Henan Province,China(No.252102210158)。
文摘The capacity to diagnose faults in rolling bearings is of significant practical importance to ensure the normal operation of the equipment.Frequency-domain features can effectively enhance the identification of fault modes.However,existing methods often suffer from insufficient frequency-domain representation in practical applications,which greatly affects diagnostic performance.Therefore,this paper proposes a rolling bearing fault diagnosismethod based on aMulti-Scale FusionNetwork(MSFN)using the Time-Division Fourier Transform(TDFT).The method constructs multi-scale channels to extract time-domain and frequency-domain features of the signal in parallel.A multi-level,multi-scale filter-based approach is designed to extract frequency-domain features in a segmented manner.A cross-attention mechanism is introduced to facilitate the fusion of the extracted time-frequency domain features.The performance of the proposed method is validated using the CWRU and Ottawa datasets.The results show that the average accuracy of MSFN under complex noisy signals is 97.75%and 94.41%.The average accuracy under variable load conditions is 98.68%.This demonstrates its significant application potential compared to existing methods.
基金funded by the Research on SOC/SOH Joint Estimation Technology of Electric Vehicle Battery System State Based on Online Parameter Identification Project(2019)the National Natural Science Foundation of China(Grant No.51877120)。
文摘The accurate state of health(SOH)estimation of lithium-ion batteries is crucial for efficient,healthy,and safe operation of battery systems.Extracting meaningful aging information from highly stochastic and noisy data segments while designing SOH estimation algorithms that efficiently handle the large-scale computational demands of cloud-based battery management systems presents a substantial challenge.In this work,we propose a quantum convolutional neural network(QCNN)model designed for accurate,robust,and generalizable SOH estimation with minimal data and parameter requirements and is compatible with quantum computing cloud platforms in the Noisy Intermediate-Scale Quantum.First,we utilize data from 4 datasets comprising 272 cells,covering 5 chemical compositions,4 rated parameters,and 73operating conditions.We design 5 voltage windows as small as 0.3 V for each cell from incremental capacity peaks for stochastic SOH estimation scenarios generation.We extract 3 effective health indicators(HIs)sequences and develop an automated feature fusion method using quantum rotation gate encoding,achieving an R2of 96%.Subsequently,we design a QCNN whose convolutional layer,constructed with variational quantum circuits,comprises merely 39 parameters.Additionally,we explore the impact of training set size,using strategies,and battery materials on the model’s accuracy.Finally,the QCNN with quantum convolutional layers reduces root mean squared error by 28% and achieves an R^(2)exceeding 96% compared to other three commonly used algorithms.This work demonstrates the effectiveness of quantum encoding for automated feature fusion of HIs extracted from limited discharge data.It highlights the potential of QCNN in improving the accuracy,robustness,and generalization of SOH estimation while dealing with stochastic and noisy data with few parameters and simple structure.It also suggests a new paradigm for leveraging quantum computational power in SOH estimation.
基金supported by the National Natural Science Foundation of China(Grant Nos.12272259 and 52005148).
文摘An intelligent diagnosis method based on self-adaptiveWasserstein dual generative adversarial networks and feature fusion is proposed due to problems such as insufficient sample size and incomplete fault feature extraction,which are commonly faced by rolling bearings and lead to low diagnostic accuracy.Initially,dual models of the Wasserstein deep convolutional generative adversarial network incorporating gradient penalty(1D-2DWDCGAN)are constructed to augment the original dataset.A self-adaptive loss threshold control training strategy is introduced,and establishing a self-adaptive balancing mechanism for stable model training.Subsequently,a diagnostic model based on multidimensional feature fusion is designed,wherein complex features from various dimensions are extracted,merging the original signal waveform features,structured features,and time-frequency features into a deep composite feature representation that encompasses multiple dimensions and scales;thus,efficient and accurate small sample fault diagnosis is facilitated.Finally,an experiment between the bearing fault dataset of CaseWestern ReserveUniversity and the fault simulation experimental platformdataset of this research group shows that this method effectively supplements the dataset and remarkably improves the diagnostic accuracy.The diagnostic accuracy after data augmentation reached 99.94%and 99.87%in two different experimental environments,respectively.In addition,robustness analysis is conducted on the diagnostic accuracy of the proposed method under different noise backgrounds,verifying its good generalization performance.
基金This work was funded by the National Natural Science Foundation of China(Grant No.62172132)Public Welfare Technology Research Project of Zhejiang Province(Grant No.LGF21F020014)the Opening Project of Key Laboratory of Public Security Information Application Based on Big-Data Architecture,Ministry of Public Security of Zhejiang Police College(Grant No.2021DSJSYS002).
文摘The widespread availability of digital multimedia data has led to a new challenge in digital forensics.Traditional source camera identification algorithms usually rely on various traces in the capturing process.However,these traces have become increasingly difficult to extract due to wide availability of various image processing algorithms.Convolutional Neural Networks(CNN)-based algorithms have demonstrated good discriminative capabilities for different brands and even different models of camera devices.However,their performances is not ideal in case of distinguishing between individual devices of the same model,because cameras of the same model typically use the same optical lens,image sensor,and image processing algorithms,that result in minimal overall differences.In this paper,we propose a camera forensics algorithm based on multi-scale feature fusion to address these issues.The proposed algorithm extracts different local features from feature maps of different scales and then fuses them to obtain a comprehensive feature representation.This representation is then fed into a subsequent camera fingerprint classification network.Building upon the Swin-T network,we utilize Transformer Blocks and Graph Convolutional Network(GCN)modules to fuse multi-scale features from different stages of the backbone network.Furthermore,we conduct experiments on established datasets to demonstrate the feasibility and effectiveness of the proposed approach.
基金the Scientific Research Foundation of Liaoning Provincial Department of Education(No.LJKZ0139)the Program for Liaoning Excellent Talents in University(No.LR15045).
文摘In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.
基金supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China(Grant Nos.2023AH040149 and 2024AH051915)the Anhui Provincial Natural Science Foundation(Grant No.2208085MF168)+1 种基金the Science and Technology Innovation Tackle Plan Project of Maanshan(Grant No.2024RGZN001)the Scientific Research Fund Project of Anhui Medical University(Grant No.2023xkj122).
文摘Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.
文摘Ransomware attacks pose a significant threat to critical infrastructures,demanding robust detection mechanisms.This study introduces a hybrid model that combines vision transformer(ViT)and one-dimensional convolutional neural network(1DCNN)architectures to enhance ransomware detection capabilities.Addressing common challenges in ransomware detection,particularly dataset class imbalance,the synthetic minority oversampling technique(SMOTE)is employed to generate synthetic samples for minority class,thereby improving detection accuracy.The integration of ViT and 1DCNN through feature fusion enables the model to capture both global contextual and local sequential features,resulting in comprehensive ransomware classification.Tested on the UNSW-NB15 dataset,the proposed ViT-1DCNN model achieved 98%detection accuracy with precision,recall,and F1-score metrics surpassing conventional methods.This approach not only reduces false positives and negatives but also offers scalability and robustness for real-world cybersecurity applications.The results demonstrate the model’s potential as an effective tool for proactive ransomware detection,especially in environments where evolving threats require adaptable and high-accuracy solutions.
基金supported by the National Natural Science Foundation of China(62106214)the Hebei Natural Science Foundation(D2024203008)the Provincial Key Laboratory Performance Subsidy Project(22567612H).
文摘In recent years,audio pattern recognition has emerged as a key area of research,driven by its applications in human-computer interaction,robotics,and healthcare.Traditional methods,which rely heavily on handcrafted features such asMel filters,often suffer frominformation loss and limited feature representation capabilities.To address these limitations,this study proposes an innovative end-to-end audio pattern recognition framework that directly processes raw audio signals,preserving original information and extracting effective classification features.The proposed framework utilizes a dual-branch architecture:a global refinement module that retains channel and temporal details and a multi-scale embedding module that captures high-level semantic information.Additionally,a guided fusion module integrates complementary features from both branches,ensuring a comprehensive representation of audio data.Specifically,the multi-scale audio context embedding module is designed to effectively extract spatiotemporal dependencies,while the global refinement module aggregates multi-scale channel and temporal cues for enhanced modeling.The guided fusion module leverages these features to achieve efficient integration of complementary information,resulting in improved classification accuracy.Experimental results demonstrate the model’s superior performance on multiple datasets,including ESC-50,UrbanSound8K,RAVDESS,and CREMA-D,with classification accuracies of 93.25%,90.91%,92.36%,and 70.50%,respectively.These results highlight the robustness and effectiveness of the proposed framework,which significantly outperforms existing approaches.By addressing critical challenges such as information loss and limited feature representation,thiswork provides newinsights and methodologies for advancing audio classification and multimodal interaction systems.
基金supported by the National Key R&D Program of China(No.2022YFB4301102).
文摘Currently,most trains are equipped with dedicated cameras for capturing pantograph videos.Pantographs are core to the high-speed-railway pantograph-catenary system,and their failure directly affects the normal operation of high-speed trains.However,given the complex and variable real-world operational conditions of high-speed railways,there is no real-time and robust pantograph fault-detection method capable of handling large volumes of surveillance video.Hence,it is of paramount importance to maintain real-time monitoring and analysis of pantographs.Our study presents a real-time intelligent detection technology for identifying faults in high-speed railway pantographs,utilizing a fusion of self-attention and convolution features.We delved into lightweight multi-scale feature-extraction and fault-detection models based on deep learning to detect pantograph anomalies.Compared with traditional methods,this approach achieves high recall and accuracy in pantograph recognition,accurately pinpointing issues like discharge sparks,pantograph horns,and carbon pantograph-slide malfunctions.After experimentation and validation with actual surveillance videos of electric multiple-unit train,our algorithmic model demonstrates real-time,high-accuracy performance even under complex operational conditions.
基金the research project funded by the Fundamental Research Funds for the Central Universities(No.HIT.OCEP.2024038)the National Natural Science Foundation of China(No.52372351)the State Key Laboratory of Micro-Spacecraft Rapid Design and Intelligent Cluster,China(No.MS02240107)。
文摘To improve design accuracy and reliability of structures,this study solves the uncertain natural frequencies with consideration for geometric nonlinearity and structural uncertainty.Frequencies of the laminated plate with all four edges clamped(CCCC)are derived based on Navier's method and Galerkin's method.The novelty of the current work is that the number of unknowns in the displacement field model of a CCCC plate with free midsurface(CCCC-2 plate)is only three compared with four or five in cases of other exposed methods.The present analytical method is proved to be accurate and reliable by comparing linear natural frequencies and nonlinear natural frequencies with other models available in the open literature.Furthermore,a novel method for analyzing effects of mean values and tolerance zones of uncertain structural parameters on random frequencies is proposed based on a self-developed Multiscale Feature Extraction and Fusion Network(MFEFN)system.Compared with a direct Monte Carlo Simulation(MCS),the MFEFNbased procedure significantly reduces the calculation burden with a guarantee of accuracy.Our research provides a method to calculate nonlinear natural frequencies under two boundary conditions and presentes a surrogate model to predict frequencies for accuracy analysis and optimization design.