This study proposes a multi-scale simplified residual convolutional neural network(MS-SRCNN)for the precise prediction of Mg-Nd binary alloy compositions from scanning electron microscope(SEM)images.A multi-scale data...This study proposes a multi-scale simplified residual convolutional neural network(MS-SRCNN)for the precise prediction of Mg-Nd binary alloy compositions from scanning electron microscope(SEM)images.A multi-scale data structure is established by spatially aligning and stacking SEM images at different magnifications.The MS-SRCNN significantly reduces computational runtime by over 90%compared to traditional architectures like ResNet50,VGG16,and VGG19,without compromising prediction accuracy.The model demonstrates more excellent predictive performance,achieving a>5%increase in R^(2) compared to single-scale models.Furthermore,the MS-SRCNN exhibits robust composition prediction capability across other Mg-based binary alloys,including Mg-La,Mg-Sn,Mg-Ce,Mg-Sm,Mg-Ag,and Mg-Y,thereby emphasizing its generalization and extrapolation potential.This research establishes a non-destructive,microstructure-informed composition analysis framework,reduces characterization time compared to traditional experiment methods and provides insights into the composition-microstructure relationship in diverse material systems.展开更多
In recent years,gait-based emotion recognition has been widely applied in the field of computer vision.However,existing gait emotion recognition methods typically rely on complete human skeleton data,and their accurac...In recent years,gait-based emotion recognition has been widely applied in the field of computer vision.However,existing gait emotion recognition methods typically rely on complete human skeleton data,and their accuracy significantly declines when the data is occluded.To enhance the accuracy of gait emotion recognition under occlusion,this paper proposes a Multi-scale Suppression Graph ConvolutionalNetwork(MS-GCN).TheMS-GCN consists of three main components:Joint Interpolation Module(JI Moudle),Multi-scale Temporal Convolution Network(MS-TCN),and Suppression Graph Convolutional Network(SGCN).The JI Module completes the spatially occluded skeletal joints using the(K-Nearest Neighbors)KNN interpolation method.The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait,compensating for the temporal occlusion of gait information.The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features,thereby reducing the negative impact of occlusion on emotion recognition results.The proposed method is evaluated on two comprehensive datasets:Emotion-Gait,containing 4227 real gaits from sources like BML,ICT-Pollick,and ELMD,and 1000 synthetic gaits generated using STEP-Gen technology,and ELMB,consisting of 3924 gaits,with 1835 labeled with emotions such as“Happy,”“Sad,”“Angry,”and“Neutral.”On the standard datasets Emotion-Gait and ELMB,the proposed method achieved accuracies of 0.900 and 0.896,respectively,attaining performance comparable to other state-ofthe-artmethods.Furthermore,on occlusion datasets,the proposedmethod significantly mitigates the performance degradation caused by occlusion compared to other methods,the accuracy is significantly higher than that of other methods.展开更多
Faulty-feeder detection in neutral point noneffectively grounded distribution networks consistently attracts research attention since it directly affects quality and safety of energy supply.Most modern research on fau...Faulty-feeder detection in neutral point noneffectively grounded distribution networks consistently attracts research attention since it directly affects quality and safety of energy supply.Most modern research on faulty-feeder detection tends to apply more complex digital signal processing techniques and deeper neural networks in order to better extract and learn as many detailed characteristics as possible.However,these approaches may easily result in overfitting and high computational cost,which cannot meet requirements for detection accuracy and efficiency in practical applications.This paper proposes an innovative waveform encoding method and details a simple convolutional neural network(CNN)with one layer of convolution used for identification,which seeks to improve detection accuracy and efficiency simultaneously.First,sparse characteristics of waveforms are utilized to encode into compact vectors,and a waveform-vector matrix is generated.Second,to deduce waveform-vector matrix,a simple CNN with multi-scale filters and one layer of convolution is established.Finally,a methodology for faulty-feeder detection is proposed,and both detection accuracy and efficiency are considerably enhanced.Comparative studies have confirmed clear superiority of the developed method,which outperforms existing approaches in both detection accuracy and efficiency,thus highlighting its significant potential for application.展开更多
Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address ...Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden traffic periodicity of traffic flow, traffic flow is divided into three kinds of periods, including hourly, daily, and weekly data. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions. Local spatial-temporal dependence is captured by using a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism is introduced to label position information for all traffic sequences. Thus, this multi-head self-attention mechanism can recognize the sequence order and allocate weights for different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.展开更多
In the burgeoning field of anomaly detection within attributed networks,traditional methodologies often encounter the intricacies of network complexity,particularly in capturing nonlinearity and sparsity.This study in...In the burgeoning field of anomaly detection within attributed networks,traditional methodologies often encounter the intricacies of network complexity,particularly in capturing nonlinearity and sparsity.This study introduces an innovative approach that synergizes the strengths of graph convolutional networks with advanced deep residual learning and a unique residual-based attention mechanism,thereby creating a more nuanced and efficient method for anomaly detection in complex networks.The heart of our model lies in the integration of graph convolutional networks that capture complex structural relationships within the network data.This is further bolstered by deep residual learning,which is employed to model intricate nonlinear connections directly from input data.A pivotal innovation in our approach is the incorporation of a residual-based attention mech-anism.This mechanism dynamically adjusts the importance of nodes based on their residual information,thereby significantly enhancing the sensitivity of the model to subtle anomalies.Furthermore,we introduce a novel hypersphere mapping technique in the latent space to distinctly separate normal and anomalous data.This mapping is the key to our model’s ability to pinpoint anomalies with greater precision.An extensive experimental setup was used to validate the efficacy of the proposed model.Using attributed social network datasets,we demonstrate that our model not only competes with but also surpasses existing state-of-the-art methods in anomaly detection.The results show the exceptional capability of our model to handle the multifaceted nature of real-world networks.展开更多
Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attentio...Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attention, challenges remain, especially when dealing with diverse data types. In this study, we introduce a novel data imputation method based on a modified convolutional neural network, specifically, a Deep Residual-Convolutional Neural Network (DRes-CNN) architecture designed to handle missing values across various datasets. Our approach demonstrates substantial improvements over existing imputation techniques by leveraging residual connections and optimized convolutional layers to capture complex data patterns. We evaluated the model on publicly available datasets, including Medical Information Mart for Intensive Care (MIMIC-III and MIMIC-IV), which contain critical care patient data, and the Beijing Multi-Site Air Quality dataset, which measures environmental air quality. The proposed DRes-CNN method achieved a root mean square error (RMSE) of 0.00006, highlighting its high accuracy and robustness. We also compared with Low Light-Convolutional Neural Network (LL-CNN) and U-Net methods, which had RMSE values of 0.00075 and 0.00073, respectively. This represented an improvement of approximately 92% over LL-CNN and 91% over U-Net. The results showed that this DRes-CNN-based imputation method outperforms current state-of-the-art models. These results established DRes-CNN as a reliable solution for addressing missing data.展开更多
The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image...The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image SR may lead to issues such as blurry details and excessive smoothness.To address the limitations,we proposed an algorithm based on the generative adversarial network(GAN)framework.In the generator network,three different sizes of convolutions connected by a residual dense structure were used to extract detailed features,and an attention mechanism combined with dual channel and spatial information was applied to concentrate the computing power on crucial areas.In the discriminator network,using InstanceNorm to normalize tensors sped up the training process while retaining feature information.The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio(PSNR)and structural similarity index measure(SSIM)compared to other methods,resulting in an improved visual quality.展开更多
Recent advances in deep learning have significantly improved image deblurring;however,existing approaches still suffer from limited global context modeling,inadequate detail restoration,and poor texture or edge percep...Recent advances in deep learning have significantly improved image deblurring;however,existing approaches still suffer from limited global context modeling,inadequate detail restoration,and poor texture or edge perception,especially under complex dynamic blur.To address these challenges,we propose the Multi-Resolution Fusion Network(MRFNet),a blind multi-scale deblurring framework that integrates progressive residual connectivity for hierarchical feature fusion.The network employs a three-stage design:(1)TransformerBlocks capture long-range dependencies and reconstruct coarse global structures;(2)Nonlinear Activation Free Blocks(NAFBlocks)enhance local detail representation and mid-level feature fusion;and(3)an optimized residual subnetwork based on gated feature modulation refines texture and edge details for high-fidelity restoration.Extensive experiments demonstrate that MRFNet achieves superior performance compared to state-of-the-art methods.On GoPro,it attains 32.52 dB Peak Signal-to-Noise Ratio(PSNR)and 0.071 Learned Perceptual Image Patch Similarity(LPIPS),outperforming MIMOWNet(32.50 dB,0.075).On HIDE,it achieves 30.25 dB PSNR and 0.945 Structural Similarity Index Measure(SSIM),representing gains of+0.26 dB and+0.015 SSIM over MIMO-UNet(29.99 dB,0.930).On RealBlur-J,it reaches 28.82 dB PSNR and 0.872 SSIM,surpassing MIMO-UNet by+1.19 dB and+0.035 SSIM(27.63 dB,0.837).These results validate the effectiveness of the proposed progressive residual fusion and hybrid attention mechanisms in balancing global context understanding and local detail recovery for blind image deblurring.展开更多
Abnormal network traffic, as a frequent security risk, requires a series of techniques to categorize and detect it. Existing network traffic anomaly detection still faces challenges: the inability to fully extract loc...Abnormal network traffic, as a frequent security risk, requires a series of techniques to categorize and detect it. Existing network traffic anomaly detection still faces challenges: the inability to fully extract local and global features, as well as the lack of effective mechanisms to capture complex interactions between features;Additionally, when increasing the receptive field to obtain deeper feature representations, the reliance on increasing network depth leads to a significant increase in computational resource consumption, affecting the efficiency and performance of detection. Based on these issues, firstly, this paper proposes a network traffic anomaly detection model based on parallel dilated convolution and residual learning (Res-PDC). To better explore the interactive relationships between features, the traffic samples are converted into two-dimensional matrix. A module combining parallel dilated convolutions and residual learning (res-pdc) was designed to extract local and global features of traffic at different scales. By utilizing res-pdc modules with different dilation rates, we can effectively capture spatial features at different scales and explore feature dependencies spanning wider regions without increasing computational resources. Secondly, to focus and integrate the information in different feature subspaces, further enhance and extract the interactions among the features, multi-head attention is added to Res-PDC, resulting in the final model: multi-head attention enhanced parallel dilated convolution and residual learning (MHA-Res-PDC) for network traffic anomaly detection. Finally, comparisons with other machine learning and deep learning algorithms are conducted on the NSL-KDD and CIC-IDS-2018 datasets. The experimental results demonstrate that the proposed method in this paper can effectively improve the detection performance.展开更多
Although the Convolutional Neural Network(CNN)has shown great potential for land cover classification,the frequently used single-scale convolution kernel limits the scope of informa-tion extraction.Therefore,we propos...Although the Convolutional Neural Network(CNN)has shown great potential for land cover classification,the frequently used single-scale convolution kernel limits the scope of informa-tion extraction.Therefore,we propose a Multi-Scale Fully Convolutional Network(MSFCN)with a multi-scale convolutional kernel as well as a Channel Attention Block(CAB)and a Global Pooling Module(GPM)in this paper to exploit discriminative representations from two-dimensional(2D)satellite images.Meanwhile,to explore the ability of the proposed MSFCN for spatio-temporal images,we expand our MSFCN to three-dimension using three-dimensional(3D)CNN,capable of harnessing each land cover category’s time series interac-tion from the reshaped spatio-temporal remote sensing images.To verify the effectiveness of the proposed MSFCN,we conduct experiments on two spatial datasets and two spatio-temporal datasets.The proposed MSFCN achieves 60.366%on the WHDLD dataset and 75.127%on the GID dataset in terms of mIoU index while the figures for two spatio-temporal datasets are 87.753%and 77.156%.Extensive comparative experiments and abla-tion studies demonstrate the effectiveness of the proposed MSFCN.展开更多
The tradeoff between efficiency and model size of the convolutional neural network(CNN)is an essential issue for applications of CNN-based algorithms to diverse real-world tasks.Although deep learning-based methods ha...The tradeoff between efficiency and model size of the convolutional neural network(CNN)is an essential issue for applications of CNN-based algorithms to diverse real-world tasks.Although deep learning-based methods have achieved significant improvements in image super-resolution(SR),current CNNbased techniques mainly contain massive parameters and a high computational complexity,limiting their practical applications.In this paper,we present a fast and lightweight framework,named weighted multi-scale residual network(WMRN),for a better tradeoff between SR performance and computational efficiency.With the modified residual structure,depthwise separable convolutions(DS Convs)are employed to improve convolutional operations’efficiency.Furthermore,several weighted multi-scale residual blocks(WMRBs)are stacked to enhance the multi-scale representation capability.In the reconstruction subnetwork,a group of Conv layers are introduced to filter feature maps to reconstruct the final high-quality image.Extensive experiments were conducted to evaluate the proposed model,and the comparative results with several state-of-the-art algorithms demonstrate the effectiveness of WMRN.展开更多
As an integrated application of modern information technologies and artificial intelligence,Prognostic and Health Management(PHM)is important for machine health monitoring.Prediction of tool wear is one of the symboli...As an integrated application of modern information technologies and artificial intelligence,Prognostic and Health Management(PHM)is important for machine health monitoring.Prediction of tool wear is one of the symbolic applications of PHM technology in modern manufacturing systems and industry.In this paper,a multi-scale Convolutional Gated Recurrent Unit network(MCGRU)is proposed to address raw sensory data for tool wear prediction.At the bottom of MCGRU,six parallel and independent branches with different kernel sizes are designed to form a multi-scale convolutional neural network,which augments the adaptability to features of different time scales.These features of different scales extracted from raw data are then fed into a Deep Gated Recurrent Unit network to capture long-term dependencies and learn significant representations.At the top of the MCGRU,a fully connected layer and a regression layer are built for cutting tool wear prediction.Two case studies are performed to verify the capability and effectiveness of the proposed MCGRU network and results show that MCGRU outperforms several state-of-the-art baseline models.展开更多
Microphone array-based sound source localization(SSL)is widely used in a variety of occasions such as video conferencing,robotic hearing,speech enhancement,speech recognition and so on.The traditional SSL methods cann...Microphone array-based sound source localization(SSL)is widely used in a variety of occasions such as video conferencing,robotic hearing,speech enhancement,speech recognition and so on.The traditional SSL methods cannot achieve satisfactory performance in adverse noisy and reverberant environments.In order to improve localization performance,a novel SSL algorithm using convolutional residual network(CRN)is proposed in this paper.The spatial features including time difference of arrivals(TDOAs)between microphone pairs and steered response power-phase transform(SRPPHAT)spatial spectrum are extracted in each Gammatone sub-band.The spatial features of different sub-bands with a frame are combine into a feature matrix as the input of CRN.The proposed algorithm employ CRN to fuse the spatial features.Since the CRN introduces the residual structure on the basis of the convolutional network,it reduce the difficulty of training procedure and accelerate the convergence of the model.A CRN model is learned from the training data in various reverberation and noise environments to establish the mapping regularity between the input feature and the sound azimuth.Through simulation verification,compared with the methods using traditional deep neural network,the proposed algorithm can achieve a better localization performance in SSL task,and provide better generalization capacity to untrained noise and reverberation.展开更多
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
Amodel that can obtain rapid and accurate detection of coronavirus disease 2019(COVID-19)plays a significant role in treating and preventing the spread of disease transmission.However,designing such amodel that can ba...Amodel that can obtain rapid and accurate detection of coronavirus disease 2019(COVID-19)plays a significant role in treating and preventing the spread of disease transmission.However,designing such amodel that can balance the detection accuracy andweight parameters ofmemorywell to deploy a mobile device is challenging.Taking this point into account,this paper fuses the convolutional neural network and residual learning operations to build a multi-class classification model,which improves COVID-19 pneumonia detection performance and keeps a trade-off between the weight parameters and accuracy.The convolutional neural network can extract the COVID-19 feature information by repeated convolutional operations.The residual learning operations alleviate the gradient problems caused by stacking convolutional layers and enhance the ability of feature extraction.The ability further enables the proposed model to acquire effective feature information at a lowcost,which canmake ourmodel keep smallweight parameters.Extensive validation and comparison with other models of COVID-19 pneumonia detection on the well-known COVIDx dataset show that(1)the sensitivity of COVID-19 pneumonia detection is improved from 88.2%(non-COVID-19)and 77.5%(COVID-19)to 95.3%(non-COVID-19)and 96.5%(COVID-19),respectively.The positive predictive value is also respectively increased from72.8%(non-COVID-19)and 89.0%(COVID-19)to 88.8%(non-COVID-19)and 95.1%(COVID-19).(2)Compared with the weight parameters of the COVIDNet-small network,the value of the proposed model is 13 M,which is slightly higher than that(11.37 M)of the COVIDNet-small network.But,the corresponding accuracy is improved from 85.2%to 93.0%.The above results illustrate the proposed model can gain an efficient balance between accuracy and weight parameters.展开更多
Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances,varied poses and different camera views. A multiscale and multi-label c...Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances,varied poses and different camera views. A multiscale and multi-label convolutional neural network( MSMLCNN) is proposed to predict multiple pedestrian attributes simultaneously. The pedestrian attribute classification problem is firstly transformed into a multi-label problem including multiple binary attributes needed to be classified. Then,the multi-label problem is solved by fully connecting all binary attributes to multi-scale features with logistic regression functions. Moreover,the multi-scale features are obtained by concatenating those featured maps produced from multiple pooling layers of the MSMLCNN at different scales. Extensive experiment results show that the proposed MSMLCNN outperforms state-of-the-art pedestrian attribute classification methods with a large margin.展开更多
Rockburst is a phenomenon in which free surfaces are formed during excavation,which subsequently causes the sudden release of energy in the construction of mines and tunnels.Light rockburst only peels off rock slices ...Rockburst is a phenomenon in which free surfaces are formed during excavation,which subsequently causes the sudden release of energy in the construction of mines and tunnels.Light rockburst only peels off rock slices without ejection,while severe rockburst causes casualties and property loss.The frequency and degree of rockburst damage increases with the excavation depth.Moreover,rockburst is the leading engineering geological hazard in the excavation process,and thus the prediction of its intensity grade is of great significance to the development of geotechnical engineering.Therefore,the prediction of rockburst intensity grade is one problem that needs to be solved urgently.By comprehensively considering the occurrence mechanism of rockburst,this paper selects the stress index(σθ/σc),brittleness index(σ_(c)/σ_(t)),and rock elastic energy index(Wet)as the rockburst evaluation indexes through the Spearman coefficient method.This overcomes the low accuracy problem of a single evaluation index prediction method.Following this,the BGD-MSR-DNN rockburst intensity grade prediction model based on batch gradient descent and a multi-scale residual deep neural network is proposed.The batch gradient descent(BGD)module is used to replace the gradient descent algorithm,which effectively improves the efficiency of the network and reduces the model training time.Moreover,the multi-scale residual(MSR)module solves the problem of network degradation when there are too many hidden layers of the deep neural network(DNN),thus improving the model prediction accuracy.The experimental results reveal the BGDMSR-DNN model accuracy to reach 97.1%,outperforming other comparable models.Finally,actual projects such as Qinling Tunnel and Daxiangling Tunnel,reached an accuracy of 100%.The model can be applied in mines and tunnel engineering to realize the accurate and rapid prediction of rockburst intensity grade.展开更多
Robotic grasps play an important role in the service and industrial fields,and the robotic arm can grasp the object properly depends on the accuracy of the grasping detection result.In order to predict grasping detect...Robotic grasps play an important role in the service and industrial fields,and the robotic arm can grasp the object properly depends on the accuracy of the grasping detection result.In order to predict grasping detection positions for known or unknown objects by a modular robotic system,a convolutional neural network(CNN)with the residual block is proposed,which can be used to generate accurate grasping detection for input images of the scene.The proposed model architecture was trained on the standard Cornell grasp dataset and evaluated on the test dataset.Moreover,it was evaluated on different types of household objects and cluttered multi-objects.On the Cornell grasp dataset,the accuracy of the model on image-wise splitting detection and object-wise splitting detection achieved 95.5%and 93.6%,respectively.Further,the real detection time per image was 109 ms.The experimental results show that the model can quickly detect the grasping positions of a single object or multiple objects in image pixels in real time,and it keeps good stability and robustness.展开更多
The detection of ash content in coal slime flotation tailings using deep learning can be hindered by various factors such as foam,impurities,and changing lighting conditions that disrupt the collection of tailings ima...The detection of ash content in coal slime flotation tailings using deep learning can be hindered by various factors such as foam,impurities,and changing lighting conditions that disrupt the collection of tailings images.To address this challenge,we present a method for ash content detection in coal slime flotation tailings.This method utilizes chromatographic filter paper sampling and a multi-scale residual network,which we refer to as MRCN.Initially,tailings are sampled using chromatographic filter paper to obtain static tailings images,effectively isolating interference factors at the flotation site.Subsequently,the MRCN,consisting of a multi-scale residual network,is employed to extract image features and compute ash content.Within the MRCN structure,tailings images undergo convolution operations through two parallel branches that utilize convolution kernels of different sizes,enabling the extraction of image features at various scales and capturing a more comprehensive representation of the ash content information.Furthermore,a channel attention mechanism is integrated to enhance the performance of the model.The combination of the multi-scale residual structure and the channel attention mechanism within MRCN results in robust capabilities for image feature extraction and ash content detection.Comparative experiments demonstrate that this proposed approach,based on chromatographic filter paper sampling and the multi-scale residual network,exhibits significantly superior performance in the detection of ash content in coal slime flotation tailings.展开更多
In daily life,keyword spotting plays an important role in human-computer interaction.However,noise often interferes with the extraction of time-frequency information,and achieving both computational efficiency and rec...In daily life,keyword spotting plays an important role in human-computer interaction.However,noise often interferes with the extraction of time-frequency information,and achieving both computational efficiency and recognition accuracy on resource-constrained devices such as mobile terminals remains a major challenge.To address this,we propose a novel time-frequency dual-branch parallel residual network,which integrates a Dual-Branch Broadcast Residual module and a Time-Frequency Coordinate Attention module.The time-domain and frequency-domain branches are designed in parallel to independently extract temporal and spectral features,effectively avoiding the potential information loss caused by serial stacking,while enhancing information flow and multi-scale feature fusion.In terms of training strategy,a curriculum learning approach is introduced to progressively improve model robustness fromeasy to difficult tasks.Experimental results demonstrate that the proposed method consistently outperforms existing lightweight models under various signal-to-noise ratio(SNR)conditions,achieving superior far-field recognition performance on the Google Speech Commands V2 dataset.Notably,the model maintains stable performance even in low-SNR environments such as–10 dB,and generalizes well to unseen SNR conditions during training,validating its robustness to novel noise scenarios.Furthermore,the proposed model exhibits significantly fewer parameters,making it highly suitable for deployment on resource-limited devices.Overall,the model achieves a favorable balance between performance and parameter efficiency,demonstrating strong potential for practical applications.展开更多
基金funded by the National Natural Science Foundation of China(No.52204407)the Natural Science Foundation of Jiangsu Province(No.BK20220595)the China Postdoctoral Science Foundation(No.2022M723689).
文摘This study proposes a multi-scale simplified residual convolutional neural network(MS-SRCNN)for the precise prediction of Mg-Nd binary alloy compositions from scanning electron microscope(SEM)images.A multi-scale data structure is established by spatially aligning and stacking SEM images at different magnifications.The MS-SRCNN significantly reduces computational runtime by over 90%compared to traditional architectures like ResNet50,VGG16,and VGG19,without compromising prediction accuracy.The model demonstrates more excellent predictive performance,achieving a>5%increase in R^(2) compared to single-scale models.Furthermore,the MS-SRCNN exhibits robust composition prediction capability across other Mg-based binary alloys,including Mg-La,Mg-Sn,Mg-Ce,Mg-Sm,Mg-Ag,and Mg-Y,thereby emphasizing its generalization and extrapolation potential.This research establishes a non-destructive,microstructure-informed composition analysis framework,reduces characterization time compared to traditional experiment methods and provides insights into the composition-microstructure relationship in diverse material systems.
基金supported by the National Natural Science Foundation of China(62272049,62236006,62172045)the Key Projects of Beijing Union University(ZKZD202301).
文摘In recent years,gait-based emotion recognition has been widely applied in the field of computer vision.However,existing gait emotion recognition methods typically rely on complete human skeleton data,and their accuracy significantly declines when the data is occluded.To enhance the accuracy of gait emotion recognition under occlusion,this paper proposes a Multi-scale Suppression Graph ConvolutionalNetwork(MS-GCN).TheMS-GCN consists of three main components:Joint Interpolation Module(JI Moudle),Multi-scale Temporal Convolution Network(MS-TCN),and Suppression Graph Convolutional Network(SGCN).The JI Module completes the spatially occluded skeletal joints using the(K-Nearest Neighbors)KNN interpolation method.The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait,compensating for the temporal occlusion of gait information.The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features,thereby reducing the negative impact of occlusion on emotion recognition results.The proposed method is evaluated on two comprehensive datasets:Emotion-Gait,containing 4227 real gaits from sources like BML,ICT-Pollick,and ELMD,and 1000 synthetic gaits generated using STEP-Gen technology,and ELMB,consisting of 3924 gaits,with 1835 labeled with emotions such as“Happy,”“Sad,”“Angry,”and“Neutral.”On the standard datasets Emotion-Gait and ELMB,the proposed method achieved accuracies of 0.900 and 0.896,respectively,attaining performance comparable to other state-ofthe-artmethods.Furthermore,on occlusion datasets,the proposedmethod significantly mitigates the performance degradation caused by occlusion compared to other methods,the accuracy is significantly higher than that of other methods.
文摘Faulty-feeder detection in neutral point noneffectively grounded distribution networks consistently attracts research attention since it directly affects quality and safety of energy supply.Most modern research on faulty-feeder detection tends to apply more complex digital signal processing techniques and deeper neural networks in order to better extract and learn as many detailed characteristics as possible.However,these approaches may easily result in overfitting and high computational cost,which cannot meet requirements for detection accuracy and efficiency in practical applications.This paper proposes an innovative waveform encoding method and details a simple convolutional neural network(CNN)with one layer of convolution used for identification,which seeks to improve detection accuracy and efficiency simultaneously.First,sparse characteristics of waveforms are utilized to encode into compact vectors,and a waveform-vector matrix is generated.Second,to deduce waveform-vector matrix,a simple CNN with multi-scale filters and one layer of convolution is established.Finally,a methodology for faulty-feeder detection is proposed,and both detection accuracy and efficiency are considerably enhanced.Comparative studies have confirmed clear superiority of the developed method,which outperforms existing approaches in both detection accuracy and efficiency,thus highlighting its significant potential for application.
基金supported by the National Natural Science Foundation of China(Grant Nos.62472149,62376089,62202147)Hubei Provincial Science and Technology Plan Project(2023BCB04100).
文摘Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden traffic periodicity of traffic flow, traffic flow is divided into three kinds of periods, including hourly, daily, and weekly data. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions. Local spatial-temporal dependence is captured by using a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism is introduced to label position information for all traffic sequences. Thus, this multi-head self-attention mechanism can recognize the sequence order and allocate weights for different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
文摘In the burgeoning field of anomaly detection within attributed networks,traditional methodologies often encounter the intricacies of network complexity,particularly in capturing nonlinearity and sparsity.This study introduces an innovative approach that synergizes the strengths of graph convolutional networks with advanced deep residual learning and a unique residual-based attention mechanism,thereby creating a more nuanced and efficient method for anomaly detection in complex networks.The heart of our model lies in the integration of graph convolutional networks that capture complex structural relationships within the network data.This is further bolstered by deep residual learning,which is employed to model intricate nonlinear connections directly from input data.A pivotal innovation in our approach is the incorporation of a residual-based attention mech-anism.This mechanism dynamically adjusts the importance of nodes based on their residual information,thereby significantly enhancing the sensitivity of the model to subtle anomalies.Furthermore,we introduce a novel hypersphere mapping technique in the latent space to distinctly separate normal and anomalous data.This mapping is the key to our model’s ability to pinpoint anomalies with greater precision.An extensive experimental setup was used to validate the efficacy of the proposed model.Using attributed social network datasets,we demonstrate that our model not only competes with but also surpasses existing state-of-the-art methods in anomaly detection.The results show the exceptional capability of our model to handle the multifaceted nature of real-world networks.
基金supported by the Intelligent System Research Group(ISysRG)supported by Universitas Sriwijaya funded by the Competitive Research 2024.
文摘Handling missing data accurately is critical in clinical research, where data quality directly impacts decision-making and patient outcomes. While deep learning (DL) techniques for data imputation have gained attention, challenges remain, especially when dealing with diverse data types. In this study, we introduce a novel data imputation method based on a modified convolutional neural network, specifically, a Deep Residual-Convolutional Neural Network (DRes-CNN) architecture designed to handle missing values across various datasets. Our approach demonstrates substantial improvements over existing imputation techniques by leveraging residual connections and optimized convolutional layers to capture complex data patterns. We evaluated the model on publicly available datasets, including Medical Information Mart for Intensive Care (MIMIC-III and MIMIC-IV), which contain critical care patient data, and the Beijing Multi-Site Air Quality dataset, which measures environmental air quality. The proposed DRes-CNN method achieved a root mean square error (RMSE) of 0.00006, highlighting its high accuracy and robustness. We also compared with Low Light-Convolutional Neural Network (LL-CNN) and U-Net methods, which had RMSE values of 0.00075 and 0.00073, respectively. This represented an improvement of approximately 92% over LL-CNN and 91% over U-Net. The results showed that this DRes-CNN-based imputation method outperforms current state-of-the-art models. These results established DRes-CNN as a reliable solution for addressing missing data.
文摘The application of image super-resolution(SR)has brought significant assistance in the medical field,aiding doctors to make more precise diagnoses.However,solely relying on a convolutional neural network(CNN)for image SR may lead to issues such as blurry details and excessive smoothness.To address the limitations,we proposed an algorithm based on the generative adversarial network(GAN)framework.In the generator network,three different sizes of convolutions connected by a residual dense structure were used to extract detailed features,and an attention mechanism combined with dual channel and spatial information was applied to concentrate the computing power on crucial areas.In the discriminator network,using InstanceNorm to normalize tensors sped up the training process while retaining feature information.The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio(PSNR)and structural similarity index measure(SSIM)compared to other methods,resulting in an improved visual quality.
基金funded by Qinghai University Postgraduate Research and Practice Innovation Program of Funder,grant number 2025-GMKY-42.
文摘Recent advances in deep learning have significantly improved image deblurring;however,existing approaches still suffer from limited global context modeling,inadequate detail restoration,and poor texture or edge perception,especially under complex dynamic blur.To address these challenges,we propose the Multi-Resolution Fusion Network(MRFNet),a blind multi-scale deblurring framework that integrates progressive residual connectivity for hierarchical feature fusion.The network employs a three-stage design:(1)TransformerBlocks capture long-range dependencies and reconstruct coarse global structures;(2)Nonlinear Activation Free Blocks(NAFBlocks)enhance local detail representation and mid-level feature fusion;and(3)an optimized residual subnetwork based on gated feature modulation refines texture and edge details for high-fidelity restoration.Extensive experiments demonstrate that MRFNet achieves superior performance compared to state-of-the-art methods.On GoPro,it attains 32.52 dB Peak Signal-to-Noise Ratio(PSNR)and 0.071 Learned Perceptual Image Patch Similarity(LPIPS),outperforming MIMOWNet(32.50 dB,0.075).On HIDE,it achieves 30.25 dB PSNR and 0.945 Structural Similarity Index Measure(SSIM),representing gains of+0.26 dB and+0.015 SSIM over MIMO-UNet(29.99 dB,0.930).On RealBlur-J,it reaches 28.82 dB PSNR and 0.872 SSIM,surpassing MIMO-UNet by+1.19 dB and+0.035 SSIM(27.63 dB,0.837).These results validate the effectiveness of the proposed progressive residual fusion and hybrid attention mechanisms in balancing global context understanding and local detail recovery for blind image deblurring.
基金supported by the Xiamen Science and Technology Subsidy Project(No.2023CXY0318).
文摘Abnormal network traffic, as a frequent security risk, requires a series of techniques to categorize and detect it. Existing network traffic anomaly detection still faces challenges: the inability to fully extract local and global features, as well as the lack of effective mechanisms to capture complex interactions between features;Additionally, when increasing the receptive field to obtain deeper feature representations, the reliance on increasing network depth leads to a significant increase in computational resource consumption, affecting the efficiency and performance of detection. Based on these issues, firstly, this paper proposes a network traffic anomaly detection model based on parallel dilated convolution and residual learning (Res-PDC). To better explore the interactive relationships between features, the traffic samples are converted into two-dimensional matrix. A module combining parallel dilated convolutions and residual learning (res-pdc) was designed to extract local and global features of traffic at different scales. By utilizing res-pdc modules with different dilation rates, we can effectively capture spatial features at different scales and explore feature dependencies spanning wider regions without increasing computational resources. Secondly, to focus and integrate the information in different feature subspaces, further enhance and extract the interactions among the features, multi-head attention is added to Res-PDC, resulting in the final model: multi-head attention enhanced parallel dilated convolution and residual learning (MHA-Res-PDC) for network traffic anomaly detection. Finally, comparisons with other machine learning and deep learning algorithms are conducted on the NSL-KDD and CIC-IDS-2018 datasets. The experimental results demonstrate that the proposed method in this paper can effectively improve the detection performance.
基金supported by the National Natural Science Foundation of China[grant number 41671452].
文摘Although the Convolutional Neural Network(CNN)has shown great potential for land cover classification,the frequently used single-scale convolution kernel limits the scope of informa-tion extraction.Therefore,we propose a Multi-Scale Fully Convolutional Network(MSFCN)with a multi-scale convolutional kernel as well as a Channel Attention Block(CAB)and a Global Pooling Module(GPM)in this paper to exploit discriminative representations from two-dimensional(2D)satellite images.Meanwhile,to explore the ability of the proposed MSFCN for spatio-temporal images,we expand our MSFCN to three-dimension using three-dimensional(3D)CNN,capable of harnessing each land cover category’s time series interac-tion from the reshaped spatio-temporal remote sensing images.To verify the effectiveness of the proposed MSFCN,we conduct experiments on two spatial datasets and two spatio-temporal datasets.The proposed MSFCN achieves 60.366%on the WHDLD dataset and 75.127%on the GID dataset in terms of mIoU index while the figures for two spatio-temporal datasets are 87.753%and 77.156%.Extensive comparative experiments and abla-tion studies demonstrate the effectiveness of the proposed MSFCN.
基金the National Natural Science Foundation of China(61772149,61866009,61762028,U1701267,61702169)Guangxi Science and Technology Project(2019GXNSFFA245014,ZY20198016,AD18281079,AD18216004)+1 种基金the Natural Science Foundation of Hunan Province(2020JJ3014)Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Images and Graphics(GIIP202001).
文摘The tradeoff between efficiency and model size of the convolutional neural network(CNN)is an essential issue for applications of CNN-based algorithms to diverse real-world tasks.Although deep learning-based methods have achieved significant improvements in image super-resolution(SR),current CNNbased techniques mainly contain massive parameters and a high computational complexity,limiting their practical applications.In this paper,we present a fast and lightweight framework,named weighted multi-scale residual network(WMRN),for a better tradeoff between SR performance and computational efficiency.With the modified residual structure,depthwise separable convolutions(DS Convs)are employed to improve convolutional operations’efficiency.Furthermore,several weighted multi-scale residual blocks(WMRBs)are stacked to enhance the multi-scale representation capability.In the reconstruction subnetwork,a group of Conv layers are introduced to filter feature maps to reconstruct the final high-quality image.Extensive experiments were conducted to evaluate the proposed model,and the comparative results with several state-of-the-art algorithms demonstrate the effectiveness of WMRN.
基金Supported in part by Natural Science Foundation of China(Grant Nos.51835009,51705398)Shaanxi Province 2020 Natural Science Basic Research Plan(Grant No.2020JQ-042)Aeronautical Science Foundation(Grant No.2019ZB070001).
文摘As an integrated application of modern information technologies and artificial intelligence,Prognostic and Health Management(PHM)is important for machine health monitoring.Prediction of tool wear is one of the symbolic applications of PHM technology in modern manufacturing systems and industry.In this paper,a multi-scale Convolutional Gated Recurrent Unit network(MCGRU)is proposed to address raw sensory data for tool wear prediction.At the bottom of MCGRU,six parallel and independent branches with different kernel sizes are designed to form a multi-scale convolutional neural network,which augments the adaptability to features of different time scales.These features of different scales extracted from raw data are then fed into a Deep Gated Recurrent Unit network to capture long-term dependencies and learn significant representations.At the top of the MCGRU,a fully connected layer and a regression layer are built for cutting tool wear prediction.Two case studies are performed to verify the capability and effectiveness of the proposed MCGRU network and results show that MCGRU outperforms several state-of-the-art baseline models.
基金supported by Nature Science Research Project of Higher Education Institutions in Jiangsu Province under Grant No.21KJB510018National Nature Science Foundation of China (NSFC)under Grant No.62001215.
文摘Microphone array-based sound source localization(SSL)is widely used in a variety of occasions such as video conferencing,robotic hearing,speech enhancement,speech recognition and so on.The traditional SSL methods cannot achieve satisfactory performance in adverse noisy and reverberant environments.In order to improve localization performance,a novel SSL algorithm using convolutional residual network(CRN)is proposed in this paper.The spatial features including time difference of arrivals(TDOAs)between microphone pairs and steered response power-phase transform(SRPPHAT)spatial spectrum are extracted in each Gammatone sub-band.The spatial features of different sub-bands with a frame are combine into a feature matrix as the input of CRN.The proposed algorithm employ CRN to fuse the spatial features.Since the CRN introduces the residual structure on the basis of the convolutional network,it reduce the difficulty of training procedure and accelerate the convergence of the model.A CRN model is learned from the training data in various reverberation and noise environments to establish the mapping regularity between the input feature and the sound azimuth.Through simulation verification,compared with the methods using traditional deep neural network,the proposed algorithm can achieve a better localization performance in SSL task,and provide better generalization capacity to untrained noise and reverberation.
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金This work was supported in part by the science and technology research project of Henan Provincial Department of science and technology(No.222102110366)the Science and Technology Innovation Team of Henan University(No.22IRTSTHN016)the grants from the teaching reform research and practice project of higher education in Henan Province in 2021[2021SJGLX502].
文摘Amodel that can obtain rapid and accurate detection of coronavirus disease 2019(COVID-19)plays a significant role in treating and preventing the spread of disease transmission.However,designing such amodel that can balance the detection accuracy andweight parameters ofmemorywell to deploy a mobile device is challenging.Taking this point into account,this paper fuses the convolutional neural network and residual learning operations to build a multi-class classification model,which improves COVID-19 pneumonia detection performance and keeps a trade-off between the weight parameters and accuracy.The convolutional neural network can extract the COVID-19 feature information by repeated convolutional operations.The residual learning operations alleviate the gradient problems caused by stacking convolutional layers and enhance the ability of feature extraction.The ability further enables the proposed model to acquire effective feature information at a lowcost,which canmake ourmodel keep smallweight parameters.Extensive validation and comparison with other models of COVID-19 pneumonia detection on the well-known COVIDx dataset show that(1)the sensitivity of COVID-19 pneumonia detection is improved from 88.2%(non-COVID-19)and 77.5%(COVID-19)to 95.3%(non-COVID-19)and 96.5%(COVID-19),respectively.The positive predictive value is also respectively increased from72.8%(non-COVID-19)and 89.0%(COVID-19)to 88.8%(non-COVID-19)and 95.1%(COVID-19).(2)Compared with the weight parameters of the COVIDNet-small network,the value of the proposed model is 13 M,which is slightly higher than that(11.37 M)of the COVIDNet-small network.But,the corresponding accuracy is improved from 85.2%to 93.0%.The above results illustrate the proposed model can gain an efficient balance between accuracy and weight parameters.
基金Supported by the National Natural Science Foundation of China(No.61602191,61672521,61375037,61473291,61572501,61572536,61502491,61372107,61401167)the Natural Science Foundation of Fujian Province(No.2016J01308)+3 种基金the Scientific and Technology Funds of Quanzhou(No.2015Z114)the Scientific and Technology Funds of Xiamen(No.3502Z20173045)the Promotion Program for Young and Middle aged Teacher in Science and Technology Research of Huaqiao University(No.ZQN-PY418,ZQN-YX403)the Scientific Research Funds of Huaqiao University(No.16BS108)
文摘Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances,varied poses and different camera views. A multiscale and multi-label convolutional neural network( MSMLCNN) is proposed to predict multiple pedestrian attributes simultaneously. The pedestrian attribute classification problem is firstly transformed into a multi-label problem including multiple binary attributes needed to be classified. Then,the multi-label problem is solved by fully connecting all binary attributes to multi-scale features with logistic regression functions. Moreover,the multi-scale features are obtained by concatenating those featured maps produced from multiple pooling layers of the MSMLCNN at different scales. Extensive experiment results show that the proposed MSMLCNN outperforms state-of-the-art pedestrian attribute classification methods with a large margin.
基金funded by State Key Laboratory for GeoMechanics and Deep Underground Engineering&Institute for Deep Underground Science and Engineering,Grant Number XD2021021BUCEA Post Graduate Innovation Project under Grant,Grant Number PG2023092.
文摘Rockburst is a phenomenon in which free surfaces are formed during excavation,which subsequently causes the sudden release of energy in the construction of mines and tunnels.Light rockburst only peels off rock slices without ejection,while severe rockburst causes casualties and property loss.The frequency and degree of rockburst damage increases with the excavation depth.Moreover,rockburst is the leading engineering geological hazard in the excavation process,and thus the prediction of its intensity grade is of great significance to the development of geotechnical engineering.Therefore,the prediction of rockburst intensity grade is one problem that needs to be solved urgently.By comprehensively considering the occurrence mechanism of rockburst,this paper selects the stress index(σθ/σc),brittleness index(σ_(c)/σ_(t)),and rock elastic energy index(Wet)as the rockburst evaluation indexes through the Spearman coefficient method.This overcomes the low accuracy problem of a single evaluation index prediction method.Following this,the BGD-MSR-DNN rockburst intensity grade prediction model based on batch gradient descent and a multi-scale residual deep neural network is proposed.The batch gradient descent(BGD)module is used to replace the gradient descent algorithm,which effectively improves the efficiency of the network and reduces the model training time.Moreover,the multi-scale residual(MSR)module solves the problem of network degradation when there are too many hidden layers of the deep neural network(DNN),thus improving the model prediction accuracy.The experimental results reveal the BGDMSR-DNN model accuracy to reach 97.1%,outperforming other comparable models.Finally,actual projects such as Qinling Tunnel and Daxiangling Tunnel,reached an accuracy of 100%.The model can be applied in mines and tunnel engineering to realize the accurate and rapid prediction of rockburst intensity grade.
基金National Natural Science Foundation of China(No.52101346)Fundamental Research Funds for the Central Universities,China(No.2232019D3-61)Initial Research Fund for the Young Teachers of Donghua University,China。
文摘Robotic grasps play an important role in the service and industrial fields,and the robotic arm can grasp the object properly depends on the accuracy of the grasping detection result.In order to predict grasping detection positions for known or unknown objects by a modular robotic system,a convolutional neural network(CNN)with the residual block is proposed,which can be used to generate accurate grasping detection for input images of the scene.The proposed model architecture was trained on the standard Cornell grasp dataset and evaluated on the test dataset.Moreover,it was evaluated on different types of household objects and cluttered multi-objects.On the Cornell grasp dataset,the accuracy of the model on image-wise splitting detection and object-wise splitting detection achieved 95.5%and 93.6%,respectively.Further,the real detection time per image was 109 ms.The experimental results show that the model can quickly detect the grasping positions of a single object or multiple objects in image pixels in real time,and it keeps good stability and robustness.
基金This work was supported by National Natural Science Foundation of China:Grant No.62106048.
文摘The detection of ash content in coal slime flotation tailings using deep learning can be hindered by various factors such as foam,impurities,and changing lighting conditions that disrupt the collection of tailings images.To address this challenge,we present a method for ash content detection in coal slime flotation tailings.This method utilizes chromatographic filter paper sampling and a multi-scale residual network,which we refer to as MRCN.Initially,tailings are sampled using chromatographic filter paper to obtain static tailings images,effectively isolating interference factors at the flotation site.Subsequently,the MRCN,consisting of a multi-scale residual network,is employed to extract image features and compute ash content.Within the MRCN structure,tailings images undergo convolution operations through two parallel branches that utilize convolution kernels of different sizes,enabling the extraction of image features at various scales and capturing a more comprehensive representation of the ash content information.Furthermore,a channel attention mechanism is integrated to enhance the performance of the model.The combination of the multi-scale residual structure and the channel attention mechanism within MRCN results in robust capabilities for image feature extraction and ash content detection.Comparative experiments demonstrate that this proposed approach,based on chromatographic filter paper sampling and the multi-scale residual network,exhibits significantly superior performance in the detection of ash content in coal slime flotation tailings.
文摘In daily life,keyword spotting plays an important role in human-computer interaction.However,noise often interferes with the extraction of time-frequency information,and achieving both computational efficiency and recognition accuracy on resource-constrained devices such as mobile terminals remains a major challenge.To address this,we propose a novel time-frequency dual-branch parallel residual network,which integrates a Dual-Branch Broadcast Residual module and a Time-Frequency Coordinate Attention module.The time-domain and frequency-domain branches are designed in parallel to independently extract temporal and spectral features,effectively avoiding the potential information loss caused by serial stacking,while enhancing information flow and multi-scale feature fusion.In terms of training strategy,a curriculum learning approach is introduced to progressively improve model robustness fromeasy to difficult tasks.Experimental results demonstrate that the proposed method consistently outperforms existing lightweight models under various signal-to-noise ratio(SNR)conditions,achieving superior far-field recognition performance on the Google Speech Commands V2 dataset.Notably,the model maintains stable performance even in low-SNR environments such as–10 dB,and generalizes well to unseen SNR conditions during training,validating its robustness to novel noise scenarios.Furthermore,the proposed model exhibits significantly fewer parameters,making it highly suitable for deployment on resource-limited devices.Overall,the model achieves a favorable balance between performance and parameter efficiency,demonstrating strong potential for practical applications.