The distinctive fault characteristics of battery energy storage stations (BESSs) significantly affect the reliability of conventional protection methods for transmission lines. In this paper, three-dimensional (3D) data scattergrams are constructed using current data from both sides of the transmission line and their sum. Following a comprehensive analysis of how the 3D data scattergrams vary under different conditions, a protection method based on 3D data scattergram image classification is developed. Depth-wise separable convolution is used to keep the convolutional neural network (CNN) structure lightweight without compromising performance. In addition, a Bayesian hyperparameter optimization algorithm is used to search the hyperparameters and simplify the training process. Compared with artificial neural networks and conventional CNNs, the depth-wise separable convolution based CNN (DPCNN) achieves higher recognition accuracy. The proposed protection method using DPCNN can accurately separate internal faults from other disturbances and identify fault phases under different operating states and fault conditions. It also shows strong tolerance to current transformer (CT) saturation and CT measurement errors.
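As a rough illustration of why depth-wise separable convolution keeps a CNN lightweight, the sketch below compares weight counts for a standard convolution and its depth-wise separable counterpart. The layer sizes are hypothetical and are not taken from the DPCNN described above; biases are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a k x k depth-wise convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer (assumption): 3 x 3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
print(standard, separable, separable / standard)
```

For any layer the ratio of the two counts works out to 1/out_channels + 1/kernel_size², which is why the saving grows with the number of output channels.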
Traditional data-driven fault diagnosis methods depend on expert experience to manually extract effective fault features from signals, which limits their applicability. In contrast, deep learning techniques have become a central focus of fault diagnosis research owing to their strong fault feature extraction ability and end-to-end diagnosis efficiency. Recently, research that combines the respective advantages of the convolutional neural network (CNN) and the Transformer in local and global feature extraction has shown promise in fault diagnosis. However, the cross-channel convolution mechanism in the CNN and the self-attention calculations in the Transformer make the combined model excessively complex, which results in high computational costs and limited industrial applicability. To tackle these challenges, this paper proposes a lightweight CNN-Transformer named SEFormer for rotating machinery fault diagnosis. First, a separable multiscale depthwise convolution block is designed to extract and integrate multiscale feature information from different channel dimensions of vibration signals. Then, an efficient self-attention block is developed to capture critical fine-grained features of the signal from a global perspective. Finally, experimental results on a planetary gearbox dataset and a motor roller bearing dataset show that the proposed framework balances robustness, generalization, and a lightweight design better than recent state-of-the-art fault diagnosis models based on CNNs and Transformers. This study presents a feasible strategy for developing a lightweight rotating machinery fault diagnosis framework aimed at economical deployment.
Among the most obvious clinical manifestations of dementia, or of the Behavioral and Psychological Symptoms of Dementia (BPSD), are the lack of emotional expression, the increased frequency of negative emotions, and the impermanence of emotions. Observing the reduction of BPSD through emotions is considered effective and is widely used in non-pharmacological therapy. This article verifies whether an image recognition artificial intelligence (AI) system can correctly reflect the emotional performance of elderly people with dementia, using a questionnaire survey of three professional elderly-care nurses. ANOVA (sig. = 0.50) is used to determine that the judgments given by the nursing staff show no obvious deviation, and then Kendall's test (0.722**) and Spearman's test (0.863**) are used to verify that the severity judgments of the emotion recognition system and of the nursing staff agree. This implies the usability of the tool. It can also be expected to be applied further in research on emotion detection for elderly people with BPSD.
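The agreement check described above relies on rank correlation. A minimal sketch of Spearman's rho and Kendall's tau-b in plain Python follows; the two score lists are made-up illustrative ratings, not data from the study, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either variable."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both denominator terms
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + only_x_tied) * (untied + only_y_tied)
    )


# Made-up severity scores (assumption): AI system vs. one nurse, five subjects.
ai_scores = [1, 2, 3, 4, 5]
nurse_scores = [2, 1, 3, 4, 5]
print(spearman_rho(ai_scores, nurse_scores), kendall_tau_b(ai_scores, nurse_scores))
```

Both coefficients range from -1 to 1; values near 1 indicate that the two raters order the subjects in nearly the same way.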
Recently, video-based fire detection has become an important research topic in machine vision. This paper proposes a fire detection method that combines a classification model and a target detection model from deep learning. First, depthwise separable convolution is used to classify fire images, which saves considerable detection time while preserving detection accuracy. Second, the target regression function of You Only Look Once version 3 (YOLOv3) is used to output the fire position for images classified as fire, which avoids the loss of accuracy that arises when YOLOv3 is used for both target classification and position regression. At the same time, the time spent on target regression for images without fire is greatly reduced. The experiments were conducted on a publicly available online database. The detection accuracy reached 98% and the detection rate reached 38 fps. The method removes the need to extract flame characteristics manually, reduces the computational cost and the number of parameters, and improves both detection accuracy and detection rate.
Background: The use of remote photoplethysmography (rPPG) to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. In addition, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction, resulting in insufficient spatiotemporal feature modeling. Methods: We propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). First, to address the limitation of a single-scale ROI, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increased the information interaction between the long-span time and space dimensions and placed more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: Fusing multi-scale signals yielded better results than methods based only on single-scale signals. The proposed SSTC and DSAT mechanisms contribute to more accurate pulse signal extraction.
To prevent casualties and economic loss, accurate prediction of the remaining useful life (RUL) is critical in rail prognostics and health management. However, because multi-channel data from multiple sensors are coupled, traditional neural networks struggle to capture long-term dependencies when modeling long time series of rail damage. In this paper, a novel RUL prediction model with an enhanced pulse separable convolution is proposed to address this issue. First, a coding module based on an improved pulse separable convolutional network is established to effectively model the relationships within the data. To enhance the network, an alternate gradient back-propagation method is implemented, and an efficient channel attention (ECA) mechanism is developed to better emphasize the useful pulse characteristics. Second, an optimized Transformer encoder is designed to serve as the backbone of the model. It can efficiently capture the relationships within and between the data at each time step of a long, full-life-cycle time series. More importantly, the Transformer encoder is improved by integrating pulse maximum pooling to retain more pulse timing characteristics. Finally, the predicted RUL value is produced from the features of the preceding layers, yielding an end-to-end solution. The experimental results validate the effectiveness of the proposed approach in predicting rail RUL, and it outperforms several existing data-driven prognostic methods. The proposed method also generalizes well to the PHM2012 bearing dataset.
In the coal mining industry, the gangue separation phase poses a key challenge because coal and gangue look very similar. Recently, separation methods have become more intelligent and efficient, using new technologies and applying different features for recognition. One such method exploits the difference in substance density, leading to excellent coal/gangue recognition. This study therefore uses density differences to distinguish coal from gangue by performing volume prediction on the samples. Each training sample consists of 3-side images as input, with volume and weight as the ground truth for classification. The prediction process relies on a convolutional neural network (CGVP-CNN) model that receives a 3-side image and extracts the features needed to estimate the volume. Classification was then compared across ten classifiers, namely K-Nearest Neighbors (KNN), Linear Support Vector Machine (Linear SVM), Radial Basis Function (RBF) SVM, Gaussian Process, Decision Tree, Random Forest, Multi-Layer Perceptron (MLP), Adaptive Boosting (AdaBoost), Naive Bayes, and Quadratic Discriminant Analysis (QDA). After several experiments on training and testing data, the classification accuracies were 100%, 92%, 95%, 96%, 100%, 100%, 100%, 96%, 81%, and 92%, respectively. KNN was the fastest classifier while maintaining 100% accuracy. Because assessing how the model generalizes to new data is essential to ensure its efficiency, generalization was measured with a cross-validation experiment. The dataset was split by volume value so that generalization is assessed not only on new images of volumes seen in training but also on volumes outside the training range. The predicted volume values were then passed to the group of classifiers, whose reported accuracies were 100%, 100%, 100%, 98%, 88%, 87%, 100%, 87%, 97%, and 100%, respectively. Although high classification accuracy is the main goal, this work also reduces data preprocessing time markedly compared with related works. The CGVP-CNN model reduced the data preprocessing time of previous works to 0.017 s while maintaining high classification accuracy using the estimated volume value.
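The density-based decision described above can be sketched in a few lines. This is only an illustration under stated assumptions: it takes gangue to be the denser material, uses a single hypothetical threshold in place of the ten classifiers compared in the study, and the sample values are made up.

```python
# Hypothetical cut-off in g/cm^3 (assumption, not a value from the study).
DENSITY_THRESHOLD_G_PER_CM3 = 1.8


def density(weight_g: float, volume_cm3: float) -> float:
    """Density from a measured weight and a (predicted) volume."""
    if volume_cm3 <= 0:
        raise ValueError("volume must be positive")
    return weight_g / volume_cm3


def classify_sample(weight_g: float, predicted_volume_cm3: float,
                    threshold: float = DENSITY_THRESHOLD_G_PER_CM3) -> str:
    """Label a sample as 'gangue' if its density reaches the threshold, else 'coal'."""
    if density(weight_g, predicted_volume_cm3) >= threshold:
        return "gangue"
    return "coal"


# Made-up samples: (weight in g, CNN-predicted volume in cm^3).
samples = [(130.0, 100.0), (250.0, 100.0)]
labels = [classify_sample(w, v) for w, v in samples]
print(labels)
```

In practice the threshold would be replaced by a trained classifier such as KNN, since the volume estimate carries prediction error.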
The accurate and automatic segmentation of retinal vessels from fundus images is critical for the early diagnosis and prevention of many eye diseases, such as diabetic retinopathy (DR). Existing retinal vessel segmentation approaches based on convolutional neural networks (CNNs) have achieved remarkable effectiveness. Here, we develop a retinal vessel segmentation model with low complexity and high performance based on U-Net, one of the most popular architectures. Given the effectiveness of depth-wise separable convolution, we use it to replace the standard convolutional layer. The complexity of the proposed model is reduced by decreasing the number of parameters and calculations the model requires. To preserve performance while removing redundant parameters, we integrate the pre-trained MobileNet V2 into the encoder. Then, a feature fusion residual module (FFRM) is designed to exploit complementary strengths by enhancing the fusion between adjacent levels, which alleviates the extraneous clutter introduced by direct fusion. Finally, we provide detailed comparisons between the proposed SepFE and U-Net on three mainstream retinal image datasets (DRIVE, STARE, and CHASEDB1). The results show that SepFE has only 3% of the parameters and 8% of the FLOPs of U-Net while obtaining better segmentation performance. The superiority of SepFE is further demonstrated through comparisons with other advanced methods.
Introduction: Accurate prediction of protocadherin 8 (PCDH8) gene expression status from whole-slide images (WSIs) is critical for thyroid cancer diagnosis and prognosis, as PCDH8 overexpression is associated with tumor aggressiveness and poor outcomes. Existing methods for PCDH8 detection are often costly, time-consuming, or require specialized expertise. To address these limitations, we developed a novel depth-wise separable residual neural network (DSRNet) for noninvasive PCDH8 status prediction directly from WSIs. Materials and methods: We collected 403 thyroid cancer WSIs from The Cancer Genome Atlas (TCGA), with PCDH8 expression status classified as high or low based on median expression values. Each WSI was divided into 512×512 pixel tiles, with the top 100 non-white tiles selected per slide. DSRNet integrates depth-wise separable convolutions, residual connections, and a deformable convolutional pyramid pooling module to efficiently capture multiscale and long-range features in gigapixel WSIs. The model was trained using tenfold cross-validation. Results: DSRNet achieved state-of-the-art performance with 92.76% accuracy, 91.92% precision, 92.69% recall, and 0.93 area under the curve on the thyroid cancer dataset (TCGA-THCA), significantly outperforming leading convolutional neural networks and Transformer models. Ablation studies confirmed the contributions of each component, and attention visualization showed that DSRNet focuses on biologically relevant regions. The model also generalized well to a breast cancer dataset (TCGA-BRCA), achieving 89.13% accuracy. Conclusions: We developed DSRNet, a deep learning-based model for predicting PCDH8 status directly from routine hematoxylin and eosin-stained pathological images. DSRNet combines the efficiency of convolutional operations with enhanced long-range dependency modeling, providing a noninvasive, accurate, and interpretable tool for auxiliary thyroid cancer diagnosis and prognosis. The results demonstrate its strong potential for clinical translation, though further multicenter validation is warranted.
This letter deals with frequency-domain Blind Source Separation of Convolutive Mixtures (CMBSS). From the frequency representation of the "overlap and save" method, a Weighted General Discrete Fourier Transform (WGDFT) is derived to replace the traditional Discrete Fourier Transform (DFT). The mixing matrix in each frequency bin can be estimated more precisely from WGDFT coefficients than from DFT coefficients, which improves separation performance. Simulation results verify the validity of WGDFT for frequency-domain blind source separation of convolutive mixtures.
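For readers unfamiliar with the "overlap and save" method that the letter starts from, the sketch below shows the standard block-wise FFT filtering it refers to, using NumPy. It illustrates only the classical technique with the ordinary DFT; it does not implement the WGDFT, and the function name and test signals are assumptions made for the example.

```python
import numpy as np


def overlap_save(x: np.ndarray, h: np.ndarray, block_len: int) -> np.ndarray:
    """Linear convolution of x with FIR filter h via overlap-save FFT blocks."""
    m = len(h)
    n_fft = block_len + m - 1            # each FFT frame: m-1 overlap samples + new block
    H = np.fft.fft(h, n_fft)
    out_len = len(x) + m - 1             # length of the full linear convolution
    n_blocks = -(-out_len // block_len)  # ceiling division
    # Prepend m-1 zeros (initial overlap) and zero-pad the tail to fill the last frame.
    padded = np.concatenate(
        [np.zeros(m - 1), x, np.zeros(n_blocks * block_len - len(x))]
    )
    y = np.empty(n_blocks * block_len)
    for b in range(n_blocks):
        frame = padded[b * block_len : b * block_len + n_fft]
        circ = np.fft.ifft(np.fft.fft(frame) * H).real
        # The first m-1 samples are corrupted by circular wrap-around; discard them.
        y[b * block_len : (b + 1) * block_len] = circ[m - 1 :]
    return y[:out_len]


rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)       # made-up source signal
fir = rng.standard_normal(16)            # made-up mixing filter
filtered = overlap_save(signal, fir, block_len=64)
print(np.allclose(filtered, np.convolve(signal, fir)))
```

Because each frame is processed with a circular convolution, the per-bin products only approximate the true convolutive mixing, which is the kind of mismatch a modified transform would aim to reduce.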
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 uses a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on MobileNetV1 is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that processes the input feature map with parallel multi-resolution branches to extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining performance compared with the MobileNetV1 baseline.
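The claim that dilation widens the field of view without adding parameters follows from simple arithmetic, sketched below. The kernel size, dilation rates, and channel count are hypothetical and are not taken from the DDSC layer of the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def depthwise_weights(kernel_size: int, channels: int) -> int:
    """Weights in a depthwise convolution; independent of the dilation rate."""
    return kernel_size * kernel_size * channels


# Hypothetical 3 x 3 depthwise layer with 64 channels at several dilation rates.
for rate in (1, 2, 4):
    print(rate, effective_kernel_size(3, rate), depthwise_weights(3, 64))
```

The weight count stays at 576 for every rate, while the covered extent grows from 3 to 5 to 9 pixels per axis.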
In this paper, a Maximum Likelihood (ML) approach, implemented with the Expectation-Maximization (EM) algorithm, is proposed for blind separation of convolutively mixed discrete sources. To carry out the expectation step of the EM algorithm with a lower computational load, an algorithm named Iterative Maximum Likelihood (IML) is proposed to calculate the likelihood and recover the source signals. An important feature of the ML approach is that it is robust in noisy environments because it treats the covariance matrix of the additive Gaussian noise as a parameter. Another striking feature is that it can separate more sources than sensors by exploiting the finite-alphabet property of the sources. Simulation results show that the proposed ML approach works well for both determined and underdetermined mixtures. Furthermore, the performance of the proposed ML algorithm is close to that obtained with perfect knowledge of the channel filters.
Most existing algorithms for blind source separation are limited by the assumption that the sources are statistically independent. However, in many practical applications, the source signals are nonnegative and mutually statistically dependent. When the observations are nonnegative linear combinations of nonnegative sources, the correlation coefficients of the observations are larger than those of the source signals. In this letter, a novel Nonnegative Matrix Factorization (NMF) algorithm with least-correlated-component constraints is proposed for blind separation of convolutively mixed sources. The algorithm relaxes the source independence assumption and requires only low-complexity algebraic computations. Simulation results on blind source separation, including real face image data, indicate that the sources can be successfully recovered with the algorithm.
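As background for the NMF-based approach, here is a minimal sketch of plain NMF with the standard multiplicative updates for the Frobenius-norm objective, written with NumPy. It is not the letter's algorithm: it omits the least-correlated-component constraint and uses an instantaneous (non-convolutive) mixture of made-up nonnegative sources.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 200,
        eps: float = 1e-9, seed: int = 0):
    """Factorize nonnegative V (n x m) as W (n x rank) @ H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H) -> float:
    return float(np.linalg.norm(V - W @ H) / np.linalg.norm(V))


# Made-up data (assumption): 2 nonnegative sources observed by 4 sensors.
data_rng = np.random.default_rng(1)
sources = data_rng.random((2, 50))
mixing = data_rng.random((4, 2))
observations = mixing @ sources

W_est, H_est = nmf(observations, rank=2)
print(relative_error(observations, W_est, H_est))
```

Plain NMF recovers the sources only up to scaling and permutation, and not uniquely in general, which is why additional constraints such as the one proposed in the letter are needed.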
Deep kernel mapping support vector machines have achieved good results in numerous tasks by mapping features from a low-dimensional space to a high-dimensional space and then using support vector machines for classification. However, the deep kernel mapping support vector machine does not take into account the connections between spaces of different dimensions, and it increases the number of model parameters. To further improve the recognition capability of deep kernel mapping support vector machines while reducing the number of model parameters, this paper proposes a framework of Lightweight Deep Convolutional Cross-Connected Kernel Mapping Support Vector Machines (LC-CKMSVM). The framework consists of a feature extraction module and a classification module. The feature extraction module first maps the data from a low-dimensional to a high-dimensional space by fusing the representations of different dimensional spaces through cross-connections; it then uses depthwise separable convolution to replace part of the original convolution and reduce the number of parameters in the module. The classification module uses a soft-margin support vector machine for classification. Results on six visual datasets show that LC-CKMSVM achieves better classification accuracy than the other five models in most cases.
Channel pruning can reduce memory consumption and running time with minimal performance loss, and it is one of the most important techniques in network compression. However, existing channel pruning methods mainly focus on standard convolutional networks, and they rely heavily on time-consuming fine-tuning to recover performance. To this end, we present a novel, efficient probability-based channel pruning method for depthwise separable convolutional networks. Our method uses a new, simple yet effective probability-based channel pruning criterion that takes the scaling and shifting factors of batch normalization layers into consideration. A novel shifting factor fusion technique is further developed to improve the performance of the pruned networks without extra time-consuming fine-tuning. We apply the proposed method to five representative deep learning networks, namely MobileNetV1, MobileNetV2, ShuffleNetV1, ShuffleNetV2, and GhostNet, to demonstrate the efficiency of our pruning method. Extensive experimental results and comparisons on the publicly available CIFAR10, CIFAR100, and ImageNet datasets validate the feasibility of the proposed method.
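One way to see how batch normalization scaling and shifting factors could feed a probability-based criterion is sketched below. This is an assumption-laden illustration, not the criterion of the paper: it supposes the normalized pre-activation is standard normal, that a ReLU follows the batch normalization layer, and that channels which are rarely positive are pruning candidates. All names, values, and the threshold are hypothetical.

```python
import math


def active_probability(gamma: float, beta: float) -> float:
    """P(gamma * z + beta > 0) for z ~ N(0, 1), i.e. chance the ReLU passes a value."""
    if gamma == 0.0:
        return 1.0 if beta > 0 else 0.0  # output is the constant beta
    z = beta / abs(gamma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def prune_candidates(gammas, betas, threshold: float = 0.05):
    """Indices of channels whose activation probability falls below the threshold."""
    return [
        idx
        for idx, (g, b) in enumerate(zip(gammas, betas))
        if active_probability(g, b) < threshold
    ]


# Made-up batch normalization parameters for three channels.
example_gammas = [1.0, 0.5, 0.1]
example_betas = [0.0, -2.0, 0.3]
print(prune_candidates(example_gammas, example_betas))
```

Here only the second channel is flagged, since its shift sits four standard deviations below zero and its output is almost always suppressed by the ReLU.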
Image classification using a Convolutional Neural Network (CNN) achieves optimal performance with a particular strategy. MobileNet reduces the number of parameters for learning features by switching from the standard convolution paradigm to the depthwise separable convolution (DSC) paradigm. However, there are not enough features to learn for identifying the freshness of fish eyes. Furthermore, minor variances in features should not require a complicated CNN architecture. In this paper, our first contribution is a DSC Bottleneck with Expansion for learning features of the freshness of fish eyes with a Bottleneck Multiplier. The second contribution is a Residual Transition that bridges the current feature maps and the skip-connection feature maps to the next convolution block. The third contribution is MobileNetV1 Bottleneck with Expansion (MB-BE) for classifying the freshness of fish eyes. Results on the Freshness of the Fish Eyes dataset show that MB-BE, with 63.21% accuracy, outperformed other models such as the original MobileNet, VGG16, DenseNet, and NASNet Mobile.
Blind source separation (BSS) is a signal processing method based on independent component analysis. Its aim is to separate the source signals from a set of observations (sensor outputs) by assuming that the source signals are independent. This paper first reviews the general concept of BSS, in particular the theory for convolutive mixtures, the convolutive mixture model, and two deconvolution structures. It then adopts a BSS algorithm for convolutive mixtures based on a residual cross-talk error threshold control criterion. Simulation tests show good performance on simulated mixtures.
Traditional separation methods have limited ability to handle speech separation in highly reverberant, low signal-to-noise ratio (SNR) environments, and thus achieve unsatisfactory results. In this study, a convolutional neural network with temporal convolution and a residual network (TC-ResNet) is proposed to realize speech separation in complex acoustic environments. A simplified steered-response power phase transform, denoted GSRP-PHAT, is employed to reduce the computational cost. The extracted features are reshaped into a special tensor that serves as the system input, and temporal convolution is applied to it, which not only enlarges the receptive field of the convolution layer but also significantly reduces the computational cost of the network. Residual blocks are used to combine multiresolution features and accelerate training. A modified ideal ratio mask is applied as the training target. Simulation results demonstrate that the proposed microphone array speech separation algorithm based on TC-ResNet achieves better performance in terms of distortion ratio, source-to-interference ratio, and short-time objective intelligibility in low-SNR and highly reverberant environments, particularly in untrained situations. This indicates that the proposed method generalizes to untrained conditions.
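The training target mentioned above builds on the ideal ratio mask. The sketch below computes the commonly used form of that mask with NumPy; the study uses a modified version whose details are not given here, so the exponent and the example magnitudes are assumptions for illustration only.

```python
import numpy as np


def ideal_ratio_mask(speech_mag: np.ndarray, noise_mag: np.ndarray,
                     beta: float = 0.5, eps: float = 1e-12) -> np.ndarray:
    """Per time-frequency bin mask: (|S|^2 / (|S|^2 + |N|^2)) ** beta."""
    speech_power = speech_mag ** 2
    noise_power = noise_mag ** 2
    return (speech_power / (speech_power + noise_power + eps)) ** beta


# Made-up magnitude spectrogram bins (2 frequency bins x 2 frames).
speech = np.array([[3.0, 0.0], [1.0, 2.0]])
noise = np.array([[4.0, 1.0], [1.0, 0.0]])
mask = ideal_ratio_mask(speech, noise)
print(mask)
```

Multiplying the mixture magnitude by the mask attenuates bins dominated by noise or interference while leaving speech-dominated bins nearly untouched.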
Funding (BESS transmission line protection study): supported by the Fundamental Research Funds for Central Universities (No. 2024JCCXJD01).
Funding (SEFormer rotating machinery fault diagnosis study): supported by the National Natural Science Foundation of China (No. 52277055).
Funding: This work was supported by the Liaoning Provincial Science Public Welfare Research Fund Project (No. 2016002006) and the Liaoning Provincial Department of Education Scientific Research Service Local Project (No. L201708).
Abstract: Recently, video-based fire detection technology has become an important research topic in the field of machine vision. This paper proposes a fire detection method that combines a classification model and a target detection model from deep learning. First, depthwise separable convolution is used to classify fire images, which saves considerable detection time while preserving detection accuracy. Second, the You Only Look Once version 3 (YOLOv3) target regression function is used to output the fire position for images classified as fire, which avoids the loss of detection accuracy that arises when YOLOv3 is used for both target classification and position regression. At the same time, the time spent on target regression for images without fire is greatly reduced. The experiments were conducted on a publicly available online database. The detection accuracy reached 98% and the detection rate reached 38 fps. This method not only removes the work of manually extracting flame characteristics, reduces the computational cost, and reduces the number of parameters, but also improves the detection accuracy and detection rate.
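The two-stage idea described above (classify first, localize only when the classifier reports fire) can be sketched as follows. The names `detect_fire`, `toy_classifier`, and `toy_locator` are hypothetical stand-ins for illustration; the abstract's actual models are a depthwise separable CNN classifier and YOLOv3, neither of which is reproduced here.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height): an assumed box format


def detect_fire(
    image,
    is_fire: Callable[[object], bool],
    locate_fire: Callable[[object], List[Box]],
) -> Optional[List[Box]]:
    """Run the cheap classifier first; call the detector only on fire images."""
    if not is_fire(image):
        return None  # no fire: the expensive localization step is skipped
    return locate_fire(image)


# Toy stand-ins so the sketch runs without any trained model.
calls = {"locate": 0}


def toy_classifier(image) -> bool:
    # Pretend that any very bright pixel means fire.
    return max(image) > 200


def toy_locator(image) -> List[Box]:
    calls["locate"] += 1
    return [(0, 0, len(image), 1)]


frames = [[10, 20, 30], [50, 255, 40], [0, 0, 0]]
results = [detect_fire(f, toy_classifier, toy_locator) for f in frames]
```

In this toy run the locator is invoked for only one of the three frames, which is the source of the time saving the abstract attributes to skipping target regression on images without fire.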
Funding: Supported by the National Natural Science Foundation of China (61903336, 61976190) and the Natural Science Foundation of Zhejiang Province (LY21F030015).
Abstract: Background: The use of remote photoplethysmography (rPPG) to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. Also, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction problems, resulting in insufficient spatiotemporal feature modelling. Methods: Here, we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). First, to solve the problem of a single-scale ROI, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increased the information interaction between the long-span time and space dimensions; this placed more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals. The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.
Abstract: To prevent possible casualties and economic loss, it is critical to accurately predict the Remaining Useful Life (RUL) in rail prognostics and health management. However, traditional neural networks struggle to capture the long-term dependencies in long time series of rail damage, owing to the coupling among multi-channel data from multiple sensors. In this paper, a novel RUL prediction model with an enhanced pulse separable convolution is used to solve this issue. First, a coding module based on the improved pulse separable convolutional network is established to effectively model the relationships within the data. To enhance the network, an alternate gradient back-propagation method is implemented, and an efficient channel attention (ECA) mechanism is developed to better emphasize the useful pulse characteristics. Second, an optimized Transformer encoder is designed to serve as the backbone of the model. It can efficiently capture the relationships within the data, and between time steps, across a long time series covering the full life cycle. More importantly, the Transformer encoder is improved by integrating pulse maximum pooling to retain more pulse timing characteristics. Finally, based on the features of the preceding layers, the final predicted RUL value is produced, giving an end-to-end solution. The empirical findings validate the efficacy of the proposed approach in forecasting the rail RUL, surpassing various existing data-driven prognostic techniques. The proposed method also shows good generalization performance on the PHM2012 bearing dataset.
Funding: The National Natural Science Foundation of China under Grant No. 52274159, received by E. Hu (https://www.nsfc.gov.cn/); Grant No. 52374165, received by E. Hu (https://www.nsfc.gov.cn/); and the China National Coal Group Key Technology Project, Grant No. 20221CY001, received by Z. Guan and E. Hu (https://www.chinacoal.com/).
Abstract: In the coal mining industry, the gangue separation phase poses a key challenge due to the high visual similarity between coal and gangue. Recently, separation methods have become more intelligent and efficient, using new technologies and applying different features for recognition. One such method exploits the difference in substance density, leading to excellent coal/gangue recognition. Therefore, this study uses density differences to distinguish coal from gangue by performing volume prediction on the samples. Each training sample records 3-side images as input, with volume and weight as the ground truth for classification. The prediction process relies on a convolutional neural network (CGVP-CNN) model that receives a 3-side image as input and then extracts the needed features to estimate the volume. The classification was performed comparatively with ten different classifiers, namely K-Nearest Neighbors (KNN), Linear Support Vector Machines (Linear SVM), Radial Basis Function (RBF) SVM, Gaussian Process, Decision Tree, Random Forest, Multi-Layer Perceptron (MLP), Adaptive Boosting (AdaBoost), Naive Bayes, and Quadratic Discriminant Analysis (QDA). After several experiments on testing and training data, the results yield classification accuracies of 100%, 92%, 95%, 96%, 100%, 100%, 100%, 96%, 81%, and 92%, respectively. The test reveals the best timing with KNN, which maintained an accuracy level of 100%. Assessing the model's generalization capability on new data is essential to ensure its efficiency, so generalization was measured with a cross-validation experiment. The dataset was split based on the volume values to test generalization not only on new images of the same volume but also on volumes outside the trained range. Then, the predicted volume values were passed to the group of classifiers, where the reported classification accuracies were 100%, 100%, 100%, 98%, 88%, 87%, 100%, 87%, 97%, and 100%, respectively. Although obtaining a classification with high accuracy is the main motive, this work also achieves a remarkable reduction in data preprocessing time compared to related works. The CGVP-CNN model reduced the data preprocessing time of previous works to 0.017 s while maintaining high classification accuracy using the estimated volume value.
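The density idea behind this approach can be sketched in a few lines. This is only an illustration under stated assumptions: density is taken as measured weight divided by predicted volume, and the cut-off `DENSITY_THRESHOLD` of 1.8 g/cm^3 is a hypothetical value chosen for the example. The study itself feeds the predicted volume to ten trained classifiers rather than using a fixed threshold.

```python
# Hypothetical cut-off in g/cm^3; not taken from the study.
DENSITY_THRESHOLD = 1.8


def estimate_density(weight_g: float, predicted_volume_cm3: float) -> float:
    """Density from the weighed mass and the CNN-predicted volume."""
    if predicted_volume_cm3 <= 0:
        raise ValueError("predicted volume must be positive")
    return weight_g / predicted_volume_cm3


def classify_sample(weight_g: float, predicted_volume_cm3: float) -> str:
    """Label a sample as 'coal' (lighter) or 'gangue' (denser)."""
    density = estimate_density(weight_g, predicted_volume_cm3)
    return "gangue" if density >= DENSITY_THRESHOLD else "coal"


# Invented sample values for illustration only.
light_sample = classify_sample(weight_g=130.0, predicted_volume_cm3=100.0)
dense_sample = classify_sample(weight_g=250.0, predicted_volume_cm3=100.0)
```

Because the label depends on the ratio of weight to volume, any error in the predicted volume shifts the density estimate directly, which is why the accuracy of the volume regression matters for the downstream classifiers.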
Funding: Supported by the Hunan Provincial Natural Science Foundation of China (2021JJ50074), the Scientific Research Fund of Hunan Provincial Education Department (19B082), the Science and Technology Development Center of the Ministry of Education-New Generation Information Technology Innovation Project (2018A02020), the Science Foundation of Hengyang Normal University (19QD12), the Science and Technology Plan Project of Hunan Province (2016TP1020), the Subject Group Construction Project of Hengyang Normal University (18XKQ02), the Application Oriented Special Disciplines, Double First Class University Project of Hunan Province (Xiangjiaotong [2018] 469), the Hunan Province Special Funds of Central Government for Guiding Local Science and Technology Development (2018CT5001), and the First Class Undergraduate Major in Hunan Province Internet of Things Major (Xiangjiaotong [2020] 248, No. 288).
Abstract: The accurate and automatic segmentation of retinal vessels from fundus images is critical for the early diagnosis and prevention of many eye diseases, such as diabetic retinopathy (DR). Existing retinal vessel segmentation approaches based on convolutional neural networks (CNNs) have achieved remarkable effectiveness. Here, we present a retinal vessel segmentation model with low complexity and high performance based on U-Net, one of the most popular architectures. Given the strong performance of depth-wise separable convolution, we use it to replace the standard convolutional layer. The complexity of the proposed model is reduced by decreasing the number of parameters and calculations required by the model. To ensure performance while lowering redundant parameters, we integrate the pre-trained MobileNet V2 into the encoder. Then, a feature fusion residual module (FFRM) is designed to facilitate complementary strengths by enhancing the effective fusion between adjacent levels, which alleviates extraneous clutter introduced by direct fusion. Finally, we provide detailed comparisons between the proposed SepFE and U-Net on three mainstream retinal image datasets (DRIVE, STARE, and CHASEDB1). The results show that SepFE has only 3% of the parameters and 8% of the FLOPs of U-Net, while obtaining better segmentation performance. The superiority of SepFE is further demonstrated through comparisons with other advanced methods.
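The parameter saving from replacing a standard convolution with a depth-wise separable one can be checked with simple arithmetic. The sketch below counts weights only (biases are ignored), and the layer sizes are arbitrary examples rather than values from the SepFE model.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution: every output channel
    has its own k x k filter over all input channels."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a depth-wise separable convolution:
    one k x k filter per input channel (depth-wise step)
    plus a 1 x 1 convolution that mixes channels (point-wise step)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Example layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
ratio = sep / std  # equals 1/c_out + 1/kernel**2
```

For this example layer the separable version needs 8,768 weights instead of 73,728, roughly 12% of the original. The whole-network figures quoted in the abstract also reflect other design choices, so they need not match this per-layer ratio.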
Funding: Partially supported by the Henan Provincial Key Research and Promotion Projects (Grant No. 242102211012) and the Ministry of Education in China Project of Humanities and Social Sciences (Grant No. 24YJCZH261).
Abstract: Introduction: Accurate prediction of protocadherin 8 (PCDH8) gene expression status from whole-slide images (WSIs) is critical for thyroid cancer diagnosis and prognosis, as PCDH8 overexpression is associated with tumor aggressiveness and poor outcomes. Existing methods for PCDH8 detection are often costly, time-consuming, or require specialized expertise. To address these limitations, we developed a novel depth-wise separable residual neural network (DSRNet) for noninvasive PCDH8 status prediction directly from WSIs. Materials and methods: We collected 403 thyroid cancer WSIs from The Cancer Genome Atlas (TCGA), with PCDH8 expression status classified as high or low based on median expression values. Each WSI was divided into 512×512 pixel tiles, with the top 100 non-white tiles selected per slide. DSRNet integrates depth-wise separable convolutions, residual connections, and a deformable convolutional pyramid pooling module to efficiently capture multiscale and long-range features in gigapixel WSIs. The model was trained using tenfold cross-validation. Results: DSRNet achieved state-of-the-art performance with 92.76% accuracy, 91.92% precision, 92.69% recall, and 0.93 area under the curve on the thyroid cancer dataset (TCGA-THCA), significantly outperforming leading convolutional neural networks and Transformer models. Ablation studies confirmed the contributions of each component, and attention visualization showed that DSRNet focuses on biologically relevant regions. The model also generalized well to a breast cancer dataset (TCGA-BRCA), achieving 89.13% accuracy. Conclusions: We developed DSRNet, a deep learning-based model for predicting PCDH8 status directly from routine hematoxylin and eosin-stained pathological images. DSRNet combines the efficiency of convolutional operations with enhanced long-range dependency modeling, providing a noninvasive, accurate, and interpretable tool for auxiliary thyroid cancer diagnosis and prognosis. The results demonstrate its strong potential for clinical translation, though further multicenter validation is warranted.
Funding: Supported by a grant from the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20060280003) and the Shanghai Leading Academic Discipline Project (Project No. T0102).
Abstract: This letter deals with frequency-domain Blind Source Separation of Convolutive Mixtures (CMBSS). From the frequency representation of the "overlap and save" method, a Weighted General Discrete Fourier Transform (WGDFT) is derived to replace the traditional Discrete Fourier Transform (DFT). The mixing matrix on each frequency bin can be estimated more precisely from WGDFT coefficients than from DFT coefficients, which improves separation performance. Simulation results verify the validity of WGDFT for frequency-domain blind source separation of convolutive mixtures.
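For readers unfamiliar with the "overlap and save" method mentioned above, the following is a minimal sketch of plain overlap-save block filtering using an ordinary DFT. It does not implement the WGDFT from the letter; the function names, block length, and test signal are assumptions chosen for illustration, and the naive O(N^2) DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def overlap_save(signal, taps, block_len=8):
    """Linear convolution computed block by block in the frequency domain."""
    m = len(taps)
    step = block_len - (m - 1)  # new input samples consumed per block
    assert step > 0, "block_len must exceed len(taps) - 1"
    h_freq = dft(list(taps) + [0.0] * (block_len - m))
    out_len = len(signal) + m - 1
    n_blocks = -(-out_len // step)  # ceiling division
    # m - 1 leading zeros act as filter history; trailing zeros fill the last block.
    padded = [0.0] * (m - 1) + list(signal)
    padded += [0.0] * (n_blocks * step + m - 1 - len(padded))
    output = []
    for b in range(n_blocks):
        block = padded[b * step: b * step + block_len]
        spectrum = [xf * hf for xf, hf in zip(dft(block), h_freq)]
        y = dft(spectrum, inverse=True)
        # The first m - 1 samples are corrupted by circular wrap-around: discard them.
        output.extend(v.real for v in y[m - 1:])
    return output[:out_len]


def direct_convolve(signal, taps):
    """Reference time-domain linear convolution."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
h = [0.5, -0.25, 0.125]
fast = overlap_save(x, h)
slow = direct_convolve(x, h)
```

The two results agree up to floating-point error. The step that discards the first `m - 1` samples of each block is where circular and linear convolution differ, and it is this block-wise frequency representation that the letter takes as its starting point.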
Abstract: Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise separable convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity against model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining network performance compared to the MobileNetV1 baseline.
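Two of the mechanisms described above, the resolution loss from stride-2 layers and the wider field of view from dilation, follow from standard convolution arithmetic. The sketch below uses the usual formulas; the input size, kernel size, and padding are example values, not settings taken from the paper.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_output_size(size: int, kernel: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a convolution along one dimension."""
    span = effective_kernel(kernel, dilation)
    return (size + 2 * padding - span) // stride + 1


# A 3 x 3 kernel covers a wider area as dilation grows, with the same 9 weights.
spans = [effective_kernel(3, d) for d in (1, 2, 3)]

# Stride 2 halves a 32-pixel feature map; stride 1 with dilation 2 keeps it.
halved = conv_output_size(32, kernel=3, stride=2, padding=1)
kept = conv_output_size(32, kernel=3, stride=1, padding=2, dilation=2)
```

A dilation of 2 lets a 3 x 3 filter span a 5 x 5 area without adding weights and without reducing resolution, which is the trade-off the DDSC layer relies on.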
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61001106 and the National Key Basic Research Program of China (973 Program) under Grant No. 2009CB320400.
Abstract: In this paper, a Maximum Likelihood (ML) approach, implemented by the Expectation-Maximization (EM) algorithm, is proposed for the blind separation of convolutively mixed discrete sources. To carry out the expectation step of the EM algorithm with a lower computational load, an algorithm named Iterative Maximum Likelihood (IML) is proposed to calculate the likelihood and recover the source signals. An important feature of the ML approach is its robust performance in noisy environments, achieved by treating the covariance matrix of the additive Gaussian noise as a parameter. Another striking feature is that it can separate more sources than sensors by exploiting the finite alphabet property of the sources. Simulation results show that the proposed ML approach works well in both determined and underdetermined mixtures. Furthermore, the performance of the proposed ML algorithm is close to that obtained with perfect knowledge of the channel filters.
Funding: Supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20060280003) and the Shanghai Leading Academic Discipline Project (T0102).
Abstract: Most existing algorithms for blind source separation are limited by the assumption that the sources are statistically independent. However, in many practical applications, the source signals are nonnegative and mutually statistically dependent. When the observations are nonnegative linear combinations of nonnegative sources, the correlation coefficients of the observations are larger than those of the source signals. In this letter, a novel Nonnegative Matrix Factorization (NMF) algorithm with least-correlated-component constraints for the blind separation of convolutively mixed sources is proposed. The algorithm relaxes the source independence assumption and requires only low-complexity algebraic computations. Simulation results on blind source separation, including real face image data, indicate that the sources can be successfully recovered with the algorithm.
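The observation that nonnegative mixing raises correlation can be illustrated numerically. The sources and mixing weights below are invented for the example; this is a single illustration of the stated property, not a proof, and it does not implement the NMF algorithm from the letter.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Two hypothetical nonnegative sources that are dependent (here: anti-correlated).
s1 = [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
s2 = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0]

# Nonnegative mixing weights (assumed values).
x1 = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + 0.6 * b for a, b in zip(s1, s2)]

source_corr = pearson(s1, s2)
mixture_corr = pearson(x1, x2)
```

Here the sources have a correlation of -0.75 while the mixtures are positively correlated. This motivates the constraint named in the abstract: among nonnegative factorizations that explain the observations, the least correlated components are preferred as source estimates.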
Funding: This work is supported by the National Natural Science Foundation of China (61806013, 61876010, 61906005, 62166002), the General Project of the Science and Technology Plan of Beijing Municipal Education Commission (KM202110005028), the Project of the Interdisciplinary Research Institute of Beijing University of Technology (2021020101), and the International Research Cooperation Seed Fund of Beijing University of Technology (2021A01).
Abstract: Deep kernel mapping support vector machines have achieved good results in numerous tasks by mapping features from a low-dimensional space to a high-dimensional space and then using support vector machines for classification. However, the deep kernel mapping support vector machine does not take into account the connections between different dimensional spaces, and it increases the number of model parameters. To further improve the recognition capability of deep kernel mapping support vector machines while reducing the number of model parameters, this paper proposes a framework of Lightweight Deep Convolutional Cross-Connected Kernel Mapping Support Vector Machines (LC-CKMSVM). The framework consists of a feature extraction module and a classification module. The feature extraction module first maps the data from a low-dimensional to a high-dimensional space by fusing the representations of different dimensional spaces through cross-connections; it then uses depthwise separable convolution to replace part of the original convolution to reduce the number of parameters in the module. The classification module uses a soft-margin support vector machine for classification. The results on six different visual datasets show that LC-CKMSVM obtains better classification accuracy than the other five models in most cases.
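The soft-margin classifier in the classification module optimizes the standard hinge-loss objective, which can be written out directly. The sketch below evaluates that objective for a linear decision function on toy two-dimensional features; in the framework described above the inputs would instead be the features produced by the extraction module, and the weights, penalty `c`, and data here are assumptions for illustration.

```python
def soft_margin_objective(w, b, samples, labels, c=1.0):
    """0.5 * ||w||^2 + C * sum(max(0, 1 - y * (w . x + b)))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Samples with margin >= 1 cost nothing; others pay linearly.
        hinge += max(0.0, 1.0 - margin)
    return regularizer + c * hinge


# Toy data: labels are +1 or -1; the third sample lies inside the margin.
samples = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
labels = [1, -1, 1]

objective = soft_margin_objective([1.0, 0.0], 0.0, samples, labels, c=1.0)
```

The penalty `c` sets how strongly margin violations are punished relative to keeping the weights small, which is what distinguishes a soft margin from a hard one.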
Funding: The National Natural Science Foundation of China under Grant Nos. 62036010 and 62072340, the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. LZ21F020001 and LSZ19F020001, and the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University under Grant No. A2220.
Abstract: Channel pruning can reduce memory consumption and running time with minimal performance loss, and is one of the most important techniques in network compression. However, existing channel pruning methods mainly focus on pruning standard convolutional networks, and they rely heavily on time-consuming fine-tuning to recover performance. To this end, we present a novel, efficient probability-based channel pruning method for depthwise separable convolutional networks. Our method leverages a new, simple yet effective probability-based channel pruning criterion that takes the scaling and shifting factors of batch normalization layers into consideration. A novel shifting factor fusion technique is further developed to improve the performance of the pruned networks without requiring extra time-consuming fine-tuning. We apply the proposed method to five representative deep learning networks, namely MobileNetV1, MobileNetV2, ShuffleNetV1, ShuffleNetV2, and GhostNet, to demonstrate the efficiency of our pruning method. Extensive experimental results and comparisons on the publicly available CIFAR10, CIFAR100, and ImageNet datasets validate the feasibility of the proposed method.
Abstract: Image classification using a convolutional neural network (CNN) achieves optimal performance with a particular strategy. MobileNet reduces the number of parameters for learning features by switching from the standard convolution paradigm to the depthwise separable convolution (DSC) paradigm. However, there are not enough features to learn for identifying the freshness of fish eyes. Furthermore, minor variances in features should not require a complicated CNN architecture. In this paper, our first contribution is a DSC Bottleneck with Expansion, which learns features of the freshness of fish eyes using a Bottleneck Multiplier. The second contribution is a Residual Transition that bridges current feature maps and skip-connection feature maps to the next convolution block. The third contribution is MobileNetV1 Bottleneck with Expansion (MB-BE) for classifying the freshness of fish eyes. Results on the Freshness of the Fish Eyes dataset show that MB-BE, with 63.21% accuracy, outperformed other models such as the original MobileNet, VGG16, DenseNet, and NASNet Mobile.
Abstract: Blind source separation (BSS) is a signal processing method based on independent component analysis; its aim is to separate the source signals from a set of observations (sensor outputs) by assuming that the source signals are independent. This paper first reviews the general concept of BSS, in particular the theory for convolutive mixtures, the convolutive mixture model, and two deconvolution structures. It then adopts a BSS algorithm for convolutive mixtures based on a residual cross-talking error threshold control criterion. Simulation tests show good performance on simulated mixtures.
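For reference, the convolutive mixture model reviewed in this paper is commonly written as below. The notation is an assumption introduced here (N sources s_j, M sensors x_i, mixing filters a_ij of length K, frequency bin f, frame index tau) and may differ from the paper's own symbols.

```latex
% Time-domain convolutive mixture: each sensor observes filtered sources.
x_i(t) = \sum_{j=1}^{N} \sum_{k=0}^{K-1} a_{ij}(k)\, s_j(t-k), \qquad i = 1, \dots, M

% Short-time frequency-domain approximation: one instantaneous mixture per bin.
\mathbf{X}(f, \tau) \approx \mathbf{A}(f)\, \mathbf{S}(f, \tau)

% Separation applies an unmixing matrix in each frequency bin.
\mathbf{Y}(f, \tau) = \mathbf{W}(f)\, \mathbf{X}(f, \tau)
```

The frequency-domain approximation holds when the analysis window is long relative to the mixing filters, which is why convolutive separation is often carried out bin by bin.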
Funding: This work is supported by the National Key Research and Development Program of China under Grant 2020YFC2004003 and Grant 2020YFC2004002, and the National Nature Science Foundation of China (NSFC) under Grant No. 61571106.
Abstract: Traditional separation methods have limited ability to handle the speech separation problem in highly reverberant and low signal-to-noise ratio (SNR) environments, and thus achieve unsatisfactory results. In this study, a convolutional neural network with temporal convolution and a residual network (TC-ResNet) is proposed to realize speech separation in a complex acoustic environment. A simplified steered-response power phase transform, denoted as GSRP-PHAT, is employed to reduce the computational cost. The extracted features are reshaped into a special tensor as the system input, and temporal convolution is applied, which not only enlarges the receptive field of the convolution layer but also significantly reduces the network's computational cost. Residual blocks are used to combine multiresolution features and accelerate the training procedure. A modified ideal ratio mask is applied as the training target. Simulation results demonstrate that the proposed microphone array speech separation algorithm based on TC-ResNet achieves better performance in terms of distortion ratio, source-to-interference ratio, and short-time objective intelligibility in low-SNR and highly reverberant environments, particularly in untrained situations. This indicates that the proposed method generalizes to untrained conditions.
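The training target mentioned above builds on the ideal ratio mask (IRM). The sketch below shows the standard IRM with the common exponent of 0.5, computed per time-frequency bin from speech and noise power; the abstract's modified mask is not specified and is not reproduced here, and the power values are invented for illustration.

```python
import math


def ideal_ratio_mask(speech_power, noise_power, exponent=0.5):
    """Standard IRM per time-frequency bin: (S / (S + N)) ** exponent."""
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # An empty bin carries no speech, so the mask is set to zero.
            row.append((s / total) ** exponent if total > 0 else 0.0)
        mask.append(row)
    return mask


# Hypothetical 2 x 2 power spectrograms (rows: frames, columns: frequency bins).
speech = [[4.0, 0.0], [1.0, 9.0]]
noise = [[0.0, 4.0], [1.0, 3.0]]

mask = ideal_ratio_mask(speech, noise)
```

Mask values lie between 0 and 1: bins dominated by speech are kept, and bins dominated by noise are attenuated. Multiplying the mask with the mixture spectrogram gives the separated speech estimate.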