Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive because the spatial and temporal complexity of their softmax-based attention mechanisms grows quadratically with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse Attention (GSA) module incorporating an efficient context restoration block to recover lost contextual cues. Concurrently, a Grouped-scale Convolution (GSC) module replaces the standard Conformer convolution module, providing adaptive local modeling across varying temporal resolutions. Furthermore, we integrate a Refined Intermediate Contextual CTC (RIC-CTC) supervision strategy, which applies progressively increasing loss weights combined with convolution-based context aggregation, thereby further relaxing the conditional-independence constraint inherent in standard CTC frameworks. Evaluations on the LRS2 and LRS3 benchmarks validate the efficacy of our approach, with word error rates (WERs) reduced to 1.8% and 1.5%, respectively, demonstrating state-of-the-art performance in AVSR tasks.
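The abstract does not spell out the exact form of the ReLU-based attention. A common way such mechanisms reach linear complexity is to replace the softmax with a ReLU feature map, so the attention product can be reassociated and the T-by-T score matrix never materializes. A minimal NumPy sketch of that generic idea (not the paper's GSA module; all names below are illustrative):

```python
import numpy as np

def relu_linear_attention(Q, K, V, eps=1e-6):
    """Linear-complexity attention: the softmax is replaced by a ReLU
    feature map, so (phi(Q) @ phi(K).T) @ V can be reassociated as
    phi(Q) @ (phi(K).T @ V), avoiding the T-by-T score matrix."""
    phi_q = np.maximum(Q, 0.0)            # ReLU feature map of queries
    phi_k = np.maximum(K, 0.0)            # ReLU feature map of keys
    kv = phi_k.T @ V                      # (d, d_v): cost linear in T
    norm = phi_q @ phi_k.sum(axis=0)      # row-wise normaliser
    return (phi_q @ kv) / (norm[:, None] + eps)

rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = rng.standard_normal((3, T, d))  # toy query/key/value matrices
out = relu_linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Because `kv` and `norm` are computed once, the per-query cost no longer depends on the full sequence length squared, which is the efficiency argument sparse/linear attention variants make.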
Early fault detection for spiral bevel gears is crucial to ensure normal operation and prevent accidents. The harmonic components excited by the time-varying mesh stiffness always appear in the measured vibration signal. Extracting the periodic impulses that indicate a localized gear fault, buried in intensive noise and interfered with by harmonics, is a challenging task. In this paper, a novel Periodical Sparse-Assisted Decoupling (PSAD) method is formulated as an optimization problem to extract fault features from noisy vibration signals. The PSAD method decouples the impulsive fault feature and the harmonic components based on sparse representation. The sparsity-within-and-across-groups property and the periodicity of the fault feature are incorporated into the regularizer as prior information. A nonconvex penalty is employed to highlight the sparsity of the fault features. Meanwhile, a weight factor based on the l2 norm of each group is constructed to strengthen the amplitude of the fault feature. An iterative algorithm based on Majorization-Minimization (MM) is derived to solve the optimization problem. Simulation studies and experimental analysis confirm the performance of the proposed PSAD method in extracting and enhancing defect impulses from noisy signals, and the method surpasses other comparative methods in extracting and enhancing fault features.
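The paper's weight factor uses the l2 norm of each group; the standard group soft-thresholding operator shows the underlying mechanics. This is a generic building block of MM-type iterations for group-sparse regularizers, not the paper's nonconvex, periodicity-aware regularizer:

```python
import numpy as np

def group_soft_threshold(x, group_size, lam):
    """Group-wise shrinkage: each group is scaled by
    max(0, 1 - lam / ||group||_2), so strong groups survive
    (slightly shrunk) and weak groups are zeroed."""
    n = len(x) - len(x) % group_size
    g = np.asarray(x, dtype=float)[:n].reshape(-1, group_size)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    out = np.asarray(x, dtype=float).copy()
    out[:n] = (g * scale).ravel()
    return out

x = np.array([3.0, 4.0, 0.1, -0.1])   # group 1 strong, group 2 weak
y = group_soft_threshold(x, 2, 1.0)
print(y)  # strong group shrunk by factor 0.8, weak group zeroed
```

In an MM scheme, an operator like this is applied repeatedly to a surrogate of the data-fit term; the nonconvex penalty in the paper changes the shrinkage rule but not this overall pattern.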
Convex feasibility problems are widely used in image reconstruction, sparse signal recovery, and other areas. This paper considers a class of convex feasibility problems arising from sparse signal recovery. We first derive the projection formulas for a vector onto the feasible sets. The centralized circumcentered-reflection method is then designed to solve the convex feasibility problem. Numerical experiments demonstrate the feasibility and effectiveness of the proposed algorithm, showing superior performance compared to conventional alternating projection methods.
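For context, the conventional alternating projection baseline mentioned above fits in a few lines: cycle through the projection onto each convex set until the iterate lies in the intersection. A toy illustration with a unit ball and a halfspace (not the paper's circumcentered-reflection scheme):

```python
import numpy as np

def alternating_projections(x0, projections, n_iter=100):
    """Classic POCS: cyclically apply the projection onto each set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        for proj in projections:
            x = proj(x)
    return x

# Toy feasibility problem: unit ball intersected with {x : x[0] >= 0.5}
proj_ball = lambda x: x / max(1.0, np.linalg.norm(x))
def proj_halfspace(x):
    y = x.copy()
    y[0] = max(y[0], 0.5)
    return y

x = alternating_projections([-2.0, 3.0], [proj_ball, proj_halfspace])
print(x)  # a point lying in both sets
```

Circumcenter-based methods accelerate exactly this kind of scheme by replacing the cyclic projections with a circumcenter of reflected points, which is where the paper's speedup over this baseline comes from.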
The internal flow fields within a three-dimensional inward-turning combined inlet are extremely complex, especially during the engine mode transition, where the tunnel changes may impact the flow fields significantly. To develop an efficient flow field reconstruction model for this problem, we present an Improved Conditional Denoising Diffusion Generative Adversarial Network (ICDDGAN), which integrates Conditional Denoising Diffusion Probabilistic Models (CDDPMs) with StyleGAN and introduces a reconstruction discrimination mechanism and a dynamic loss weight learning strategy. We establish a Mach number flow field dataset by numerical simulation at various backpressures for the mode transition process from turbine mode to ejector ramjet mode at Mach 2.5. Given only sparse parameter information, the proposed ICDDGAN model can rapidly generate high-quality Mach number flow fields without requiring a large number of training samples. The results show that ICDDGAN is superior to CDDGAN in terms of training convergence and stability. Moreover, interpolation and extrapolation tests across backpressure conditions show that ICDDGAN can accurately and quickly reconstruct Mach number fields at various tunnel slice shapes, with a Structural Similarity Index Measure (SSIM) above 0.96 and a Mean-Square Error (MSE) of 0.035% relative to the actual flow fields, reducing time costs by 7-8 orders of magnitude compared to Computational Fluid Dynamics (CFD) calculations. This provides an efficient means for rapid computation of complex flow fields.
Traditional data-driven fault diagnosis methods depend on expert experience to manually extract effective fault features from signals, which has certain limitations. Conversely, deep learning techniques have become a central focus of fault diagnosis research owing to their strong fault feature extraction ability and end-to-end diagnosis efficiency. Recently, research exploiting the respective advantages of convolutional neural networks (CNNs) and Transformers in local and global feature extraction has demonstrated the promise of combining the two for fault diagnosis. However, the cross-channel convolution mechanism in CNNs and the self-attention calculations in Transformers make such cooperative models excessively complex, resulting in high computational costs and limited industrial applicability. To tackle these challenges, this paper proposes a lightweight CNN-Transformer, named SEFormer, for rotating machinery fault diagnosis. First, a separable multiscale depthwise convolution block is designed to extract and integrate multiscale feature information from different channel dimensions of vibration signals. Then, an efficient self-attention block is developed to capture critical fine-grained features of the signal from a global perspective. Finally, experimental results on a planetary gearbox dataset and a motor roller bearing dataset show that the proposed framework balances robustness, generalization, and lightweight design better than recent state-of-the-art CNN- and Transformer-based fault diagnosis models. This study presents a feasible strategy for developing a lightweight rotating machinery fault diagnosis framework aimed at economical deployment.
Located in northern China, the Hetao Plain is an important agro-economic zone and population centre. The deterioration of local groundwater quality has had a serious impact on human health and economic development, and groundwater vulnerability assessment (GVA) has become an essential task for identifying the current status and development trend of groundwater quality. In this study, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models are integrated, with a self-attention (SA) mechanism introduced, to realize spatio-temporal prediction of regional groundwater vulnerability. The study first builds the CNN-LSTM model with self-attention and evaluates its prediction accuracy for groundwater vulnerability against other common machine learning models such as Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The results indicate that the CNN-LSTM model outperforms these models, demonstrating its value for groundwater vulnerability assessment. The predictions indicate an increased risk of groundwater vulnerability in the study area over the coming years, attributable to the synergistic impact of global climate anomalies and intensified local human activities. Moreover, the overall groundwater vulnerability risk in the entire region has increased, evident from both the notably high value and standard deviation. This suggests that the spatial variability of groundwater vulnerability in the area is expected to expand in the future under the sustained progression of climate change and human activities. The model can be optimized for diverse applications across regional environmental assessment, pollution prediction, and risk statistics. This study holds particular significance for ecological protection and groundwater resource management.
In recent years, deep learning has been widely applied in synthetic aperture radar (SAR) image processing. However, collecting large-scale labeled SAR images is challenging and costly, and classification accuracy is often poor when only limited SAR images are available. To address this issue, we propose a novel framework for sparse SAR target classification under few-shot conditions, termed the transfer learning-based interpretable lightweight convolutional neural network (TL-IL-CNN). Additionally, we employ enhanced gradient-weighted class activation mapping (Grad-CAM) to mitigate the "black box" effect often associated with deep learning models and to explore the mechanisms by which a CNN classifies various sparse SAR targets. Initially, we apply a novel bidirectional iterative soft thresholding (BiIST) algorithm to generate sparse images of superior quality compared to those produced by traditional matched filtering (MF) techniques. Subsequently, we pretrain multiple shallow CNNs on a simulated SAR image dataset. Using the sparse SAR dataset as input to the CNNs, we assess the efficacy of transfer learning for sparse SAR target classification and propose the integrated TL-IL-CNN to enhance classification accuracy further. Finally, Grad-CAM is utilized to provide visual explanations for the predictions made by the classification framework. Experimental results on the MSTAR dataset reveal that the proposed TL-IL-CNN achieves nearly 90% classification accuracy with only 20% of the training data required under standard operating conditions (SOC), surpassing typical deep learning methods such as the vision Transformer (ViT) in the small-sample setting, and it performs even better under extended operating conditions (EOC). Furthermore, the application of Grad-CAM elucidates how the CNN differentiates among various sparse SAR targets: the experiments indicate that whether the model focuses on the target or on the background can differ among target classes. The study contributes to an enhanced understanding of the interpretability of such results and enables us to infer the classification outcomes for each category more accurately.
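Grad-CAM itself is simple to state: the channel weights are the spatially averaged gradients of the class score with respect to a convolutional layer's activations, and the heatmap is the ReLU of the weighted channel sum. A framework-free NumPy sketch of that computation (the activations and gradients here are synthetic placeholders, not outputs of the paper's networks):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from conv activations (C, H, W) and the
    gradients of the class score w.r.t. them: weights are the
    spatially averaged gradients; the map is the ReLU of the
    weighted channel sum, normalised to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))              # (C,) channel weights
    cam = np.tensordot(weights, feature_maps, axes=1)  # (H, W) weighted sum
    cam = np.maximum(cam, 0.0)                         # ReLU
    if cam.max() > 0:
        cam /= cam.max()
    return cam

C, H, W = 4, 5, 5
fmaps = np.zeros((C, H, W)); fmaps[0, 2, 2] = 1.0  # one active unit in channel 0
grads = np.zeros((C, H, W)); grads[0] = 1.0        # score depends only on channel 0
cam = grad_cam(fmaps, grads)
print(cam[2, 2])  # 1.0: the heatmap peaks at the active unit
```

In practice the activations and gradients come from a forward/backward pass through the trained CNN; this sketch only shows the weighting step that produces the visual explanations the abstract refers to.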
To realize effective co-phasing adjustment in large-aperture sparse-aperture telescopes, a multichannel stripe tracking approach is employed, allowing simultaneous interferometric measurements of multiple optical paths and circumventing the need for pairwise measurements along the mirror boundaries required by traditional interferometric methods. This approach enhances detection efficiency and reduces system complexity. Here, the principles of the multibeam interference process and the construction of a co-phasing detection module based on direct optical fiber connections were analyzed using wavefront optics theory. Error analysis was conducted on the system surface obtained through multipath interference, and potential applications of the interferometric method were explored. Finally, the principle was verified by experiment: an interferometric fringe contrast better than 0.4 was achieved through flat-field calibration and incoherent digital synthesis. The dynamic range of the measurement exceeds 10 times the center wavelength of the working band (1550 nm), and a resolution better than one-tenth of the working center wavelength was achieved. Simultaneous three-beam interference can be achieved, leading to a 50% improvement in detection efficiency. This method can effectively enhance the efficiency of sparse-aperture telescope co-phasing, meeting the observation requirements of 8-10 m telescopes. This study provides a technological foundation for observing distant and faint celestial objects.
This paper explores the recovery of block sparse signals in frame-based settings using the l_(2)/l_(q)-synthesis technique (0<q≤1). We propose a new null space property, referred to as block D-NSP_(q), which is based on the dictionary D. We establish that the block D-NSP_(q) condition on the measurement matrix is both necessary and sufficient for the exact recovery of block sparse signals via l_(2)/l_(q)-synthesis. Additionally, this condition is essential for the stable recovery of signals that are block-compressible with respect to D. The block D-NSP_(q) property is identified as the first complete condition for successful signal recovery using l_(2)/l_(q)-synthesis. Furthermore, we assess the theoretical efficacy of the l_(2)/l_(q)-synthesis method under measurement noise.
Let Ω be homogeneous of degree zero, integrable on S^(d−1), and have vanishing moment of order one, and let a be a function on R^(d) such that ∇a∈L^(∞)(R^(d)). Let T*_(Ω,a) be the maximal operator associated with the d-dimensional Calderón commutator, defined by T*_(Ω,a)f(x) := sup_(ε>0) |∫_(|x−y|>ε) Ω(x−y)/|x−y|^(d+1) (a(x)−a(y)) f(y) dy|. In this paper, the authors establish bilinear sparse domination for T*_(Ω,a) under the assumption Ω∈L^(∞)(S^(d−1)). As applications, some quantitative weighted bounds for T*_(Ω,a) are obtained.
In this paper, we focus on the recovery of piecewise sparse signals containing both fast-decaying and slow-decaying nonzero entries. To improve the performance of the classic Orthogonal Matching Pursuit (OMP) and Generalized Orthogonal Matching Pursuit (GOMP) algorithms on this problem, we propose the Piecewise Generalized Orthogonal Matching Pursuit (PGOMP) algorithm, which treats mixed-decaying sparse signals as piecewise sparse signals with two components whose nonzero entries have different decay factors. The algorithm incorporates piecewise selection and deletion to retain the most significant entries according to the sparsity of each component. We provide a theoretical analysis based on the mutual coherence of the measurement matrix and the decay factors of the nonzero entries, establishing a sufficient condition for the PGOMP algorithm to select at least two correct indices in each iteration. Numerical simulations and an image decomposition experiment demonstrate that the proposed algorithm significantly improves the support recovery probability by effectively matching piecewise sparsity with decay factors.
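As a baseline for the greedy pursuit family discussed above, classic OMP fits in a few lines: pick the column most correlated with the residual, then re-fit the coefficients on the selected support by least squares. A minimal sketch, using a toy sensing matrix with orthonormal columns so the greedy picks (and hence recovery) are exact:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily select the column most
    correlated with the residual, then re-fit by least squares."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((50, 30)))  # 50x30, orthonormal columns
x_true = np.zeros(30)
x_true[[5, 17]] = [2.0, -1.5]                       # 2-sparse ground truth
x_hat = omp(A, A @ x_true, k=2)
print(np.flatnonzero(x_hat))  # recovers indices 5 and 17
```

PGOMP departs from this template in the selection/deletion step: it partitions the candidate set by component and keeps the most significant entries per component, rather than taking a single global argmax.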
Lightweight convolutional neural networks (CNNs) have simple structures but struggle to comprehensively and accurately extract important semantic information from images. While attention mechanisms can enhance CNNs by learning distinctive representations, most existing spatial and hybrid attention methods focus on local regions with extensive parameters, making them unsuitable for lightweight CNNs. In this paper, we propose a self-attention mechanism tailored for lightweight networks, namely the brief self-attention module (BSAM). BSAM consists of the brief spatial attention (BSA) and advanced channel attention blocks. Unlike conventional self-attention methods with many parameters, our BSA block improves the performance of lightweight networks by effectively learning global semantic representations. Moreover, BSAM can be seamlessly integrated into lightweight CNNs for end-to-end training, maintaining the network's lightweight and mobile characteristics. We validate the effectiveness of the proposed method on image classification tasks using the Food-101, Caltech-256, and Mini-ImageNet datasets.
Piezo actuators are widely used in ultra-precision fields because of their high response and nano-scale step length. However, their hysteresis characteristics seriously affect the accuracy and stability of piezo actuators. Existing methods for fitting hysteresis loops fall into operator-based, differential-equation-based, and machine learning classes. The modeling cost and model complexity of the operator and differential equation classes are high, while machine learning processes such as neural network computation are opaque, so a physical model framework cannot be directly extracted. Therefore, the sparse identification of nonlinear dynamics (SINDy) algorithm is proposed for fitting hysteresis loops, and it is further improved: while SINDy builds an orthogonal candidate library for modeling, the sparse regression model is simplified, and the Relay operator is introduced for piecewise fitting to resolve the distortion that SINDy exhibits when fitting singularities. The resulting Relay-SINDy algorithm is applied to fitting hysteresis loops, and good performance is obtained in open- and closed-loop experiments. Compared with existing methods, the modeling cost and model complexity are reduced, and the modeling accuracy of the hysteresis loop is improved.
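The core sparse regression step inside SINDy, sequentially thresholded least squares (STLSQ), is compact enough to sketch directly. This generic version (not the paper's Relay-augmented variant) recovers a simple linear ODE from a small candidate library:

```python
import numpy as np

def stlsq(Theta, dX, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares, the sparse regression
    step of SINDy: coefficients below `threshold` are zeroed and the
    remaining ones re-fitted on the surviving library terms."""
    Xi, *_ = np.linalg.lstsq(Theta, dX, rcond=None)
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k], *_ = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)
    return Xi

# Toy: recover dx/dt = -2x from samples, with library [1, x, x^2]
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dx = -2.0 * x
Theta = np.column_stack([np.ones_like(x), x, x**2])
Xi = stlsq(Theta, dx[:, None])
print(Xi.ravel())  # approximately [0, -2, 0]
```

The Relay modification in the paper replaces the single global fit with piecewise fits keyed to the hysteresis branch, but the thresholded regression above is the shared backbone.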
In this paper, a sparse graph neural network-aided (SGNN-aided) decoder is proposed for improving the decoding performance of polar codes under bursty interference. Firstly, a sparse factor graph is constructed using the encoding characteristics to achieve high-throughput polar decoding. To further improve the decoding performance, a residual gated bipartite graph neural network is designed for updating the embedding vectors of heterogeneous nodes based on a bidirectional message passing neural network. This framework exploits gated recurrent units and residual blocks to address gradient disappearance in deep graph recurrent neural networks. Finally, predictions are generated by feeding the embedding vectors into a readout module. Simulation results show that the proposed decoder is more robust than existing ones in the presence of bursty interference and exhibits high universality.
A healthy brain is vital to every person since the brain controls every movement and emotion. Sometimes, some brain cells grow unexpectedly, becoming uncontrollable and cancerous; these cancerous cells are called brain tumors. For diagnosed patients, their lives depend mainly on the early diagnosis of these tumors to provide suitable treatment plans. Nowadays, physicians and radiologists rely on Magnetic Resonance Imaging (MRI) pictures for their clinical evaluations of brain tumors. These evaluations are time-consuming, expensive, and require high-level expertise to provide an accurate diagnosis. Scholars and industry have recently partnered to implement automatic solutions to diagnose the disease with high accuracy. Due to their accuracy, some of these solutions depend on deep learning (DL) methodologies, which have become important through their roles in the diagnosis process of identification and classification. Therefore, there is a need for a solid and robust deep learning-based approach to diagnosing brain tumors. The purpose of this study is to develop an intelligent automatic framework for brain tumor diagnosis. The proposed solution is based on a novel dense dynamic residual self-attention transfer adaptive learning fusion approach (NDDRSATALFA), carried out over two implemented deep learning networks, VGG19 and UNET, to identify and classify brain tumors. In addition, this solution applies transfer learning to exchange extracted features and data between the two neural networks. The presented framework is trained, validated, and tested on six public MRI datasets to detect brain tumors and categorize them into three classes: glioma, meningioma, and pituitary. The proposed framework yielded remarkable findings on the evaluated performance indicators: 99.32% accuracy, 98.74% sensitivity, 98.89% specificity, 99.01% Dice, 98.93% Area Under the Curve (AUC), and 99.81% F1-score. A comparative analysis with recent state-of-the-art methods shows that NDDRSATALFA offers an admirable level of reliability in simplifying the timely identification of diverse brain tumors. Moreover, this framework can be applied by healthcare providers to assist radiologists, pathologists, and physicians in their evaluations. The attained outcomes open doors for advanced automatic solutions that improve clinical evaluations and support reasonable treatment plans.
Medical image analysis based on deep learning has become an important technical requirement in the field of smart healthcare. In view of the difficulty of jointly modeling local details and global features in multimodal ophthalmic image analysis, as well as the information redundancy in cross-modal data fusion, this paper proposes a multimodal fusion framework based on cross-modal collaboration and a weighted attention mechanism. For feature extraction, the framework collaboratively extracts local fine-grained features and global structural dependencies through a parallel dual-branch architecture, overcoming the limitations of traditional single-modality models in capturing either local or global information. For the fusion strategy, the framework designs a cross-modal dynamic fusion strategy that combines overlapping multi-head self-attention modules with a bidirectional feature alignment mechanism, addressing the bottlenecks of low feature interaction efficiency and excessive attention fusion computation in traditional parallel fusion. It further introduces cross-domain local integration, which enhances the representation of lesion areas through pixel-level feature recalibration and improves diagnostic robustness on complex cases. Experiments show that the framework exhibits excellent feature expression and generalization performance in cross-domain scenarios spanning ophthalmic medical images and natural images, providing a high-precision, low-redundancy fusion paradigm for multimodal medical image analysis and promoting the upgrade of intelligent diagnosis and treatment from single-modal static analysis to dynamic decision-making.
Deep learning-based systems for finger vein recognition have gained rising attention in recent years due to improved efficiency and enhanced security. The performance of existing CNN-based methods is limited by the poor generalization of learned features and the scarcity of finger vein training data. Considering these concerns, this work develops a simplified deep transfer learning-based framework for finger vein recognition using an EfficientNet model with a self-attention mechanism. Data augmentation using various geometric methods is employed to address the shortage of training data required for a deep learning model. The proposed model is tested using K-fold cross-validation on three publicly available datasets: HKPU, FVUSM, and SDUMLA. The developed network is also compared with other modern deep networks to check its effectiveness, and the proposed method is compared with other existing finger vein recognition (FVR) methods. The experimental results exhibit superior recognition accuracy of the proposed method compared to existing methods; in addition, the developed method proves to be more effective and less complicated at extracting robust features. The proposed EffAttenNet achieves an accuracy of 98.14% on the HKPU, 99.03% on the FVUSM, and 99.50% on the SDUMLA databases.
Sparse identification of nonlinear dynamics (SINDy) has made significant progress in data-driven dynamics modeling. However, determining appropriate hyperparameters and addressing the time-consuming symbolic regression process remain substantial challenges. This study proposes the adaptive backward stepwise selection of fast SINDy (ABSS-FSINDy), which integrates statistical learning-based estimation and technical advancements to significantly reduce simulation time. This approach not only provides insights into the conditions under which SINDy performs optimally but also highlights potential failure points, particularly in the context of backward stepwise selection (BSS). By decoding predefined features into textual expressions, ABSS-FSINDy significantly reduces simulation time compared with conventional symbolic regression methods. We validate the proposed method through a series of numerical experiments involving both planar/spatial dynamics and high-dimensional chaotic systems, including the Lotka-Volterra, hyperchaotic Rössler, coupled Lorenz, and Lorenz 96 benchmark systems. The experimental results demonstrate that ABSS-FSINDy autonomously determines optimal hyperparameters within the SINDy framework, overcoming the curse of dimensionality in high-dimensional simulations. This improvement is substantial across both low- and high-dimensional systems, yielding efficiency gains of one to three orders of magnitude. For instance, in a 20D dynamical system, the simulation time is reduced from 107.63 s to just 0.093 s, a 3-order-of-magnitude improvement in simulation efficiency. This advancement broadens the applicability of SINDy for the identification and reconstruction of high-dimensional dynamical systems.
Deblending is a data processing procedure used to separate the source interferences of blended seismic data, which are acquired by simultaneous sources with random time delays to reduce the cost of seismic acquisition. There are three types of deblending algorithms: filtering-type noise suppression, inversion-based, and deep learning-based. We review the merits of these techniques and propose to use a sparse inversion method for seismic data deblending. The filtering-based deblending approach is applicable to blended data with a low blending fold and simple geometry; otherwise, it can suffer from signal distortion and noise leakage. At present, deep learning-based deblending methods are still under development, and field data applications are limited by the lack of high-quality training labels. In contrast, inversion-based deblending approaches have gained industrial acceptance. Our inversion approach transforms the pseudo-deblended data into the frequency-wavenumber-wavenumber (FKK) domain, where a sparse constraint is imposed for the coherent signal estimation. The estimated signal is used to predict the interference noise, which is subtracted from the original pseudo-deblended data. By minimizing the data misfit, the signal can be iteratively updated with a shrinking threshold until the signal and interference are fully separated. The FKK sparse inversion algorithm is accurate and efficient compared with other sparse inversion methods and is widely applied in field cases. A synthetic example shows that the deblending error is less than 1% in average amplitudes and less than −40 dB in amplitude spectra. We present three field data examples of land, marine OBN (Ocean Bottom Nodes), and streamer acquisitions to demonstrate its successful application in separating source interferences efficiently and accurately.
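The shrinking-threshold iteration described above can be illustrated in a drastically simplified 1-D analogue: threshold the residual's Fourier spectrum with a threshold that decays over iterations, accumulating the surviving coefficients into the coherent-signal estimate. This is a toy denoising stand-in; the real method operates in the 3-D FKK domain with the blending operator inside the loop:

```python
import numpy as np

def shrinking_threshold_estimate(d, n_iter=20, q0=0.9, q1=0.1):
    """Iteratively accept spectral coefficients of the residual that
    exceed a threshold shrinking from q0 to q1 (relative to the
    data's peak spectral amplitude): coherent energy is concentrated
    in few coefficients, incoherent noise is spread out."""
    ref = np.max(np.abs(np.fft.fft(d)))     # reference amplitude scale
    s = np.zeros_like(d)
    for i in range(n_iter):
        R = np.fft.fft(d - s)               # spectrum of current residual
        q = q0 + (q1 - q0) * i / (n_iter - 1)
        R[np.abs(R) < q * ref] = 0.0        # sparse constraint
        s = s + np.real(np.fft.ifft(R))     # update coherent-signal estimate
    return s

t = np.linspace(0.0, 1.0, 256, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t)          # coherent event: one frequency
rng = np.random.default_rng(0)
blended = signal + 0.2 * rng.standard_normal(t.size)  # plus incoherent noise
est = shrinking_threshold_estimate(blended)
err = np.linalg.norm(est - signal) / np.linalg.norm(signal)
print(err)  # small: incoherent noise stays below the final threshold
```

The shrinking schedule mirrors the description in the abstract: an aggressive early threshold picks up only the strongest coherent energy, and relaxing it gradually recovers weaker signal while leaving the incoherent interference behind.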
Funding: supported in part by the National Natural Science Foundation of China (Grant No. 61773330).
Funding: supported by the National Science Foundation of China (Nos. 52305127 and 52475130).
Abstract: Early fault detection for spiral bevel gears is crucial to ensure normal operation and prevent accidents. Harmonic components, excited by the time-varying mesh stiffness, always appear in the measured vibration signal. How to extract the periodic impulses that indicate a localized gear fault, buried in intense noise and interfered with by harmonics, is a challenging task. In this paper, a novel Periodical Sparse-Assisted Decoupling (PSAD) method is formulated as an optimization problem to extract fault features from noisy vibration signals. The PSAD method decouples the impulsive fault feature and the harmonic components based on sparse representation. The sparsity-within-and-across-groups property and the periodicity of the fault feature are incorporated into the regularizer as prior information. A nonconvex penalty is employed to highlight the sparsity of fault features. Meanwhile, a weight factor based on the l2 norm of each group is constructed to strengthen the amplitude of the fault feature. An iterative algorithm based on Majorization-Minimization (MM) is derived to solve the optimization problem. A simulation study and experimental analysis confirm the performance of the proposed PSAD method in extracting and enhancing defect impulses from noisy signals, where it surpasses other comparative methods.
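The group-sparse shrinkage idea behind PSAD can be illustrated with a single proximal-style update: coefficients are shrunk group-wise by their l2 norms, so weak (noise) groups vanish while strong (impulse) groups survive. The `1/||.||^2` re-weighting below is a hypothetical simplification chosen for the sketch, not the paper's exact regulariser or MM iteration.

```python
import numpy as np

def weighted_group_threshold(x, group_size, lam):
    """One group-wise shrinkage step: zero groups whose l2 energy is below
    lam, lightly shrink the rest (illustrative weighting, not PSAD itself)."""
    g = x.reshape(-1, group_size)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    # weak groups (norms**2 < lam) get a non-positive scale and are zeroed
    scale = np.maximum(1.0 - lam / (norms ** 2 + 1e-12), 0.0)
    return (g * scale).reshape(-1)

# toy signal: two strong "impulse" groups and two weak noise groups
x = np.array([5.0, 5.0, 5.0, 5.0,   0.1, -0.1, 0.1, 0.0,
              0.2, 0.0, -0.2, 0.1,  4.0, -4.0, 4.0, -4.0])
y = weighted_group_threshold(x, group_size=4, lam=1.0)
```

In a full MM scheme this step would alternate with a data-fidelity update; here only the sparsity-promoting shrinkage is shown.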
Funding: Supported by the Natural Science Foundation of Guangxi Province (Grant Nos. 2023GXNSFAA026067 and 2024GXNSFAA010521) and the National Natural Science Foundation of China (Nos. 12361079, 12201149, and 12261026).
Abstract: Convex feasibility problems are widely used in image reconstruction, sparse signal recovery, and other areas. This paper considers a class of convex feasibility problems arising from sparse signal recovery. We first derive the projection formulas for a vector onto the feasible sets. The centralized circumcentered-reflection method is then designed to solve the convex feasibility problem. Numerical experiments demonstrate the feasibility and effectiveness of the proposed algorithm, showing superior performance compared to conventional alternating projection methods.
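As a point of reference for the comparison the abstract mentions, here is the classical alternating-projection baseline on two hyperplanes; the circumcentered-reflection acceleration itself is not reproduced here.

```python
import numpy as np

def project_hyperplane(x, a, b):
    """Exact projection of x onto the hyperplane {z : a.z = b}."""
    return x - ((a @ x - b) / (a @ a)) * a

def alternating_projections(x0, planes, n_iter=50):
    """Von Neumann alternating projections for a convex feasibility problem:
    cycle through the sets, projecting onto each in turn. The paper's method
    accelerates this type of scheme."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        for a, b in planes:
            x = project_hyperplane(x, np.asarray(a, dtype=float), float(b))
    return x

# feasibility problem: find a point on both lines x1 = 1 and x2 = 2
sol = alternating_projections([0.0, 0.0],
                              [(np.array([1.0, 0.0]), 1.0),
                               (np.array([0.0, 1.0]), 2.0)])
```

For these two orthogonal hyperplanes the iteration lands on the intersection point (1, 2) immediately; for nearly parallel sets it converges slowly, which is the situation acceleration methods target.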
Abstract: The internal flow fields within a three-dimensional inward-turning combined inlet are extremely complex, especially during the engine mode transition, where tunnel changes may significantly affect the flow fields. To develop an efficient flow field reconstruction model for this problem, we present an Improved Conditional Denoising Diffusion Generative Adversarial Network (ICDDGAN), which integrates Conditional Denoising Diffusion Probabilistic Models (CDDPMs) with StyleGAN, and introduce a reconstruction discrimination mechanism and a dynamic loss weight learning strategy. We establish a Mach number flow field dataset by numerical simulation at various backpressures for the mode transition from turbine mode to ejector ramjet mode at Mach 2.5. Given only sparse parameter information, the proposed ICDDGAN model can rapidly generate high-quality Mach number flow fields without a large number of training samples. The results show that ICDDGAN is superior to CDDGAN in terms of training convergence and stability. Moreover, interpolation and extrapolation tests across backpressure conditions show that ICDDGAN can accurately and quickly reconstruct Mach number fields at various tunnel slice shapes, with a Structural Similarity Index Measure (SSIM) of over 0.96 and a Mean-Square Error (MSE) of 0.035% relative to the actual flow fields, reducing time costs by 7-8 orders of magnitude compared to Computational Fluid Dynamics (CFD) calculations. This provides an efficient means for rapid computation of complex flow fields.
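The SSIM figure quoted above is the standard structural similarity index. A single-window (global) version can be computed as follows; practical evaluations normally use the windowed variant, so this is only an illustration of the index itself, not the paper's evaluation code.

```python
import numpy as np

def ssim_global(a, b, data_range=1.0):
    """Global (single-window) SSIM between two fields a and b, using the
    standard stabilising constants c1 = (0.01 L)^2 and c2 = (0.03 L)^2."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

rng = np.random.default_rng(0)
field = rng.random((32, 32))
```

Identical fields score exactly 1; any luminance, contrast, or structure mismatch pulls the score below 1, which is why SSIM above 0.96 indicates a close reconstruction.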
Funding: supported by the National Natural Science Foundation of China (No. 52277055).
Abstract: Traditional data-driven fault diagnosis methods depend on expert experience to manually extract effective fault features from signals, which has certain limitations. Conversely, deep learning techniques have become a central focus of fault diagnosis research owing to their strong fault feature extraction ability and end-to-end diagnosis efficiency. Recently, by exploiting the respective advantages of convolutional neural networks (CNNs) and Transformers in local and global feature extraction, research on combining the two has demonstrated promise in the field of fault diagnosis. However, the cross-channel convolution mechanism in CNNs and the self-attention calculations in Transformers make the combined model excessively complex, resulting in high computational costs and limited industrial applicability. To tackle these challenges, this paper proposes a lightweight CNN-Transformer named SEFormer for rotating machinery fault diagnosis. First, a separable multiscale depthwise convolution block is designed to extract and integrate multiscale feature information from different channel dimensions of vibration signals. Then, an efficient self-attention block is developed to capture critical fine-grained features of the signal from a global perspective. Finally, experimental results on a planetary gearbox dataset and a motor roller bearing dataset show that the proposed framework balances robustness, generalization, and light weight better than recent state-of-the-art fault diagnosis models based on CNNs and Transformers. This study presents a feasible strategy for developing a lightweight rotating machinery fault diagnosis framework aimed at economical deployment.
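The depthwise-separable convolution this block builds on works as sketched below: per-channel filtering followed by a 1x1 cross-channel mix, which is what makes it cheaper than a full cross-channel convolution. This is the generic construction in plain numpy, not the SEFormer implementation.

```python
import numpy as np

def depthwise_separable_conv1d(x, dw_kernels, pw_weights):
    """Depthwise stage: each of the C channels of x (shape (C, T)) is
    filtered by its own length-k kernel, with zero padding to keep length T.
    Pointwise stage: a (C_out, C) matrix mixes channels at each time step."""
    C, T = x.shape
    k = dw_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    dw = np.empty((C, T))
    for c in range(C):                       # depthwise: no channel mixing
        for t in range(T):
            dw[c, t] = xp[c, t:t + k] @ dw_kernels[c]
    return pw_weights @ dw                   # pointwise 1x1 mixing

# sanity example: a centred delta kernel plus identity mixing reproduces x
x = np.arange(12.0).reshape(3, 4)
delta = np.tile(np.array([0.0, 1.0, 0.0]), (3, 1))
out = depthwise_separable_conv1d(x, delta, np.eye(3))
```

The parameter count is C*k + C_out*C instead of C_out*C*k for a full convolution, which is the "lightweight" saving the abstract refers to.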
Funding: supported by the National Key Research and Development Program of China (No. 2021YFA0715900).
Abstract: Located in northern China, the Hetao Plain is an important agro-economic zone and population centre. The deterioration of local groundwater quality has had a serious impact on human health and economic development. Groundwater vulnerability assessment (GVA) has therefore become an essential task for identifying the current status and development trend of groundwater quality. In this study, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models are integrated, with a self-attention (SA) mechanism, to realize spatio-temporal prediction of regional groundwater vulnerability. The study first builds the CNN-LSTM model with self-attention and evaluates its prediction accuracy for groundwater vulnerability against other common machine learning models such as Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The results indicate that the CNN-LSTM model outperforms these models, demonstrating its value for groundwater vulnerability assessment. The predictions indicate an increased risk of groundwater vulnerability in the study area over the coming years, attributable to the synergistic impact of global climate anomalies and intensified local human activities. Moreover, the overall groundwater vulnerability risk in the entire region has increased, evident from both the notably high values and the standard deviation. This suggests that the spatial variability of groundwater vulnerability in the area will expand in the future under the sustained progression of climate change and human activities. The model can be adapted for diverse applications across regional environmental assessment, pollution prediction, and risk statistics. This study holds particular significance for ecological protection and groundwater resource management.
Funding: supported in part by the National Natural Science Foundation of China (Nos. 62271248 and 62401256), in part by the Natural Science Foundation of Jiangsu Province (Nos. BK20230090 and BK20241384), and in part by the Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources of China (No. KLSMNR-K202303).
Abstract: In recent years, deep learning has been widely applied in synthetic aperture radar (SAR) image processing. However, the collection of large-scale labeled SAR images is challenging and costly, and classification accuracy is often poor when only limited SAR images are available. To address this issue, we propose a novel framework for sparse SAR target classification under few-shot conditions, termed the transfer learning-based interpretable lightweight convolutional neural network (TL-IL-CNN). Additionally, we employ enhanced gradient-weighted class activation mapping (Grad-CAM) to mitigate the “black box” effect often associated with deep learning models and to explore the mechanisms by which a CNN classifies various sparse SAR targets. Initially, we apply a novel bidirectional iterative soft thresholding (BiIST) algorithm to generate sparse images of superior quality compared to those produced by traditional matched filtering (MF) techniques. Subsequently, we pretrain multiple shallow CNNs on a simulated SAR image dataset. Using the sparse SAR dataset as input to the CNNs, we assess the efficacy of transfer learning in sparse SAR target classification and propose the integrated TL-IL-CNN to further enhance classification accuracy. Finally, Grad-CAM is utilized to provide visual explanations for the predictions made by the classification framework. Experimental results on the MSTAR dataset reveal that the proposed TL-IL-CNN achieves nearly 90% classification accuracy with only 20% of the training data required under standard operating conditions (SOC), surpassing typical deep learning methods such as the vision Transformer (ViT) in the small-sample setting. Remarkably, it performs even better under extended operating conditions (EOC). Furthermore, the application of Grad-CAM elucidates how the CNN differentiates among various sparse SAR targets: the experiments indicate that the target and background regions the model focuses on can differ among target classes. The study contributes to an enhanced understanding of the interpretability of such results and enables us to infer the classification outcomes for each category more accurately.
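The plain Grad-CAM computation that the enhanced variant builds on is simple enough to state directly: channel weights are the spatially averaged gradients of the class score, and the weighted sum of feature maps is passed through a ReLU. The refinements of the paper's enhanced Grad-CAM are not shown here.

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """feature_maps: (C, H, W) activations of a conv layer.
    grads: (C, H, W) gradients of the class score w.r.t. those activations.
    Returns the (H, W) class activation heat map."""
    alphas = grads.mean(axis=(1, 2))                   # (C,) channel weights
    cam = np.tensordot(alphas, feature_maps, axes=1)   # weighted channel sum
    return np.maximum(cam, 0.0)                        # keep positive evidence

# tiny example: only channel 0 has nonzero gradient, so the map equals fm[0]
fm = np.stack([np.full((4, 4), 2.0), np.full((4, 4), -1.0)])
gr = np.stack([np.ones((4, 4)), np.zeros((4, 4))])
heat = grad_cam(fm, gr)
```

Overlaying the upsampled heat map on the input image is what reveals whether the network attends to the target or its background, as discussed in the abstract.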
Abstract: To realize effective co-phasing adjustment in large-aperture sparse-aperture telescopes, a multichannel stripe tracking approach is employed, allowing simultaneous interferometric measurements of multiple optical paths and circumventing the pairwise measurements along the mirror boundaries required in traditional interferometric methods. This approach enhances detection efficiency and reduces system complexity. Here, the principles of the multibeam interference process and the construction of a co-phasing detection module based on direct optical fiber connections were analyzed using wavefront optics theory. Error analysis was conducted on the system surface obtained through multipath interference, and potential applications of the interferometric method were explored. Finally, the principle was verified experimentally: an interferometric fringe contrast better than 0.4 was achieved through flat-field calibration and incoherent digital synthesis. The dynamic range of the measurement exceeds ten times the center wavelength of the working band (1550 nm), and a resolution better than one-tenth of that wavelength was achieved. Simultaneous three-beam interference was realized, leading to a 50% improvement in detection efficiency. This method can effectively enhance the efficiency of sparse-aperture telescope co-phasing, meeting the requirements for observations with 8-10 m telescopes. This study provides a technological foundation for observing distant and faint celestial objects.
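The "fringe contrast better than 0.4" refers to the standard visibility of an interference pattern, V = (Imax - Imin)/(Imax + Imin); this is the textbook definition, independent of the paper's measurement pipeline.

```python
import numpy as np

def fringe_contrast(intensity):
    """Fringe visibility V = (Imax - Imin) / (Imax + Imin) of a recorded
    interference intensity pattern; V = 1 for perfect two-beam contrast."""
    i_max, i_min = intensity.max(), intensity.min()
    return (i_max - i_min) / (i_max + i_min)

# two-beam pattern I = 1 + 0.5*cos(phi) has visibility 0.5 by construction
phi = np.linspace(0.0, 4.0 * np.pi, 1000)
v = fringe_contrast(1.0 + 0.5 * np.cos(phi))
```

A measured contrast above 0.4 thus indicates that the interfering beams stay well matched in amplitude and coherence after fiber routing and calibration.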
Funding: Supported by the Featured Innovation Projects of the General University of Guangdong Province (2023KTSCX096), the Special Projects in Key Areas of Guangdong Province (ZDZX1088), and the Research Team Project of Guangdong University of Education (2024KYCXTD018).
Abstract: This paper explores the recovery of block sparse signals in frame-based settings using the l_2/l_q-synthesis technique (0<q≤1). We propose a new null space property, referred to as block D-NSP_q, which is based on the dictionary D. We establish that the block D-NSP_q condition is both necessary and sufficient for the exact recovery of block sparse signals via l_2/l_q-synthesis. Additionally, this condition is essential for the stable recovery of signals that are block-compressible with respect to D. The block D-NSP_q property is identified as the first complete condition for successful signal recovery using l_2/l_q-synthesis. Furthermore, we assess the theoretical efficacy of the l_2/l_q-synthesis method under measurement noise.
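For orientation, the l_2/l_q-synthesis program the abstract refers to has the following commonly used form (written here in a generic formulation; the block partition {B_i}, measurement matrix A, dictionary D, and noise level ε are standing assumptions, not details taken from the paper):

```latex
\hat{x} \in \operatorname*{arg\,min}_{x}\; \sum_{i=1}^{M} \bigl\| x[B_i] \bigr\|_2^{\,q}
\quad \text{subject to} \quad \| A D x - y \|_2 \le \varepsilon,
\qquad \hat{f} = D \hat{x}, \qquad 0 < q \le 1,
```

where x[B_i] denotes the restriction of the coefficient vector x to the i-th block, so the objective is an l_q norm of the per-block l_2 norms; the null space property studied in the paper characterises when this program recovers every block sparse signal.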
Abstract: Let Ω be homogeneous of degree zero, integrable on S^(d−1), and have vanishing moment of order one, and let a be a function on R^d such that ∇a ∈ L^∞(R^d). Let T*_(Ω,a) be the maximal operator associated with the d-dimensional Calderón commutator, defined by T*_(Ω,a)f(x) := sup_(ε>0) | ∫_(|x−y|>ε) [Ω(x−y)/|x−y|^(d+1)] (a(x)−a(y)) f(y) dy |. In this paper, the authors establish bilinear sparse domination for T*_(Ω,a) under the assumption Ω ∈ L^∞(S^(d−1)). As applications, some quantitative weighted bounds for T*_(Ω,a) are obtained.
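A bilinear sparse domination result of the kind stated has the following generic template (the precise exponents p, r for T*_(Ω,a) depend on the operator and are given in the paper; the shape below is only the standard form such estimates take):

```latex
\bigl| \langle T^{*}_{\Omega,a} f,\; g \rangle \bigr|
\;\lesssim\; \sum_{Q \in \mathcal{S}} |Q| \,
\langle |f| \rangle_{p,Q} \, \langle |g| \rangle_{r,Q},
\qquad
\langle |h| \rangle_{s,Q} := \Bigl( \frac{1}{|Q|} \int_{Q} |h|^{s} \Bigr)^{1/s},
```

where \mathcal{S} is a sparse family of cubes, i.e. each Q \in \mathcal{S} contains a subset E_Q \subset Q with |E_Q| \ge \tfrac{1}{2}|Q| and the sets E_Q pairwise disjoint. Weighted norm inequalities with quantitative constants follow from such dominations by standard arguments.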
Funding: Supported by the National Key R&D Program of China (Grant No. 2023YFA1009200), the National Natural Science Foundation of China (Grant Nos. 12271079 and 12494552), and the Fundamental Research Funds for the Central Universities of China (Grant No. DUT24LAB127).
Abstract: In this paper, we focus on the recovery of piecewise sparse signals containing both fast-decaying and slow-decaying nonzero entries. To improve on the classic Orthogonal Matching Pursuit (OMP) and Generalized Orthogonal Matching Pursuit (GOMP) algorithms for this problem, we propose the Piecewise Generalized Orthogonal Matching Pursuit (PGOMP) algorithm, which treats mixed-decaying sparse signals as piecewise sparse signals with two components whose nonzero entries have different decay factors. The algorithm incorporates piecewise selection and deletion to retain the most significant entries according to the sparsity of each component. We provide a theoretical analysis based on the mutual coherence of the measurement matrix and the decay factors of the nonzero entries, establishing a sufficient condition for the PGOMP algorithm to select at least two correct indices in each iteration. Numerical simulations and an image decomposition experiment demonstrate that the proposed algorithm significantly improves the support recovery probability by effectively matching piecewise sparsity with decay factors.
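For orientation, the greedy loop that PGOMP extends is ordinary OMP; the piecewise selection and deletion steps are not reproduced here, only the baseline they build on.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal Matching Pursuit: repeatedly pick the dictionary atom most
    correlated with the residual, re-fit the selected atoms by least squares,
    and update the residual."""
    resid = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        j = int(np.argmax(np.abs(A.T @ resid)))   # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        resid = y - A[:, support] @ coef          # orthogonalised residual
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# exact-recovery check in the guaranteed case of an orthonormal dictionary
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((40, 40)))
x_true = np.zeros(40)
x_true[[3, 17, 29]] = [2.0, -1.5, 1.0]
x_hat = omp(A, A @ x_true, sparsity=3)
```

With an orthonormal dictionary the correlations equal the true coefficients, so the three largest entries are selected in decreasing magnitude order and recovery is exact; PGOMP targets the harder mixed-decay regimes where this greedy order fails.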
Abstract: Lightweight convolutional neural networks (CNNs) have simple structures but struggle to comprehensively and accurately extract important semantic information from images. While attention mechanisms can enhance CNNs by learning distinctive representations, most existing spatial and hybrid attention methods focus on local regions and carry extensive parameters, making them unsuitable for lightweight CNNs. In this paper, we propose a self-attention mechanism tailored for lightweight networks, namely the brief self-attention module (BSAM). BSAM consists of brief spatial attention (BSA) and advanced channel attention blocks. Unlike conventional self-attention methods with many parameters, our BSA block improves the performance of lightweight networks by effectively learning global semantic representations. Moreover, BSAM can be seamlessly integrated into lightweight CNNs for end-to-end training, maintaining the network's lightweight and mobile characteristics. We validate the effectiveness of the proposed method on image classification tasks using the Food-101, Caltech-256, and Mini-ImageNet datasets.
Funding: National Natural Science Foundation of China (No. 62203118).
Abstract: Piezo actuators are widely used in ultra-precision fields because of their fast response and nanometre-scale step length. However, their hysteresis seriously affects the accuracy and stability of piezo actuators. Existing methods for fitting hysteresis loops fall into operator-based, differential-equation-based, and machine-learning-based classes. The modeling cost and model complexity of the operator and differential-equation classes are high, while machine learning processes such as neural network computation are opaque, so a physical model framework cannot be directly extracted. Therefore, the sparse identification of nonlinear dynamics (SINDy) algorithm is proposed for fitting hysteresis loops, and it is further improved: while SINDy builds an orthogonal candidate library for modeling, the sparse regression model is simplified, and the Relay operator is introduced for piecewise fitting to resolve the distortion of SINDy when fitting singularities. The resulting Relay-SINDy algorithm is applied to fitting hysteresis loops, and good performance is obtained in both open-loop and closed-loop experiments. Compared with existing methods, the modeling cost and model complexity are reduced, and the modeling accuracy of the hysteresis loop is improved.
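The sparse-regression core that SINDy (and hence Relay-SINDy) relies on is sequentially thresholded least squares: fit the library coefficients, zero the small ones, and re-fit on the survivors. A textbook sketch follows; the Relay-operator piecewise fitting of the paper is not shown.

```python
import numpy as np

def stls(Theta, dX, lam=0.1, n_iter=10):
    """Sequentially thresholded least squares.
    Theta: (n_samples, n_features) candidate-function library.
    dX: (n_samples, n_states) measured derivatives.
    Returns the sparse coefficient matrix Xi with Theta @ Xi ~= dX."""
    Xi, *_ = np.linalg.lstsq(Theta, dX, rcond=None)
    for _ in range(n_iter):
        small = np.abs(Xi) < lam
        Xi[small] = 0.0
        for k in range(dX.shape[1]):          # re-fit surviving terms per state
            big = ~small[:, k]
            if big.any():
                Xi[big, k], *_ = np.linalg.lstsq(Theta[:, big], dX[:, k],
                                                 rcond=None)
    return Xi

# recover dx/dt = -2x from data using the library [1, x, x^2]
x = np.linspace(-1.0, 1.0, 50)
Theta = np.column_stack([np.ones_like(x), x, x ** 2])
Xi = stls(Theta, (-2.0 * x)[:, None])
```

The thresholding step is where the hyperparameter sensitivity discussed in the abstract enters: lam too large deletes true terms, too small leaves spurious ones.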
Abstract: In this paper, a sparse graph neural network-aided (SGNN-aided) decoder is proposed for improving the decoding performance of polar codes under bursty interference. First, a sparse factor graph is constructed using the encoding characteristics to achieve high-throughput polar decoding. To further improve the decoding performance, a residual gated bipartite graph neural network is designed for updating the embedding vectors of heterogeneous nodes based on a bidirectional message passing neural network. This framework exploits gated recurrent units and residual blocks to address gradient vanishing in deep graph recurrent neural networks. Finally, predictions are generated by feeding the embedding vectors into a readout module. Simulation results show that the proposed decoder is more robust than existing ones in the presence of bursty interference and exhibits high universality.
Funding: funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. GPIP:1055-829-2024.
Abstract: A healthy brain is vital to every person, since the brain controls every movement and emotion. Sometimes brain cells grow uncontrollably and become cancerous; these cancerous growths are called brain tumors. For diagnosed patients, survival depends mainly on early diagnosis of these tumors to enable suitable treatment plans. Nowadays, physicians and radiologists rely on Magnetic Resonance Imaging (MRI) for their clinical evaluations of brain tumors. These evaluations are time-consuming, expensive, and require highly skilled expertise to provide an accurate diagnosis. Academia and industry have recently partnered to implement automatic solutions that diagnose the disease with high accuracy. Owing to their accuracy, many of these solutions rely on deep learning (DL) methodologies, which play a central role in both identification and classification; there is therefore a need for a solid and robust deep-learning-based approach to diagnose brain tumors. The purpose of this study is to develop an intelligent automatic framework for brain tumor diagnosis. The proposed solution is based on a novel dense dynamic residual self-attention transfer adaptive learning fusion approach (NDDRSATALFA), carried over two deep learning networks, VGG19 and UNET, to identify and classify brain tumors. In addition, this solution applies transfer learning to exchange extracted features and data between the two networks. The presented framework is trained, validated, and tested on six public MRI datasets to detect brain tumors and categorize them into three classes: glioma, meningioma, and pituitary. The proposed framework yielded remarkable results on various performance indicators: 99.32% accuracy, 98.74% sensitivity, 98.89% specificity, 99.01% Dice, 98.93% Area Under the Curve (AUC), and 99.81% F1-score. A comparative analysis with recent state-of-the-art methods shows that NDDRSATALFA offers an admirable level of reliability in simplifying the timely identification of diverse brain tumors. Moreover, this framework can be applied by healthcare providers to assist radiologists, pathologists, and physicians in their evaluations. The attained outcomes open doors for advanced automatic solutions that improve clinical evaluations and support reasonable treatment plans.
Funding: funded by the Ongoing Research Funding Program (ORF-2025-102), King Saud University, Riyadh, Saudi Arabia; by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202400813); and by the Graduate Research Innovation Project (Grant Nos. yjscxx2025-269-193 and CYS25618).
Abstract: Medical image analysis based on deep learning has become an important technical requirement in the field of smart healthcare. In view of the difficulty of jointly modeling local details and global features in multimodal ophthalmic image analysis, as well as the information redundancy in cross-modal data fusion, this paper proposes a multimodal fusion framework based on cross-modal collaboration and a weighted attention mechanism. For feature extraction, the framework collaboratively extracts local fine-grained features and global structural dependencies through a parallel dual-branch architecture, overcoming the limitation of traditional single-modality models in capturing either local or global information. For the fusion strategy, the framework designs a cross-modal dynamic fusion scheme that combines overlapping multi-head self-attention modules with a bidirectional feature alignment mechanism, addressing the bottlenecks of low feature interaction efficiency and excessive attention fusion computation in traditional parallel fusion; it further introduces cross-domain local integration, which enhances the representation of lesion areas through pixel-level feature recalibration and improves diagnostic robustness for complex cases. Experiments show that the framework exhibits excellent feature expression and generalization in cross-domain scenarios spanning ophthalmic medical images and natural images, providing a high-precision, low-redundancy fusion paradigm for multimodal medical image analysis and promoting the upgrade of intelligent diagnosis and treatment from single-modal static analysis to dynamic decision-making.
Abstract: Deep learning-based systems for finger-vein recognition have gained rising attention in recent years due to improved efficiency and enhanced security. The performance of existing CNN-based methods is limited by the poor generalization of learned features and the shortage of finger-vein training data. Considering these concerns, this work develops a simplified deep transfer learning-based framework for finger-vein recognition using an EfficientNet model with a self-attention mechanism. Data augmentation using various geometric transforms is employed to address the shortage of training data required for a deep learning model. The proposed model is tested using K-fold cross-validation on three publicly available datasets: HKPU, FVUSM, and SDUMLA. The developed network is also compared with other modern deep networks to check its effectiveness, and the proposed method is compared with other existing finger-vein recognition (FVR) methods. The experimental results exhibit superior recognition accuracy compared to other existing methods, and the developed method proves more effective and less complicated at extracting robust features. The proposed EffAttenNet achieves an accuracy of 98.14% on the HKPU, 99.03% on the FVUSM, and 99.50% on the SDUMLA databases.
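A minimal example of the geometric augmentation strategy is shown below. The transform list (flips and 90-degree rotations) is an assumption for illustration; the abstract does not enumerate the exact transforms used.

```python
import numpy as np

def geometric_augment(img):
    """Return the original image plus five geometric variants (horizontal
    flip, vertical flip, and 90/180/270-degree rotations), a common way to
    multiply a small biometric training set."""
    return [img,
            np.fliplr(img), np.flipud(img),
            np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3)]

sample = np.arange(16.0).reshape(4, 4)
views = geometric_augment(sample)
```

Each transform is invertible and label-preserving for identity recognition, which is why such augmentation enlarges the dataset without introducing label noise.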
Funding: Project supported by the National Natural Science Foundation of China (Nos. 12172291, 12472357, and 12232015), the Shaanxi Province Outstanding Youth Fund Project (No. 2024JC-JCQN-05), and the 111 Project (No. BP0719007).
Abstract: Sparse identification of nonlinear dynamics (SINDy) has made significant progress in data-driven dynamics modeling. However, determining appropriate hyperparameters and the time-consuming symbolic regression process remain substantial challenges. This study proposes the adaptive backward stepwise selection of fast SINDy (ABSS-FSINDy), which integrates statistical learning-based estimation with technical advancements to significantly reduce simulation time. This approach not only provides insight into the conditions under which SINDy performs optimally but also highlights potential failure points, particularly in the context of backward stepwise selection (BSS). By decoding predefined features into textual expressions, ABSS-FSINDy significantly reduces simulation time compared with conventional symbolic regression methods. We validate the proposed method through a series of numerical experiments involving both planar/spatial dynamics and high-dimensional chaotic systems, including the Lotka-Volterra, hyperchaotic Rössler, coupled Lorenz, and Lorenz 96 benchmark systems. The experimental results demonstrate that ABSS-FSINDy autonomously determines optimal hyperparameters within the SINDy framework, overcoming the curse of dimensionality in high-dimensional simulations. The improvement is substantial across both low- and high-dimensional systems, yielding efficiency gains of one to three orders of magnitude. For instance, in a 20-dimensional dynamical system, the simulation time is reduced from 107.63 s to just 0.093 s, a three-order-of-magnitude improvement. This advancement broadens the applicability of SINDy for the identification and reconstruction of high-dimensional dynamical systems.
Funding: supported by the National Science and Technology Major Project (Grant No. 2017ZX05018-001).
Abstract: Deblending is a data processing procedure used to separate the source interferences of blended seismic data, which are obtained by simultaneous sources with random time delays to reduce the cost of seismic acquisition. There are three types of deblending algorithms, i.e., filtering-type noise suppression algorithms, inversion-based algorithms, and deep-learning-based algorithms. We review the merits of these techniques and propose to use a sparse inversion method for seismic data deblending. The filtering-based deblending approach is applicable to blended data with a low blending fold and simple geometry; otherwise, it can suffer from signal distortion and noise leakage. At present, deep-learning-based deblending methods are still under development, and field data applications are limited due to the lack of high-quality training labels. In contrast, inversion-based deblending approaches have gained industrial acceptance. The inversion approach we use transforms the pseudo-deblended data into the frequency-wavenumber-wavenumber (FKK) domain, where a sparse constraint is imposed for the coherent signal estimation. The estimated signal is used to predict the interference noise for subtraction from the original pseudo-deblended data. By minimizing the data misfit, the signal can be iteratively updated with a shrinking threshold until the signal and interference are fully separated. The FKK sparse inversion algorithm is accurate and efficient compared with other sparse inversion methods, and it is widely applied in field cases. A synthetic example shows that the deblending error is less than 1% in average amplitudes and less than -40 dB in amplitude spectra. We present three field data examples of land, marine OBN (Ocean Bottom Nodes), and streamer acquisitions to demonstrate its successful applications in separating the source interferences efficiently and accurately.
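The sparsity-constrained signal-estimation step inside the inversion can be sketched in 1-D, with an FFT standing in for the 3-D FKK transform. The interference prediction/subtraction loop and the shrinking-threshold schedule are omitted, and the threshold value is illustrative, so this is a stand-in for one pass of the method rather than the authors' algorithm.

```python
import numpy as np

def coherent_signal_estimate(data, thr=0.1):
    """Keep only transform coefficients above thr * max|coefficient|:
    coherent events concentrate in few strong coefficients, while
    blending-type interference spreads as a low background."""
    spec = np.fft.rfft(data)
    lam = thr * np.abs(spec).max()
    kept = np.where(np.abs(spec) >= lam, spec, 0.0)   # sparse constraint
    return np.fft.irfft(kept, n=len(data))

# coherent event (a sinusoid) buried in spiky, blending-like noise
rng = np.random.default_rng(1)
t = np.arange(512) / 512.0
clean = np.sin(2.0 * np.pi * 12.0 * t)
blended = clean + 1.5 * rng.standard_normal(512) * (rng.random(512) < 0.05)
recovered = coherent_signal_estimate(blended)
```

In the full iterative scheme the threshold would start high and shrink, with the current estimate used to predict and subtract the interference from the pseudo-deblended gathers at each pass.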