This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities,focusing on recent trends from 2022 to 20...This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities,focusing on recent trends from 2022 to 2025.The primary objective is to evaluate methodological advancements,model performance,dataset usage,and existing challenges in developing clinically robust AI systems.We included peer-reviewed journal articles and highimpact conference papers published between 2022 and 2025,written in English,that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification.Excluded were non-open-access publications,books,and non-English articles.A structured search was conducted across Scopus,Google Scholar,Wiley,and Taylor&Francis,with the last search performed in August 2025.Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity,validation methods,and availability of performance metrics.We used narrative synthesis and tabular benchmarking to compare performance metrics(e.g.,accuracy,Dice score)across model types(CNN,Transformer,Hybrid),imaging modalities,and datasets.A total of 49 studies were included(43 journal articles and 6 conference papers).These studies spanned over 9 public datasets(e.g.,BraTS,Figshare,REMBRANDT,MOLAB)and utilized a range of imaging modalities,predominantly MRI.Hybrid models,especially ResViT and UNetFormer,consistently achieved high performance,with classification accuracy exceeding 98%and segmentation Dice scores above 0.90 across multiple studies.Transformers and hybrid architectures showed increasing adoption post2023.Many studies lacked external validation and were evaluated only on a few benchmark datasets,raising concerns about generalizability and dataset bias.Few studies addressed clinical interpretability or uncertainty quantification.Despite promising results,particularly for hybrid deep learning models,widespread clinical adoption remains limited due to lack of validation,interpretability concerns,and real-world deployment barriers.展开更多
BACKGROUND Esophageal cancer is the sixth most common cancer worldwide,with a high mortality rate.Early prognosis of esophageal abnormalities can improve patient survival rates.The progression of esophageal cancer fol...BACKGROUND Esophageal cancer is the sixth most common cancer worldwide,with a high mortality rate.Early prognosis of esophageal abnormalities can improve patient survival rates.The progression of esophageal cancer follows a sequence from esophagitis to non-dysplastic Barrett’s esophagus,dysplastic Barrett’s esophagus,and eventually esophageal adenocarcinoma(EAC).This study explored the application of deep learning technology in the precise diagnosis of pathological classification and staging of EAC to enhance diagnostic accuracy and efficiency.AIM To explore the application of deep learning models,particularly Wave-Vision Transformer(Wave-ViT),in the pathological classification and staging of esophageal cancer to enhance diagnostic accuracy and efficiency.METHODS We applied several deep learning models,including multi-layer perceptron,residual network,transformer,and Wave-ViT,to a dataset of clinically validated esophageal pathology images.The models were trained to identify pathological features and assist in the classification and staging of different stages of esophageal cancer.The models were compared based on accuracy,computational complexity,and efficiency.RESULTS The Wave-ViT model demonstrated the highest accuracy at 88.97%,surpassing the transformer(87.65%),residual network(85.44%),and multi-layer perceptron(81.17%).Additionally,Wave-ViT exhibited low computational complexity with significantly reduced parameter size,making it highly efficient for real-time clinical applications.CONCLUSION Deep learning technology,particularly the Frequency-Domain Transformer model,shows promise in improving the precision of pathological classification and staging of EAC.The application of the Frequency-Domain Transformer model enhances the automation of the diagnostic process and may support early detection and treatment of EAC.Future research may further explore the potential of this model in broader medical image analysis applications,particularly in the field of precision medicine.展开更多
Efficient and accurate prediction of ocean surface latent heat fluxes is essential for understanding and modeling climate dynamics.Conventional estimation methods have low resolution and lack accuracy.The transformer ...Efficient and accurate prediction of ocean surface latent heat fluxes is essential for understanding and modeling climate dynamics.Conventional estimation methods have low resolution and lack accuracy.The transformer model,with its self-attention mechanism,effectively captures long-range dependencies,leading to a degradation of accuracy over time.Due to the non-linearity and uncertainty of physical processes,the transformer model encounters the problem of error accumulation,leading to a degradation of accuracy over time.To solve this problem,we combine the Data Assimilation(DA)technique with the transformer model and continuously modify the model state to make it closer to the actual observations.In this paper,we propose a deep learning model called TransNetDA,which integrates transformer,convolutional neural network and DA methods.By combining data-driven and DA methods for spatiotemporal prediction,TransNetDA effectively extracts multi-scale spatial features and significantly improves prediction accuracy.The experimental results indicate that the TransNetDA method surpasses traditional techniques in terms of root mean square error and R2 metrics,showcasing its superior performance in predicting latent heat fluxes at the ocean surface.展开更多
The issue of small-angle maneuvering targets inverse synthetic aperture radar(ISAR)imaging has been successfully addressed by popular motion compensation algorithms.However,when the target’s rotational velocity is su...The issue of small-angle maneuvering targets inverse synthetic aperture radar(ISAR)imaging has been successfully addressed by popular motion compensation algorithms.However,when the target’s rotational velocity is sufficiently high during the dwell time of the radar,such compensation algorithms cannot obtain a high quality image.This paper proposes an ISAR imaging algorithm based on keystone transform and deep learning algorithm.The keystone transform is used to coarsely compensate for the target’s rotational motion and translational motion,and the deep learning algorithm is used to achieve a super-resolution image.The uniformly distributed point target data are used as the data set of the training u-net network.In addition,this method does not require estimating the motion parameters of the target,which simplifies the algorithm steps.Finally,several experiments are performed to demonstrate the effectiveness of the proposed algorithm.展开更多
Many traditional denoising methods,such as Gaussian filtering,tend to blur and lose details or edge information while reducing noise.The stationary wavelet packet transform is a multi-scale and multi-band analysis too...Many traditional denoising methods,such as Gaussian filtering,tend to blur and lose details or edge information while reducing noise.The stationary wavelet packet transform is a multi-scale and multi-band analysis tool.Compared with the stationary wavelet transform,it can suppress high-frequency noise while preserving more edge details.Deep learning has significantly progressed in denoising applications.DnCNN,a residual network;FFDNet,an efficient,fl exible network;U-NET,a codec network;and GAN,a generative adversative network,have better denoising effects than BM3D,the most popular conventional denoising method.Therefore,SWP_hFFDNet,a random noise attenuation network based on the stationary wavelet packet transform(SWPT)and modified FFDNet,is proposed.This network combines the advantages of SWPT,Huber norm,and FFDNet.In addition,it has three characteristics:First,SWPT is an eff ective featureextraction tool that can obtain low-and high-frequency features of different scales and frequency bands.Second,because the noise level map is the input of the network,the noise removal performance of diff erent noise levels can be improved.Third,the Huber norm can reduce the sensitivity of the network to abnormal data and enhance its robustness.The network is trained using the Adam algorithm and the BSD500 dataset,which is augmented,noised,and decomposed by SWPT.Experimental and actual data processing results show that the denoising eff ect of the proposed method is almost the same as those of BM3D,DnCNN,and FFDNet networks for low noise.However,for high noise,the proposed method is superior to the aforementioned networks.展开更多
Arabic dialect identification is essential in Natural Language Processing(NLP)and forms a critical component of applications such as machine translation,sentiment analysis,and cross-language text generation.The diffic...Arabic dialect identification is essential in Natural Language Processing(NLP)and forms a critical component of applications such as machine translation,sentiment analysis,and cross-language text generation.The difficulties in differentiating between Arabic dialects have garnered more attention in the last 10 years,particularly in social media.These difficulties result from the overlapping vocabulary of the dialects,the fluidity of online language use,and the difficulties in telling apart dialects that are closely related.Managing dialects with limited resources and adjusting to the ever-changing linguistic trends on social media platforms present additional challenges.A strong dialect recognition technique is essential to improving communication technology and cross-cultural understanding in light of the increase in social media usage.To distinguish Arabic dialects on social media,this research suggests a hybrid Deep Learning(DL)approach.The Long Short-Term Memory(LSTM)and Bidirectional Long Short-Term Memory(BiLSTM)architectures make up the model.A new textual dataset that focuses on three main dialects,i.e.,Levantine,Saudi,and Egyptian,is also available.Approximately 11,000 user-generated comments from Twitter are included in this dataset,which has been painstakingly annotated to guarantee accuracy in dialect classification.Transformers,DL models,and basic machine learning classifiers are used to conduct several tests to evaluate the performance of the suggested model.Various methodologies,including TF-IDF,word embedding,and self-attention mechanisms,are used.The suggested model fares better than other models in terms of accuracy,obtaining a remarkable 96.54%,according to the trial results.This study advances the discipline by presenting a new dataset and putting forth a practical model for Arabic dialect identification.This model may prove crucial for future work in sociolinguistic studies and NLP.展开更多
Introduction Deep learning(DL),as one of the most transformative technologies in artificial intelligence(AI),is undergoing a pivotal transition from laboratory research to industrial deployment.Advancing at an unprece...Introduction Deep learning(DL),as one of the most transformative technologies in artificial intelligence(AI),is undergoing a pivotal transition from laboratory research to industrial deployment.Advancing at an unprecedented pace,DL is transcending theoretical and application boundaries to penetrate emerging realworld scenarios such as industrial automation,urban management,and health monitoring,thereby driving a new wave of intelligent transformation.In August 2023,Goldman Sachs estimated that global AI investment will reach US$200 billion by 2025[1].However,the increasing complexity and dynamic nature of application scenarios expose critical challenges in traditional deep learning,including data heterogeneity,insufficient model generalization,computational resource constraints,and privacy-security trade-offs.The next generation of deep learning methodologies needs to achieve breakthroughs in multimodal fusion,lightweight design,interpretability enhancement,and cross-disciplinary collaborative optimization,in order to develop more efficient,robust,and practically valuable intelligent systems.展开更多
Addressing plant diseases and pests is not just crucial;it’s a matter of utmost importance for enhancing crop production and preventing economic losses. Recent advancements in artificial intelligence, machine learnin...Addressing plant diseases and pests is not just crucial;it’s a matter of utmost importance for enhancing crop production and preventing economic losses. Recent advancements in artificial intelligence, machine learning, and deep learning have revolutionised the precision and efficiency of this process, surpassing the limitations of manual identification. This study comprehensively reviews modern computer-based techniques, including recent advances in artificial intelligence, for detecting diseases and pests through images. This paper uniquely categorises methodologies into hyperspectral imaging, non-visualisation techniques, visualisation approaches, modified deep learning architectures, and transformer models, helping researchers gain detailed, insightful understandings. The exhaustive survey of recent works and comparative studies in this domain guides researchers in selecting appropriate and advanced state-of-the-art methods for plant disease and pest detection. Additionally, this paper highlights the consistent superiority of modern AI-based approaches, which often outperform older image analysis methods in terms of speed and accuracy. Further, this survey focuses on the efficiency of vision transformers against well-known deep learning architectures like MobileNetV3, which shows that Hierarchical Vision Transformer (HvT) can achieve accuracy upwards of 99.3% in plant disease detection. The study concludes by addressing the challenges of designing the systems, proposing potential solutions, and outlining directions for future research in this field.展开更多
Forecasting landslide deformation is challenging due to influence of various internal and external factors on the occurrence of systemic and localized heterogeneities.Despite the potential to improve landslide predict...Forecasting landslide deformation is challenging due to influence of various internal and external factors on the occurrence of systemic and localized heterogeneities.Despite the potential to improve landslide predictability,deep learning has yet to be sufficiently explored for complex deformation patterns associated with landslides and is inherently opaque.Herein,we developed a holistic landslide deformation forecasting method that considers spatiotemporal correlations of landslide deformation by integrating domain knowledge into interpretable deep learning.By spatially capturing the interconnections between multiple deformations from different observation points,our method contributes to the understanding and forecasting of landslide systematic behavior.By integrating specific domain knowledge relevant to each observation point and merging internal properties with external variables,the local heterogeneity is considered in our method,identifying deformation temporal patterns in different landslide zones.Case studies involving reservoir-induced landslides and creeping landslides demonstrated that our approach(1)enhances the accuracy of landslide deformation forecasting,(2)identifies significant contributing factors and their influence on spatiotemporal deformation characteristics,and(3)demonstrates how identifying these factors and patterns facilitates landslide forecasting.Our research offers a promising and pragmatic pathway toward a deeper understanding and forecasting of complex landslide behaviors.展开更多
Deep learning(DL)has revolutionized time series forecasting(TSF),surpassing traditional statistical methods(e.g.,ARIMA)and machine learning techniques in modeling complex nonlinear dynamics and long-term dependencies ...Deep learning(DL)has revolutionized time series forecasting(TSF),surpassing traditional statistical methods(e.g.,ARIMA)and machine learning techniques in modeling complex nonlinear dynamics and long-term dependencies prevalent in real-world temporal data.This comprehensive survey reviews state-of-the-art DL architectures forTSF,focusing on four core paradigms:(1)ConvolutionalNeuralNetworks(CNNs),adept at extracting localized temporal features;(2)Recurrent Neural Networks(RNNs)and their advanced variants(LSTM,GRU),designed for sequential dependency modeling;(3)Graph Neural Networks(GNNs),specialized for forecasting structured relational data with spatial-temporal dependencies;and(4)Transformer-based models,leveraging self-attention mechanisms to capture global temporal patterns efficiently.We provide a rigorous analysis of the theoretical underpinnings,recent algorithmic advancements(e.g.,TCNs,attention mechanisms,hybrid architectures),and practical applications of each framework,supported by extensive benchmark datasets(e.g.,ETT,traffic flow,financial indicators)and standardized evaluation metrics(MAE,MSE,RMSE).Critical challenges,including handling irregular sampling intervals,integrating domain knowledge for robustness,and managing computational complexity,are thoroughly discussed.Emerging research directions highlighted include diffusion models for uncertainty quantification,hybrid pipelines combining classical statistical and DL techniques for enhanced interpretability,quantile regression with Transformers for riskaware forecasting,and optimizations for real-time deployment.This work serves as an essential reference,consolidating methodological innovations,empirical resources,and future trends to bridge the gap between theoretical research and practical implementation needs for researchers and practitioners in the field.展开更多
BACKGROUND Video capsule endoscopy(VCE)is a noninvasive technique used to examine small bowel abnormalities in both adults and children.However,manual review of VCE images is time-consuming and labor-intensive,making ...BACKGROUND Video capsule endoscopy(VCE)is a noninvasive technique used to examine small bowel abnormalities in both adults and children.However,manual review of VCE images is time-consuming and labor-intensive,making it crucial to develop deep learning methods to assist in image analysis.AIM To employ deep learning models for the automatic classification of small bowel lesions using pediatric VCE images.METHODS We retrospectively analyzed VCE images from 162 pediatric patients who underwent VCE between January 2021 and December 2023 at the Children's Hospital of Nanjing Medical University.A total of 2298 high-resolution images were extracted,including normal mucosa and lesions(erosions/erythema,ulcers,and polyps).The images were split into training and test datasets in a 4:1 ratio.Four deep learning models:DenseNet121,Visual geometry group-16,ResNet50,and vision transformer were trained using 5-fold cross-validation,with hyperparameters adjusted for optimal classification performance.The models were evaluated based on accuracy,precision,recall,F1-score,and area under the receiver operating curve(AU-ROC).Lesion visualization was performed using gradient-weighted class activation mapping.RESULTS Abdominal pain was the most common indication for VCE,accounting for 62%of cases,followed by diarrhea,vomiting,and gastrointestinal bleeding.Abnormal lesions were detected in 93 children,with 38 diagnosed with inflammatory bowel disease.Among the deep learning models,DenseNet121 and ResNet50 demonstrated excellent classification performance,achieving accuracies of 90.6%[95%confidence interval(CI):89.2-92.0]and 90.5%(95%CI:89.9-91.2),respectively.The AU-ROC values for these models were 93.7%(95%CI:92.9-94.5)for DenseNet121 and 93.4%(95%CI:93.1-93.8)for ResNet50.CONCLUSION Our deep learning-based diagnostic tool developed in this study effectively classified lesions in pediatric VCE images,contributing to more accurate diagnoses and increased diagnostic efficiency.展开更多
Recent advances in artificial intelligence and the availability of large-scale benchmarks have made deepfake video generation and manipulation easier.Therefore,developing reliable and robust deepfake video detection m...Recent advances in artificial intelligence and the availability of large-scale benchmarks have made deepfake video generation and manipulation easier.Therefore,developing reliable and robust deepfake video detection mechanisms is paramount.This research introduces a novel real-time deepfake video detection framework by analyzing gaze and blink patterns,addressing the spatial-temporal challenges unique to gaze and blink anomalies using the TimeSformer and hybrid Transformer-CNN models.The TimeSformer architecture leverages spatial-temporal attention mechanisms to capture fine-grained blinking intervals and gaze direction anomalies.Compared to state-of-the-art traditional convolutional models like MesoNet and EfficientNet,which primarily focus on global facial features,our approach emphasizes localized eye-region analysis,significantly enhancing detection accuracy.We evaluate our framework on four standard datasets:FaceForensics,CelebDF-V2,DFDC,and FakeAVCeleb.The proposed framework results reveal higher accuracy,with the TimeSformer model achieving accuracies of 97.5%,96.3%,95.8%,and 97.1%,and with the hybrid Transformer-CNN model demonstrating accuracies of 92.8%,91.5%,90.9%,and 93.2%,on FaceForensics,CelebDF-V2,DFDC,and FakeAVCeleb datasets,respectively,showing robustness in distinguishing manipulated from authentic videos.Our research provides a robust state-of-the-art framework for real-time deepfake video detection.This novel study significantly contributes to video forensics,presenting scalable and accurate real-world application solutions.展开更多
Coalbed methane(CBM)is a vital unconventional energy resource,and predicting its spatiotemporal pressure dynamics is crucial for efficient development strategies.This paper proposes a novel deep learningebased data-dr...Coalbed methane(CBM)is a vital unconventional energy resource,and predicting its spatiotemporal pressure dynamics is crucial for efficient development strategies.This paper proposes a novel deep learningebased data-driven surrogate model,AxialViT-ConvLSTM,which integrates AxialAttention Vision Transformer,ConvLSTM,and an enhanced loss function to predict pressure dynamics in CBM reservoirs.The results showed that the model achieves a mean square error of 0.003,a learned perceptual image patch similarity of 0.037,a structural similarity of 0.979,and an R^(2) of 0.982 between predictions and actual pressures,indicating excellent performance.The model also demonstrates strong robustness and accuracy in capturing spatialetemporal pressure features.展开更多
Recent studies employing deep learning to solve the traveling salesman problem(TSP)have mainly focused on learning construction heuristics.Such methods can improve TSP solutions,but still depend on additional programs...Recent studies employing deep learning to solve the traveling salesman problem(TSP)have mainly focused on learning construction heuristics.Such methods can improve TSP solutions,but still depend on additional programs.However,methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient.Traditional improvement heuristics are guided by a manually designed search strategy and may only achieve limited improvements.This paper proposes a novel framework for learning improvement heuristics,which automatically discovers better improvement policies for heuristics to iteratively solve the TSP.Our framework first designs a new architecture based on a transformer model to make the policy network parameterized,which introduces an action-dropout layer to prevent action selection from overfitting.It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism(named RL-SA)to learn the pairwise selected policy,aiming to improve the 2-opt algorithm's performance.The RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses the Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning.The experiment results show that the proposed approach is significantly superior to the state-of-the-art learning-based methods,and further reduces the gap between learning-based methods and highly optimized solvers in the benchmark datasets.Moreover,our pre-trained model M can be applied to guide the SA algorithm(named M-SA(ours)),which performs better than existing deep models in small-,medium-,and large-scale TSPLIB datasets.Additionally,the M-SA(ours)achieves excellent generalization performance in a real-world dataset on global liner shipping routes,with the optimization percentages in distance reduction ranging from3.52%to 17.99%.展开更多
The integration of IoT and Deep Learning(DL)has significantly advanced real-time health monitoring and predictive maintenance in prognostic and health management(PHM).Electrocardiograms(ECGs)are widely used for cardio...The integration of IoT and Deep Learning(DL)has significantly advanced real-time health monitoring and predictive maintenance in prognostic and health management(PHM).Electrocardiograms(ECGs)are widely used for cardiovascular disease(CVD)diagnosis,but fluctuating signal patterns make classification challenging.Computer-assisted automated diagnostic tools that enhance ECG signal categorization using sophisticated algorithms and machine learning are helping healthcare practitioners manage greater patient populations.With this motivation,the study proposes a DL framework leveraging the PTB-XL ECG dataset to improve CVD diagnosis.Deep Transfer Learning(DTL)techniques extract features,followed by feature fusion to eliminate redundancy and retain the most informative features.Utilizing the African Vulture Optimization Algorithm(AVOA)for feature selection is more effective than the standard methods,as it offers an ideal balance between exploration and exploitation that results in an optimal set of features,improving classification performance while reducing redundancy.Various machine learning classifiers,including Support Vector Machine(SVM),eXtreme Gradient Boosting(XGBoost),Adaptive Boosting(AdaBoost),and Extreme Learning Machine(ELM),are used for further classification.Additionally,an ensemble model is developed to further improve accuracy.Experimental results demonstrate that the proposed model achieves the highest accuracy of 96.31%,highlighting its effectiveness in enhancing CVD diagnosis.展开更多
This study introduces a Transformer-based multimodal fusion framework for simulating multiphase flow and heat transfer in carbon dioxide(CO_(2))–water enhanced geothermal systems(EGS).The model integrates geological ...This study introduces a Transformer-based multimodal fusion framework for simulating multiphase flow and heat transfer in carbon dioxide(CO_(2))–water enhanced geothermal systems(EGS).The model integrates geological parameters,thermal gradients,and control schedules to enable fast and accurate prediction of complex reservoir dynamics.The main contributions are:(i)development of a workflow that couples physics-based reservoir simulation with a Transformer neural network architecture,(ii)design of physics-guided loss functions to enforce conservation of mass and energy,(iii)application of the surrogate model to closed-loop optimization using a differential evolution(DE)algorithm,and(iv)incorporation of economic performance metrics,such as net present value(NPV),into decision support.The proposed framework achieves root mean square error(RMSE)of 3–5%,mean absolute error(MAE)below 4%,and coefficients of determination greater than 0.95 across multiple prediction targets,including production rates,pressure distributions,and temperature fields.When compared with recurrent neural network(RNN)baselines such as gated recurrent units(GRU)and long short-term memory networks(LSTM),as well as a physics-informed reduced-order model,the Transformer-based approach demonstrates superior accuracy and computational efficiency.Optimization experiments further show a 15–20%improvement in NPV,highlighting the framework’s potential for real-time forecasting,optimization,and decision-making in geothermal reservoir engineering.展开更多
The detection of surface defects in concrete bridges using deep learning is of significant importance for reducing operational risks,saving maintenance costs,and driving the intelligent transformation of bridge defect...The detection of surface defects in concrete bridges using deep learning is of significant importance for reducing operational risks,saving maintenance costs,and driving the intelligent transformation of bridge defect detection.In contrast to the subjective and inefficient manual visual inspection,deep learning-based algorithms for concrete defect detection exhibit remarkable advantages,emerging as a focal point in recent research.This paper comprehensively analyzes the research progress of deep learning algorithms in the field of surface defect detection in concrete bridges in recent years.It introduces the early detection methods for surface defects in concrete bridges and the development of deep learning.Subsequently,it provides an overview of deep learning-based concrete bridge surface defect detection research from three aspects:image classification,object detection,and semantic segmentation.The paper summarizes the strengths and weaknesses of existing methods and the challenges they face.Additionally,it analyzes and prospects the development trends of surface defect detection in concrete bridges.展开更多
Medical image analysis has become a cornerstone of modern healthcare,driven by the exponential growth of data from imaging modalities such as MRI,CT,PET,ultrasound,and X-ray.Traditional machine learning methods have m...Medical image analysis has become a cornerstone of modern healthcare,driven by the exponential growth of data from imaging modalities such as MRI,CT,PET,ultrasound,and X-ray.Traditional machine learning methods have made early contributions;however,recent advancements in deep learning(DL)have revolutionized the field,offering state-of-the-art performance in image classification,segmentation,detection,fusion,registration,and enhancement.This comprehensive review presents an in-depth analysis of deep learning methodologies applied across medical image analysis tasks,highlighting both foundational models and recent innovations.The article begins by introducing conventional techniques and their limitations,setting the stage for DL-based solutions.Core DL architectures,including Convolutional Neural Networks(CNNs),Recurrent Neural Networks(RNNs),Generative Adversarial Networks(GANs),Vision Transformers(ViTs),and hybrid models,are discussed in detail,including their advantages and domain-specific adaptations.Advanced learning paradigms such as semi-supervised learning,selfsupervised learning,and few-shot learning are explored for their potential to mitigate data annotation challenges in clinical datasets.This review further categorizes major tasks in medical image analysis,elaborating on how DL techniques have enabled precise tumor segmentation,lesion detection,modality fusion,super-resolution,and robust classification across diverse clinical settings.Emphasis is placed on applications in oncology,cardiology,neurology,and infectious diseases,including COVID-19.Challenges such as data scarcity,label imbalance,model generalizability,interpretability,and integration into clinical workflows are critically examined.Ethical considerations,explainable AI(XAI),federated learning,and regulatory compliance are discussed as essential components of real-world deployment.Benchmark datasets,evaluation metrics,and comparative performance analyses are presented to support future research.The article concludes with a forward-looking perspective on the role of foundation models,multimodal learning,edge AI,and bio-inspired computing in the future of medical imaging.Overall,this review serves as a valuable resource for researchers,clinicians,and developers aiming to harness deep learning for intelligent,efficient,and clinically viable medical image analysis.展开更多
The drug supervision methods based on near-infrared spectroscopy analysis are heavily dependent on the chemometrics model which characterizes the relationship between spectral data and drug categories.The preliminary ...The drug supervision methods based on near-infrared spectroscopy analysis are heavily dependent on the chemometrics model which characterizes the relationship between spectral data and drug categories.The preliminary application of convolution neural network in spectral analysis demonstrates excellent end-to-end prediction ability,but it is sensitive to the hyper-parameters of the network.The transformer is a deep-learning model based on self-attention mechanism that compares convolutional neural networks(CNNs)in predictive performance and has an easy-todesign model structure.Hence,a novel calibration model named SpectraTr,based on the transformer structure,is proposed and used for the qualitative analysis of drug spectrum.The experimental results of seven classes of drug and 18 classes of drug show that the proposed SpectraTr model can automatically extract features from a huge number of spectra,is not dependent on pre-processing algorithms,and is insensitive to model hyperparameters.When the ratio of the training set to test set is 8:2,the prediction accuracy of the SpectraTr model reaches 100%and 99.52%,respectively,which outperforms PLS DA,SVM,SAE,and CNN.The model is also tested on a public drug data set,and achieved classification accuracy of 96.97%without preprocessing algorithm,which is 34.85%,28.28%,5.05%,and 2.73%higher than PLS DA,SVM,SAE,and CNN,respectively.The research shows that the SpectraTr model performs exceptionally well in spectral analysis and is expected to be a novel deep calibration model after Autoencoder networks(AEs)and CNN.展开更多
Olive trees are susceptible to a variety of diseases that can cause significant crop damage and economic losses.Early detection of these diseases is essential for effective management.We propose a novel transformed wa...Olive trees are susceptible to a variety of diseases that can cause significant crop damage and economic losses.Early detection of these diseases is essential for effective management.We propose a novel transformed wavelet,feature-fused,pre-trained deep learning model for detecting olive leaf diseases.The proposed model combines wavelet transforms with pre-trained deep-learning models to extract discriminative features from olive leaf images.The model has four main phases:preprocessing using data augmentation,three-level wavelet transformation,learning using pre-trained deep learning models,and a fused deep learning model.In the preprocessing phase,the image dataset is augmented using techniques such as resizing,rescaling,flipping,rotation,zooming,and contrasting.In wavelet transformation,the augmented images are decomposed into three frequency levels.Three pre-trained deep learning models,EfficientNet-B7,DenseNet-201,and ResNet-152-V2,are used in the learning phase.The models were trained using the approximate images of the third-level sub-band of the wavelet transform.In the fused phase,the fused model consists of a merge layer,three dense layers,and two dropout layers.The proposed model was evaluated using a dataset of images of healthy and infected olive leaves.It achieved an accuracy of 99.72%in the diagnosis of olive leaf diseases,which exceeds the accuracy of other methods reported in the literature.This finding suggests that our proposed method is a promising tool for the early detection of olive leaf diseases.展开更多
文摘This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities,focusing on recent trends from 2022 to 2025.The primary objective is to evaluate methodological advancements,model performance,dataset usage,and existing challenges in developing clinically robust AI systems.We included peer-reviewed journal articles and highimpact conference papers published between 2022 and 2025,written in English,that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification.Excluded were non-open-access publications,books,and non-English articles.A structured search was conducted across Scopus,Google Scholar,Wiley,and Taylor&Francis,with the last search performed in August 2025.Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity,validation methods,and availability of performance metrics.We used narrative synthesis and tabular benchmarking to compare performance metrics(e.g.,accuracy,Dice score)across model types(CNN,Transformer,Hybrid),imaging modalities,and datasets.A total of 49 studies were included(43 journal articles and 6 conference papers).These studies spanned over 9 public datasets(e.g.,BraTS,Figshare,REMBRANDT,MOLAB)and utilized a range of imaging modalities,predominantly MRI.Hybrid models,especially ResViT and UNetFormer,consistently achieved high performance,with classification accuracy exceeding 98%and segmentation Dice scores above 0.90 across multiple studies.Transformers and hybrid architectures showed increasing adoption post2023.Many studies lacked external validation and were evaluated only on a few benchmark datasets,raising concerns about generalizability and dataset bias.Few studies addressed clinical interpretability or uncertainty quantification.Despite promising results,particularly for hybrid deep learning models,widespread clinical adoption remains limited due to lack of validation,interpretability concerns,and real-world deployment barriers.
文摘BACKGROUND Esophageal cancer is the sixth most common cancer worldwide,with a high mortality rate.Early prognosis of esophageal abnormalities can improve patient survival rates.The progression of esophageal cancer follows a sequence from esophagitis to non-dysplastic Barrett’s esophagus,dysplastic Barrett’s esophagus,and eventually esophageal adenocarcinoma(EAC).This study explored the application of deep learning technology in the precise diagnosis of pathological classification and staging of EAC to enhance diagnostic accuracy and efficiency.AIM To explore the application of deep learning models,particularly Wave-Vision Transformer(Wave-ViT),in the pathological classification and staging of esophageal cancer to enhance diagnostic accuracy and efficiency.METHODS We applied several deep learning models,including multi-layer perceptron,residual network,transformer,and Wave-ViT,to a dataset of clinically validated esophageal pathology images.The models were trained to identify pathological features and assist in the classification and staging of different stages of esophageal cancer.The models were compared based on accuracy,computational complexity,and efficiency.RESULTS The Wave-ViT model demonstrated the highest accuracy at 88.97%,surpassing the transformer(87.65%),residual network(85.44%),and multi-layer perceptron(81.17%).Additionally,Wave-ViT exhibited low computational complexity with significantly reduced parameter size,making it highly efficient for real-time clinical applications.CONCLUSION Deep learning technology,particularly the Frequency-Domain Transformer model,shows promise in improving the precision of pathological classification and staging of EAC.The application of the Frequency-Domain Transformer model enhances the automation of the diagnostic process and may support early detection and treatment of EAC.Future research may further explore the potential of this model in broader medical image analysis applications,particularly in the field of precision medicine.
基金The National Natural Science Foundation of China under contract Nos 42176011 and 61931025the Fundamental Research Funds for the Central Universities of China under contract No.24CX03001A.
文摘Efficient and accurate prediction of ocean surface latent heat fluxes is essential for understanding and modeling climate dynamics.Conventional estimation methods have low resolution and lack accuracy.The transformer model,with its self-attention mechanism,effectively captures long-range dependencies,leading to a degradation of accuracy over time.Due to the non-linearity and uncertainty of physical processes,the transformer model encounters the problem of error accumulation,leading to a degradation of accuracy over time.To solve this problem,we combine the Data Assimilation(DA)technique with the transformer model and continuously modify the model state to make it closer to the actual observations.In this paper,we propose a deep learning model called TransNetDA,which integrates transformer,convolutional neural network and DA methods.By combining data-driven and DA methods for spatiotemporal prediction,TransNetDA effectively extracts multi-scale spatial features and significantly improves prediction accuracy.The experimental results indicate that the TransNetDA method surpasses traditional techniques in terms of root mean square error and R2 metrics,showcasing its superior performance in predicting latent heat fluxes at the ocean surface.
基金This work was supported by the National Natural Science Foundation of China(61571388,61871465,62071414)the Project of Introducing Overseas Students in Hebei Province(C20200367).
文摘The issue of small-angle maneuvering targets inverse synthetic aperture radar(ISAR)imaging has been successfully addressed by popular motion compensation algorithms.However,when the target’s rotational velocity is sufficiently high during the dwell time of the radar,such compensation algorithms cannot obtain a high quality image.This paper proposes an ISAR imaging algorithm based on keystone transform and deep learning algorithm.The keystone transform is used to coarsely compensate for the target’s rotational motion and translational motion,and the deep learning algorithm is used to achieve a super-resolution image.The uniformly distributed point target data are used as the data set of the training u-net network.In addition,this method does not require estimating the motion parameters of the target,which simplifies the algorithm steps.Finally,several experiments are performed to demonstrate the effectiveness of the proposed algorithm.
文摘Many traditional denoising methods,such as Gaussian filtering,tend to blur and lose details or edge information while reducing noise.The stationary wavelet packet transform is a multi-scale and multi-band analysis tool.Compared with the stationary wavelet transform,it can suppress high-frequency noise while preserving more edge details.Deep learning has significantly progressed in denoising applications.DnCNN,a residual network;FFDNet,an efficient,fl exible network;U-NET,a codec network;and GAN,a generative adversative network,have better denoising effects than BM3D,the most popular conventional denoising method.Therefore,SWP_hFFDNet,a random noise attenuation network based on the stationary wavelet packet transform(SWPT)and modified FFDNet,is proposed.This network combines the advantages of SWPT,Huber norm,and FFDNet.In addition,it has three characteristics:First,SWPT is an eff ective featureextraction tool that can obtain low-and high-frequency features of different scales and frequency bands.Second,because the noise level map is the input of the network,the noise removal performance of diff erent noise levels can be improved.Third,the Huber norm can reduce the sensitivity of the network to abnormal data and enhance its robustness.The network is trained using the Adam algorithm and the BSD500 dataset,which is augmented,noised,and decomposed by SWPT.Experimental and actual data processing results show that the denoising eff ect of the proposed method is almost the same as those of BM3D,DnCNN,and FFDNet networks for low noise.However,for high noise,the proposed method is superior to the aforementioned networks.
基金the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support(QU-APC-2024-9/1).
文摘Arabic dialect identification is essential in Natural Language Processing(NLP)and forms a critical component of applications such as machine translation,sentiment analysis,and cross-language text generation.The difficulties in differentiating between Arabic dialects have garnered more attention in the last 10 years,particularly in social media.These difficulties result from the overlapping vocabulary of the dialects,the fluidity of online language use,and the difficulties in telling apart dialects that are closely related.Managing dialects with limited resources and adjusting to the ever-changing linguistic trends on social media platforms present additional challenges.A strong dialect recognition technique is essential to improving communication technology and cross-cultural understanding in light of the increase in social media usage.To distinguish Arabic dialects on social media,this research suggests a hybrid Deep Learning(DL)approach.The Long Short-Term Memory(LSTM)and Bidirectional Long Short-Term Memory(BiLSTM)architectures make up the model.A new textual dataset that focuses on three main dialects,i.e.,Levantine,Saudi,and Egyptian,is also available.Approximately 11,000 user-generated comments from Twitter are included in this dataset,which has been painstakingly annotated to guarantee accuracy in dialect classification.Transformers,DL models,and basic machine learning classifiers are used to conduct several tests to evaluate the performance of the suggested model.Various methodologies,including TF-IDF,word embedding,and self-attention mechanisms,are used.The suggested model fares better than other models in terms of accuracy,obtaining a remarkable 96.54%,according to the trial results.This study advances the discipline by presenting a new dataset and putting forth a practical model for Arabic dialect identification.This model may prove crucial for future work in sociolinguistic studies and NLP.
基金supported in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012485in part by Shenzhen Fundamental Research Program under Grant JCYJ20220810112354002+4 种基金in part by Shenzhen Science and Technology Program under Grant KJZD20230923114111021in part by the Fund for Academic Innovation Teams and Research Platform of South-Central Minzu University under Grant XTZ24003 and Grant PTZ24001in part by the Knowledge Innovation Program of Wuhan-Basic Research through Project 2023010201010151in part by the Research Start-up Funds of South-Central Minzu University under Grant YZZ18006in part by the Spring Sunshine Program of Ministry of Education of the People’s Republic of China under Grant HZKY20220331.
文摘Introduction Deep learning(DL),as one of the most transformative technologies in artificial intelligence(AI),is undergoing a pivotal transition from laboratory research to industrial deployment.Advancing at an unprecedented pace,DL is transcending theoretical and application boundaries to penetrate emerging realworld scenarios such as industrial automation,urban management,and health monitoring,thereby driving a new wave of intelligent transformation.In August 2023,Goldman Sachs estimated that global AI investment will reach US$200 billion by 2025[1].However,the increasing complexity and dynamic nature of application scenarios expose critical challenges in traditional deep learning,including data heterogeneity,insufficient model generalization,computational resource constraints,and privacy-security trade-offs.The next generation of deep learning methodologies needs to achieve breakthroughs in multimodal fusion,lightweight design,interpretability enhancement,and cross-disciplinary collaborative optimization,in order to develop more efficient,robust,and practically valuable intelligent systems.
文摘Addressing plant diseases and pests is not just crucial;it’s a matter of utmost importance for enhancing crop production and preventing economic losses. Recent advancements in artificial intelligence, machine learning, and deep learning have revolutionised the precision and efficiency of this process, surpassing the limitations of manual identification. This study comprehensively reviews modern computer-based techniques, including recent advances in artificial intelligence, for detecting diseases and pests through images. This paper uniquely categorises methodologies into hyperspectral imaging, non-visualisation techniques, visualisation approaches, modified deep learning architectures, and transformer models, helping researchers gain detailed, insightful understandings. The exhaustive survey of recent works and comparative studies in this domain guides researchers in selecting appropriate and advanced state-of-the-art methods for plant disease and pest detection. Additionally, this paper highlights the consistent superiority of modern AI-based approaches, which often outperform older image analysis methods in terms of speed and accuracy. Further, this survey focuses on the efficiency of vision transformers against well-known deep learning architectures like MobileNetV3, which shows that Hierarchical Vision Transformer (HvT) can achieve accuracy upwards of 99.3% in plant disease detection. The study concludes by addressing the challenges of designing the systems, proposing potential solutions, and outlining directions for future research in this field.
基金supported by the Postdoctoral Fellowship Program of CPSF(Grant No.GZB20230685)the National Science Foundation of China(Grant No.42277161).
文摘Forecasting landslide deformation is challenging due to influence of various internal and external factors on the occurrence of systemic and localized heterogeneities.Despite the potential to improve landslide predictability,deep learning has yet to be sufficiently explored for complex deformation patterns associated with landslides and is inherently opaque.Herein,we developed a holistic landslide deformation forecasting method that considers spatiotemporal correlations of landslide deformation by integrating domain knowledge into interpretable deep learning.By spatially capturing the interconnections between multiple deformations from different observation points,our method contributes to the understanding and forecasting of landslide systematic behavior.By integrating specific domain knowledge relevant to each observation point and merging internal properties with external variables,the local heterogeneity is considered in our method,identifying deformation temporal patterns in different landslide zones.Case studies involving reservoir-induced landslides and creeping landslides demonstrated that our approach(1)enhances the accuracy of landslide deformation forecasting,(2)identifies significant contributing factors and their influence on spatiotemporal deformation characteristics,and(3)demonstrates how identifying these factors and patterns facilitates landslide forecasting.Our research offers a promising and pragmatic pathway toward a deeper understanding and forecasting of complex landslide behaviors.
基金funded by Natural Science Foundation of Heilongjiang Province,grant number LH2023F020.
文摘Deep learning(DL)has revolutionized time series forecasting(TSF),surpassing traditional statistical methods(e.g.,ARIMA)and machine learning techniques in modeling complex nonlinear dynamics and long-term dependencies prevalent in real-world temporal data.This comprehensive survey reviews state-of-the-art DL architectures forTSF,focusing on four core paradigms:(1)ConvolutionalNeuralNetworks(CNNs),adept at extracting localized temporal features;(2)Recurrent Neural Networks(RNNs)and their advanced variants(LSTM,GRU),designed for sequential dependency modeling;(3)Graph Neural Networks(GNNs),specialized for forecasting structured relational data with spatial-temporal dependencies;and(4)Transformer-based models,leveraging self-attention mechanisms to capture global temporal patterns efficiently.We provide a rigorous analysis of the theoretical underpinnings,recent algorithmic advancements(e.g.,TCNs,attention mechanisms,hybrid architectures),and practical applications of each framework,supported by extensive benchmark datasets(e.g.,ETT,traffic flow,financial indicators)and standardized evaluation metrics(MAE,MSE,RMSE).Critical challenges,including handling irregular sampling intervals,integrating domain knowledge for robustness,and managing computational complexity,are thoroughly discussed.Emerging research directions highlighted include diffusion models for uncertainty quantification,hybrid pipelines combining classical statistical and DL techniques for enhanced interpretability,quantile regression with Transformers for riskaware forecasting,and optimizations for real-time deployment.This work serves as an essential reference,consolidating methodological innovations,empirical resources,and future trends to bridge the gap between theoretical research and practical implementation needs for researchers and practitioners in the field.
文摘BACKGROUND Video capsule endoscopy(VCE)is a noninvasive technique used to examine small bowel abnormalities in both adults and children.However,manual review of VCE images is time-consuming and labor-intensive,making it crucial to develop deep learning methods to assist in image analysis.AIM To employ deep learning models for the automatic classification of small bowel lesions using pediatric VCE images.METHODS We retrospectively analyzed VCE images from 162 pediatric patients who underwent VCE between January 2021 and December 2023 at the Children's Hospital of Nanjing Medical University.A total of 2298 high-resolution images were extracted,including normal mucosa and lesions(erosions/erythema,ulcers,and polyps).The images were split into training and test datasets in a 4:1 ratio.Four deep learning models:DenseNet121,Visual geometry group-16,ResNet50,and vision transformer were trained using 5-fold cross-validation,with hyperparameters adjusted for optimal classification performance.The models were evaluated based on accuracy,precision,recall,F1-score,and area under the receiver operating curve(AU-ROC).Lesion visualization was performed using gradient-weighted class activation mapping.RESULTS Abdominal pain was the most common indication for VCE,accounting for 62%of cases,followed by diarrhea,vomiting,and gastrointestinal bleeding.Abnormal lesions were detected in 93 children,with 38 diagnosed with inflammatory bowel disease.Among the deep learning models,DenseNet121 and ResNet50 demonstrated excellent classification performance,achieving accuracies of 90.6%[95%confidence interval(CI):89.2-92.0]and 90.5%(95%CI:89.9-91.2),respectively.The AU-ROC values for these models were 93.7%(95%CI:92.9-94.5)for DenseNet121 and 93.4%(95%CI:93.1-93.8)for ResNet50.CONCLUSION Our deep learning-based diagnostic tool developed in this study effectively classified lesions in pediatric VCE images,contributing to more accurate diagnoses and increased diagnostic efficiency.
文摘Recent advances in artificial intelligence and the availability of large-scale benchmarks have made deepfake video generation and manipulation easier.Therefore,developing reliable and robust deepfake video detection mechanisms is paramount.This research introduces a novel real-time deepfake video detection framework by analyzing gaze and blink patterns,addressing the spatial-temporal challenges unique to gaze and blink anomalies using the TimeSformer and hybrid Transformer-CNN models.The TimeSformer architecture leverages spatial-temporal attention mechanisms to capture fine-grained blinking intervals and gaze direction anomalies.Compared to state-of-the-art traditional convolutional models like MesoNet and EfficientNet,which primarily focus on global facial features,our approach emphasizes localized eye-region analysis,significantly enhancing detection accuracy.We evaluate our framework on four standard datasets:FaceForensics,CelebDF-V2,DFDC,and FakeAVCeleb.The proposed framework results reveal higher accuracy,with the TimeSformer model achieving accuracies of 97.5%,96.3%,95.8%,and 97.1%,and with the hybrid Transformer-CNN model demonstrating accuracies of 92.8%,91.5%,90.9%,and 93.2%,on FaceForensics,CelebDF-V2,DFDC,and FakeAVCeleb datasets,respectively,showing robustness in distinguishing manipulated from authentic videos.Our research provides a robust state-of-the-art framework for real-time deepfake video detection.This novel study significantly contributes to video forensics,presenting scalable and accurate real-world application solutions.
基金the National Natural Science Foundation of China(No.52474068)the Major Collab-orative Innovation Project of Prospecting Breakthrough Stra-tegic Action in Guizhou Province(No.[2022]ZD001-003).
文摘Coalbed methane(CBM)is a vital unconventional energy resource,and predicting its spatiotemporal pressure dynamics is crucial for efficient development strategies.This paper proposes a novel deep learningebased data-driven surrogate model,AxialViT-ConvLSTM,which integrates AxialAttention Vision Transformer,ConvLSTM,and an enhanced loss function to predict pressure dynamics in CBM reservoirs.The results showed that the model achieves a mean square error of 0.003,a learned perceptual image patch similarity of 0.037,a structural similarity of 0.979,and an R^(2) of 0.982 between predictions and actual pressures,indicating excellent performance.The model also demonstrates strong robustness and accuracy in capturing spatialetemporal pressure features.
基金Project supported by the National Natural Science Foundation of China(Grant Nos.72101046 and 61672128)。
文摘Recent studies employing deep learning to solve the traveling salesman problem(TSP)have mainly focused on learning construction heuristics.Such methods can improve TSP solutions,but still depend on additional programs.However,methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient.Traditional improvement heuristics are guided by a manually designed search strategy and may only achieve limited improvements.This paper proposes a novel framework for learning improvement heuristics,which automatically discovers better improvement policies for heuristics to iteratively solve the TSP.Our framework first designs a new architecture based on a transformer model to make the policy network parameterized,which introduces an action-dropout layer to prevent action selection from overfitting.It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism(named RL-SA)to learn the pairwise selected policy,aiming to improve the 2-opt algorithm's performance.The RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses the Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning.The experiment results show that the proposed approach is significantly superior to the state-of-the-art learning-based methods,and further reduces the gap between learning-based methods and highly optimized solvers in the benchmark datasets.Moreover,our pre-trained model M can be applied to guide the SA algorithm(named M-SA(ours)),which performs better than existing deep models in small-,medium-,and large-scale TSPLIB datasets.Additionally,the M-SA(ours)achieves excellent generalization performance in a real-world dataset on global liner shipping routes,with the optimization percentages in distance reduction ranging from3.52%to 17.99%.
基金funded by Researchers Supporting ProjectNumber(RSPD2025R947),King Saud University,Riyadh,Saudi Arabia.
文摘The integration of IoT and Deep Learning(DL)has significantly advanced real-time health monitoring and predictive maintenance in prognostic and health management(PHM).Electrocardiograms(ECGs)are widely used for cardiovascular disease(CVD)diagnosis,but fluctuating signal patterns make classification challenging.Computer-assisted automated diagnostic tools that enhance ECG signal categorization using sophisticated algorithms and machine learning are helping healthcare practitioners manage greater patient populations.With this motivation,the study proposes a DL framework leveraging the PTB-XL ECG dataset to improve CVD diagnosis.Deep Transfer Learning(DTL)techniques extract features,followed by feature fusion to eliminate redundancy and retain the most informative features.Utilizing the African Vulture Optimization Algorithm(AVOA)for feature selection is more effective than the standard methods,as it offers an ideal balance between exploration and exploitation that results in an optimal set of features,improving classification performance while reducing redundancy.Various machine learning classifiers,including Support Vector Machine(SVM),eXtreme Gradient Boosting(XGBoost),Adaptive Boosting(AdaBoost),and Extreme Learning Machine(ELM),are used for further classification.Additionally,an ensemble model is developed to further improve accuracy.Experimental results demonstrate that the proposed model achieves the highest accuracy of 96.31%,highlighting its effectiveness in enhancing CVD diagnosis.
文摘This study introduces a Transformer-based multimodal fusion framework for simulating multiphase flow and heat transfer in carbon dioxide(CO_(2))–water enhanced geothermal systems(EGS).The model integrates geological parameters,thermal gradients,and control schedules to enable fast and accurate prediction of complex reservoir dynamics.The main contributions are:(i)development of a workflow that couples physics-based reservoir simulation with a Transformer neural network architecture,(ii)design of physics-guided loss functions to enforce conservation of mass and energy,(iii)application of the surrogate model to closed-loop optimization using a differential evolution(DE)algorithm,and(iv)incorporation of economic performance metrics,such as net present value(NPV),into decision support.The proposed framework achieves root mean square error(RMSE)of 3–5%,mean absolute error(MAE)below 4%,and coefficients of determination greater than 0.95 across multiple prediction targets,including production rates,pressure distributions,and temperature fields.When compared with recurrent neural network(RNN)baselines such as gated recurrent units(GRU)and long short-term memory networks(LSTM),as well as a physics-informed reduced-order model,the Transformer-based approach demonstrates superior accuracy and computational efficiency.Optimization experiments further show a 15–20%improvement in NPV,highlighting the framework’s potential for real-time forecasting,optimization,and decision-making in geothermal reservoir engineering.
基金supported by the Key Research and Development Program of Shaanxi Province-International Science and Technology Cooperation Program Project (No.2020KW-001)the Contract for Xi'an Municipal Science and Technology Plan Project-Xi'an City Strong Foundation Innovation Plan (No.21XJZZ0074)the Key Project of Graduate Student Innovation Fund at Xi'an University of Posts and Telecommunications (No.CXJJZL2023013)。
文摘The detection of surface defects in concrete bridges using deep learning is of significant importance for reducing operational risks,saving maintenance costs,and driving the intelligent transformation of bridge defect detection.In contrast to the subjective and inefficient manual visual inspection,deep learning-based algorithms for concrete defect detection exhibit remarkable advantages,emerging as a focal point in recent research.This paper comprehensively analyzes the research progress of deep learning algorithms in the field of surface defect detection in concrete bridges in recent years.It introduces the early detection methods for surface defects in concrete bridges and the development of deep learning.Subsequently,it provides an overview of deep learning-based concrete bridge surface defect detection research from three aspects:image classification,object detection,and semantic segmentation.The paper summarizes the strengths and weaknesses of existing methods and the challenges they face.Additionally,it analyzes and prospects the development trends of surface defect detection in concrete bridges.
文摘Medical image analysis has become a cornerstone of modern healthcare,driven by the exponential growth of data from imaging modalities such as MRI,CT,PET,ultrasound,and X-ray.Traditional machine learning methods have made early contributions;however,recent advancements in deep learning(DL)have revolutionized the field,offering state-of-the-art performance in image classification,segmentation,detection,fusion,registration,and enhancement.This comprehensive review presents an in-depth analysis of deep learning methodologies applied across medical image analysis tasks,highlighting both foundational models and recent innovations.The article begins by introducing conventional techniques and their limitations,setting the stage for DL-based solutions.Core DL architectures,including Convolutional Neural Networks(CNNs),Recurrent Neural Networks(RNNs),Generative Adversarial Networks(GANs),Vision Transformers(ViTs),and hybrid models,are discussed in detail,including their advantages and domain-specific adaptations.Advanced learning paradigms such as semi-supervised learning,selfsupervised learning,and few-shot learning are explored for their potential to mitigate data annotation challenges in clinical datasets.This review further categorizes major tasks in medical image analysis,elaborating on how DL techniques have enabled precise tumor segmentation,lesion detection,modality fusion,super-resolution,and robust classification across diverse clinical settings.Emphasis is placed on applications in oncology,cardiology,neurology,and infectious diseases,including COVID-19.Challenges such as data scarcity,label imbalance,model generalizability,interpretability,and integration into clinical workflows are critically examined.Ethical considerations,explainable AI(XAI),federated learning,and regulatory compliance are discussed as essential components of real-world deployment.Benchmark datasets,evaluation metrics,and comparative performance analyses are presented to support future research.The article concludes with a forward-looking perspective on the role of foundation models,multimodal learning,edge AI,and bio-inspired computing in the future of medical imaging.Overall,this review serves as a valuable resource for researchers,clinicians,and developers aiming to harness deep learning for intelligent,efficient,and clinically viable medical image analysis.
基金supported by the National Natural Science Foundation of China(61906050,21365008)Guangxi Technology R&D Program(2018AD11018)Innovation Project of GUET Graduate Education(2021YCXS050).
文摘The drug supervision methods based on near-infrared spectroscopy analysis are heavily dependent on the chemometrics model which characterizes the relationship between spectral data and drug categories.The preliminary application of convolution neural network in spectral analysis demonstrates excellent end-to-end prediction ability,but it is sensitive to the hyper-parameters of the network.The transformer is a deep-learning model based on self-attention mechanism that compares convolutional neural networks(CNNs)in predictive performance and has an easy-todesign model structure.Hence,a novel calibration model named SpectraTr,based on the transformer structure,is proposed and used for the qualitative analysis of drug spectrum.The experimental results of seven classes of drug and 18 classes of drug show that the proposed SpectraTr model can automatically extract features from a huge number of spectra,is not dependent on pre-processing algorithms,and is insensitive to model hyperparameters.When the ratio of the training set to test set is 8:2,the prediction accuracy of the SpectraTr model reaches 100%and 99.52%,respectively,which outperforms PLS DA,SVM,SAE,and CNN.The model is also tested on a public drug data set,and achieved classification accuracy of 96.97%without preprocessing algorithm,which is 34.85%,28.28%,5.05%,and 2.73%higher than PLS DA,SVM,SAE,and CNN,respectively.The research shows that the SpectraTr model performs exceptionally well in spectral analysis and is expected to be a novel deep calibration model after Autoencoder networks(AEs)and CNN.
文摘Olive trees are susceptible to a variety of diseases that can cause significant crop damage and economic losses.Early detection of these diseases is essential for effective management.We propose a novel transformed wavelet,feature-fused,pre-trained deep learning model for detecting olive leaf diseases.The proposed model combines wavelet transforms with pre-trained deep-learning models to extract discriminative features from olive leaf images.The model has four main phases:preprocessing using data augmentation,three-level wavelet transformation,learning using pre-trained deep learning models,and a fused deep learning model.In the preprocessing phase,the image dataset is augmented using techniques such as resizing,rescaling,flipping,rotation,zooming,and contrasting.In wavelet transformation,the augmented images are decomposed into three frequency levels.Three pre-trained deep learning models,EfficientNet-B7,DenseNet-201,and ResNet-152-V2,are used in the learning phase.The models were trained using the approximate images of the third-level sub-band of the wavelet transform.In the fused phase,the fused model consists of a merge layer,three dense layers,and two dropout layers.The proposed model was evaluated using a dataset of images of healthy and infected olive leaves.It achieved an accuracy of 99.72%in the diagnosis of olive leaf diseases,which exceeds the accuracy of other methods reported in the literature.This finding suggests that our proposed method is a promising tool for the early detection of olive leaf diseases.