Abstract: Stephen Crane was an outstanding American novelist, poet, and journalist. He achieved great success in his literary works during his brief career. Crane's most well-known work, The Red Badge of Courage, is commonly believed to be the first great novel of the American Civil War, largely because of its vivid and detailed description of the experience of warfare. This paper analyzes the images of color, animal and machine, which convey Crane's thoughts of war: war is full of chaos, brutality, and confusion, without any romantic elements or heroism.
The application of deep learning for target detection in aerial images captured by Unmanned Aerial Vehicles (UAVs) has emerged as a prominent research focus. Due to the considerable distance between UAVs and the photographed objects, coupled with complex shooting environments, existing models often struggle to achieve accurate real-time target detection. In this paper, a You Only Look Once v8 (YOLOv8) model is modified in four aspects: the detection head, the up-sampling module, the feature extraction module, and the parameter optimization of positive sample screening. The resulting YOLO-S3DT model is proposed to improve the detection of small targets in aerial images. Experimental results show that all detection indexes of the proposed model are significantly improved without increasing the number of model parameters and with only limited growth in computation. Moreover, this model also has the best performance compared to other detection models, demonstrating its advancement within this category of tasks.
Objective: Early prediction of response before neoadjuvant chemotherapy (NAC) is crucial for personalized treatment plans for locally advanced breast cancer patients. We aim to develop a multi-task model using multiscale whole slide image (WSI) features to predict the response to breast cancer NAC more finely. Methods: This work collected 1,670 whole slide images for the training and validation sets, internal testing sets, external testing sets, and prospective testing sets of the weakly-supervised deep learning-based multi-task model (DLMM) for predicting treatment response and pathological complete response (pCR) to NAC. Our approach models two-by-two feature interactions across scales by employing concatenation-based fusion of single-scale feature representations, and controls the expressiveness of each representation via a gating-based attention mechanism. Results: In the retrospective analysis, DLMM exhibited excellent predictive performance for treatment response, with areas under the receiver operating characteristic curve (AUCs) of 0.869 [95% confidence interval (95% CI): 0.806−0.933] in the internal testing set and 0.841 (95% CI: 0.814−0.867) in the external testing sets. For the pCR prediction task, DLMM reached AUCs of 0.865 (95% CI: 0.763−0.964) in the internal testing set and 0.821 (95% CI: 0.763−0.878) in the pooled external testing set. In the prospective testing study, DLMM also demonstrated favorable predictive performance, with AUCs of 0.829 (95% CI: 0.754−0.903) and 0.821 (95% CI: 0.692−0.949) in treatment response and pCR prediction, respectively. DLMM significantly outperformed the baseline models in all testing sets (P<0.05). Heatmaps were employed to interpret the decision-making basis of the model. Furthermore, exploration of the biological basis showed that high DLMM scores were associated with immune-related pathways and cells in the microenvironment. Conclusions: The DLMM represents a valuable tool that aids clinicians in selecting personalized treatment strategies for breast cancer patients.
Breast cancer is one of the major causes of death in women. Early diagnosis is important for screening and for controlling the mortality rate, so a computer-aided diagnosis system for detecting breast cancer at an early stage is highly desirable. Ultrasound is an important examination technique for breast cancer diagnosis due to its low cost. Recently, many learning-based techniques have been introduced to classify breast cancer using the breast ultrasound images (BUSI) dataset; however, manual handling is difficult and time consuming. The authors propose an EfficientNet-integrated ResNet deep network and XAI-based framework for accurately classifying breast cancer (malignant and benign). In the initial step, data augmentation is performed to increase the number of training samples. For this purpose, three pixel-flip mathematical equations are introduced: horizontal, vertical, and 90°. Later, two pretrained deep learning models were employed, with some layers skipped, and fine-tuned. Both fine-tuned models are then trained using a deep transfer learning process, and features are extracted from the deeper layer. Explainable artificial intelligence techniques were used to analyse the performance of the trained models. After that, a new feature selection technique based on the cuckoo search algorithm, called cuckoo search controlled standard error mean, is proposed. This technique selects the best features, which are then fused using a new parallel zero-padding maximum correlated coefficient approach. In the end, the selection algorithm is applied again to the fused feature vector, which is classified using machine learning algorithms. The proposed framework is evaluated on the publicly available BUSI dataset and obtained 98.4% and 98% accuracy in two different experiments. A comparison with recent techniques also shows improved accuracy. In addition, the proposed framework required less execution time than the original deep learning models.
Several socio-environmental needs (medicine, industry, engineering, orogenesis, genesis, etc.) require minerals to be more precisely defined and characterised. The identification of minerals plays a crucial role for researchers and is becoming an essential aspect of geological analysis. However, traditional methods rely heavily on expert knowledge and specialised equipment, making them labour-intensive, costly and time-consuming. To address this issue, some researchers have opted for machine learning algorithms to quickly identify a single mineral in a microscopic image of rocks. However, this approach does not correspond to patterns of mineral distribution, where minerals are typically found in associations. These associations make it difficult to accurately identify minerals using conventional machine learning algorithms. This paper introduces a deep neural learning model based on multi-label classification, utilizing the problem adaptation method to analyse microscopic images of rock thin sections. The model is based on the ResNet50 architecture and generates the probability of each mineral's presence in an image. This method provides a solution to the dependence between associated minerals. Experiments on many test images showed that the model is reliable, achieving average precision, recall and F1-score of 97.15%, 96.25% and 96.69%, respectively. Visualisation of the class activation mapping using the Grad-CAM algorithm indicates that our model is likely to locate the identified minerals effectively. In this way, the importance of each pixel for the class of interest can be assessed using heat maps. The recorded results, in terms of both performance and pixel-level evaluation, demonstrate the promising potential of the model used. It can therefore be considered for multi-label image classification, particularly for images representing rock minerals. This approach serves as a valuable support tool for geological studies.
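The multi-label output described in the abstract above can be illustrated with a minimal sketch. It assumes, as one common form of problem adaptation, that the network ends in one independent sigmoid unit per mineral and that a fixed threshold decides presence; the class names, logits and threshold are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_minerals(logits, class_names, threshold=0.5):
    """Return per-class probabilities and every class at or above the threshold.

    Each class is scored independently, so several minerals can be
    reported for the same image (multi-label, not multi-class).
    """
    probs = {name: sigmoid(z) for name, z in zip(class_names, logits)}
    present = [name for name, p in probs.items() if p >= threshold]
    return probs, present


# Hypothetical class names and logits, for illustration only.
classes = ["quartz", "feldspar", "biotite"]
probs, present = predict_minerals([2.0, 0.3, -1.5], classes)
print(present)
```

Because the probabilities are not forced to sum to one, associated minerals can all score highly in the same thin-section image.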
Kinship verification is a key biometric recognition task that determines biological relationships based on physical features. Traditional methods predominantly use facial recognition, leveraging established techniques and extensive datasets. However, recent research has highlighted ear recognition as a promising alternative, offering advantages in robustness against variations in facial expressions, aging, and occlusions. Despite its potential, a significant challenge in ear-based kinship verification is the lack of large-scale datasets necessary for training deep learning models effectively. To address this challenge, we introduce the EarKinshipVN dataset, a novel and extensive collection of ear images designed specifically for kinship verification. This dataset consists of 4876 high-resolution color images from 157 multiracial families across different regions, forming 73,220 kinship pairs. EarKinshipVN, a diverse and large-scale dataset, advances kinship verification research using ear features. Furthermore, we propose the Mixer Attention Inception (MAI) model, an improved architecture that enhances feature extraction and classification accuracy. The MAI model fuses Inceptionv4 and MLP Mixer, integrating four attention mechanisms to enhance spatial and channel-wise feature representation. Experimental results demonstrate that MAI significantly outperforms traditional backbone architectures. It achieves an accuracy of 98.71%, surpassing Vision Transformer models while reducing computational complexity by up to 95% in parameter usage. These findings suggest that ear-based kinship verification, combined with an optimized deep learning model and a comprehensive dataset, holds significant promise for biometric applications.
Nuclei segmentation in histopathology images is challenging due to the small size of objects, low contrast, touching boundaries, and the complex structure of nuclei. Segmenting and counting nuclei play an important role in cancer identification and grading. In this study, WaveSeg-UNet, a lightweight model, is introduced to segment cancerous nuclei with touching boundaries. Residual blocks are used for feature extraction. Only one feature extractor block is used in each level of the encoder and decoder. Normally, images degrade in quality and lose important information during down-sampling. To overcome this loss, the discrete wavelet transform (DWT) is used alongside max pooling in the down-sampling process. The inverse DWT is used to regenerate the original images during up-sampling. In the bottleneck of the proposed model, atrous spatial channel pyramid pooling (ASCPP) is used to extract effective high-level features. The ASCPP is a modified pyramid pooling with atrous layers that enlarge the receptive field. Spatial and channel-based attention are used to focus on the location and class of the identified objects. Finally, the watershed transform is used as a post-processing technique to identify and refine the touching boundaries of nuclei. Nuclei are identified and counted to assist pathologists. Same-domain transfer learning is used to retrain the model for domain adaptability. The proposed model is compared with state-of-the-art models and outperforms them.
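The motivation for pairing DWT with max pooling can be shown on a one-dimensional toy signal. The sketch below uses a single-level Haar wavelet, which is an assumption (the abstract does not name the wavelet), and the helper names are hypothetical. It shows that the approximation and detail coefficients together allow exact reconstruction, whereas max pooling alone discards half of the values.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_dwt(signal):
    """One-level Haar DWT of an even-length sequence: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def max_pool(signal):
    """Stride-2 max pooling; the smaller value of each pair is lost."""
    return [max(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]


row = [4.0, 2.0, 5.0, 9.0]          # toy pixel row, illustrative values
approx, detail = haar_dwt(row)      # both halves are half-length
restored = haar_idwt(approx, detail)
pooled = max_pool(row)
print(pooled, restored)
```

In a 2D network the same idea is applied along rows and columns, and the detail sub-bands can be passed to the decoder so that the inverse DWT restores resolution without guessing the lost values.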
In the evolving landscape of secure communication, steganography has become increasingly vital for securing the transmission of secret data through an insecure public network. Several steganographic algorithms have been proposed using digital images, with the common objective of balancing the trade-off between payload size and the quality of the stego image. In existing steganographic works, a remarkable distortion of the stego image persists when the payload size is increased, making several of them impractical in the current world of vast data. This paper introduces FuzzyStego, a novel approach designed to enhance the stego image's quality by minimizing the effect of the payload size on it. To address the limitations of traditional methods such as Pixel Value Differencing (PVD), transform domain techniques, and Least Significant Bit (LSB) insertion, including image quality degradation, vulnerability to processing attacks, and restricted capacity, FuzzyStego utilizes fuzzy logic to categorize pixels into intensity levels: Low (L), Medium-Low (ML), Medium (M), Medium-High (MH), and High (H). This classification enables adaptive data embedding, minimizing detectability by adjusting the hidden bit count according to the intensity levels. Experimental results show that FuzzyStego achieves an average Peak Signal-to-Noise Ratio (PSNR) of 58.638 decibels (dB) and a Structural Similarity Index Measure (SSIM) of almost 1.00, demonstrating its promising capability to preserve image quality while embedding data effectively.
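To make the reported PSNR figure easier to interpret, the following sketch computes PSNR for 8-bit pixel values using the standard definition, 10·log10(MAX² / MSE). The embedding helper is plain fixed-width LSB replacement, used here only as a baseline for comparison; it is not the FuzzyStego algorithm, and the pixel values are made up.

```python
import math


def psnr(cover, stego, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def embed_lsb(pixel, bits):
    """Replace the len(bits) least significant bits of pixel with bits ('0'/'1' string)."""
    k = len(bits)
    return (pixel >> k << k) | int(bits, 2)


cover = [100, 150, 200, 250]                  # illustrative cover pixels
stego = [embed_lsb(p, "1") for p in cover]    # every pixel changes by exactly 1
print(round(psnr(cover, stego), 2))
```

When every pixel changes by one grey level the MSE is 1 and the PSNR is about 48.13 dB, so an average of 58.638 dB corresponds to a considerably smaller mean squared distortion than this worst-case one-bit baseline.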
Glaucoma, a chronic eye disease affecting millions worldwide, poses a substantial threat to eyesight and can result in permanent vision loss if left untreated. Manual identification of glaucoma is a complicated and time-consuming practice that requires specialized expertise, and its results may be subjective. To address these challenges, this research proposes a computer-aided diagnosis (CAD) approach using Artificial Intelligence (AI) techniques for binary and multiclass classification of glaucoma stages. An ensemble fusion mechanism that combines the outputs of three pre-trained convolutional neural network (ConvNet) models (ResNet-50, VGG-16, and InceptionV3) is utilized in this paper. This fusion technique enhances diagnostic accuracy and robustness by ensemble-averaging the predictions from individual models, leveraging their complementary strengths. The objective of this work is to assess the model's capability for early-stage glaucoma diagnosis. Classification is performed on a dataset collected from the Harvard Dataverse repository. With the proposed technique, for Normal vs. Advanced glaucoma classification, a validation accuracy of 98.04% and a testing accuracy of 98.03% are achieved, with a specificity of 100%, which outperforms state-of-the-art methods. For multiclass classification, the suggested ensemble approach achieved a precision and sensitivity of 97%, with specificity and testing accuracy of 98.57% and 96.82%, respectively. The proposed E-GlauNet model has significant potential in assisting ophthalmologists in the screening and fast diagnosis of glaucoma, leading to more reliable, efficient, and timely diagnosis, particularly for early-stage detection and staging of the disease. While the proposed method demonstrates high accuracy and robustness, the study is limited by evaluation on a single dataset. Future work will focus on external validation across diverse datasets and enhancing interpretability using explainable AI techniques.
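Ensemble averaging as described above can be sketched in a few lines. The example assumes each model outputs a probability vector over the same ordered classes and that the models are weighted equally; the numbers are invented for illustration and do not come from the study.

```python
def ensemble_average(prob_vectors):
    """Average class probabilities element-wise across models (equal weights)."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical outputs for classes [normal, early, advanced].
resnet50 = [0.6, 0.3, 0.1]
vgg16 = [0.2, 0.5, 0.3]
inception_v3 = [0.5, 0.4, 0.1]

avg = ensemble_average([resnet50, vgg16, inception_v3])
predicted_class = argmax(avg)
print(avg, predicted_class)
```

Here one model disagrees with the other two, and the averaged vector still selects the class favoured by the majority, which is the kind of complementary behaviour the abstract refers to.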
In the history of medical culture around the world, the exchange and transmission of medical knowledge has formed an important part of mutual learning among different cultures, and it has increasingly shown unique academic value in the study of the history of knowledge. Traditional Eastern medicine (such as Chinese medicine, Indian ayurvedic medicine, Persian medicine, and Arabic medicine) and other medical systems in the ancient Western world (including Greek medicine and Roman medicine) have left precious literature and texts, cultural relics (for example, pills, preparations, and medical instruments), folklore and legends, which faithfully record the process of learning, transplantation, fusion and succession after the encounter of different medical systems over at least the past two thousand years.
Edo-period historical records and documents preserved a substantial number of images, many of which are related to epidemic outbreaks. Through systematic collation and categorical analysis, this study uses the chronological and thematic characteristics of these images as a framework to examine the response mechanisms of the Japanese government and public during infectious disease pandemics in the Edo period, as well as the multidimensional impacts of epidemics on social economy, culture, and customs. Illustrations of smallpox in medical texts reveal the developmental trajectory of Japan's traditional medical knowledge system, while drawings in essays and diaries reflect public fear and non-medical cognitive patterns during cholera outbreaks. Epidemic-themed paintings not only document cholera treatment protocols by the government and medical professionals, as well as grassroots prevention and treatment practices for measles, but also vividly depict social dynamics during crises. Images related to epidemics in advertising reflect the prosperity of the pharmaceutical industry in the Edo period, while depictions in folding screens, ukiyozoushi and occupational illustrations demonstrate societal customs for epidemic response. Collectively, the Edo-period epidemic crises profoundly shaped Japan's medical system, economic structure, cultural forms, folk traditions, and public psychology, prompting the government, medical professionals, and civilians to develop distinct era-specific social coping mechanisms.
A novel CNN-Mamba hybrid architecture was proposed to address intra-class variance and inter-class similarity in remote sensing imagery. The framework integrates: (1) parallel CNN and visual state space (VSS) encoders, (2) multi-scale cross-attention feature fusion, and (3) a boundary-constrained decoder. This design overcomes CNN's limited receptive fields and ViT's quadratic complexity while efficiently capturing both local features and global dependencies. Evaluations on the LoveDA and ISPRS Vaihingen datasets demonstrate superior segmentation accuracy and boundary preservation compared to existing approaches, with the dual-branch structure maintaining computational efficiency throughout the process.
Autism spectrum disorder (ASD) is a multifaceted neurological developmental condition that manifests in several ways. Nearly all autistic children remain undiagnosed before the age of three. Developmental problems affecting facial features are often associated with fundamental brain disorders. The facial development of newborns with ASD is quite different from that of typically developing children. Early recognition is important for helping families and parents overcome superstition and denial. Distinguishing their facial features from those of typically developing children is a straightforward way to detect children with ASD. Presently, artificial intelligence (AI) significantly contributes to the emerging computer-aided diagnosis (CAD) of autism and to the evolving interactive methods that aid in the treatment and reintegration of autistic patients. This study introduces an Ensemble of deep learning models based on the autism spectrum disorder detection in facial images (EDLM-ASDDFI) model. The overarching goal of the EDLM-ASDDFI model is to recognize the difference between facial images of individuals with ASD and normal controls. In the EDLM-ASDDFI method, the primary level of data pre-processing is performed by Gabor filtering (GF). The EDLM-ASDDFI technique then applies the MobileNetV2 model to learn complex features from the pre-processed data. For the ASD detection process, the EDLM-ASDDFI method uses an ensemble of classifiers comprising long short-term memory (LSTM), deep belief network (DBN), and hybrid kernel extreme learning machine (HKELM). Finally, the hyperparameters of the three deep learning (DL) models are selected using the crested porcupine optimizer (CPO) technique. An extensive experiment was conducted to demonstrate the improved ASD detection performance of the EDLM-ASDDFI method. The simulation outcomes indicated that the EDLM-ASDDFI technique showed improvement over other existing models in terms of numerous performance measures.
Global climate change, along with the rapid increase of the population, has put significant pressure on water security. A water reservoir is an effective solution for adjusting and ensuring water supply. In particular, the reservoir water level is an essential physical indicator for reservoirs. Forecasting the reservoir water level effectively assists managers in making decisions and plans related to reservoir management policies. In recent years, deep learning models have been widely applied to solve forecasting problems. In this study, we propose a novel hybrid deep learning model, YOLOv9_ConvLSTM, that integrates YOLOv9, ConvLSTM, and linear interpolation to predict reservoir water levels. It utilizes data from Sentinel-2 satellite images, generated from visible spectrum bands (Red-Blue-Green) to reconstruct true-color reservoir images. Adam is used as the optimization algorithm, with Mean Squared Error (MSE) as the loss function to evaluate the model's error during training. We implemented and validated the proposed model using Sentinel-2 satellite imagery for the An Khe reservoir in Vietnam. To assess its performance, we also conducted comparative experiments with other related models, including SegNet_ConvLSTM and UNet_ConvLSTM, on the same dataset. The model performances were validated using k-fold cross-validation and ANOVA analysis. The experimental results demonstrate that the YOLOv9_ConvLSTM model outperforms the compared models. The proposed approach serves as a valuable tool for reservoir water level forecasting using satellite imagery and contributes to effective water resource management.
The internal structures of cells as the basic units of life are a major wonder of the microscopic world. Cellular images provide an intriguing window to help explore and understand the composition and function of these structures. Scientific imagery combined with artistic expression can further expand the potential of imaging in educational dissemination and interdisciplinary applications.
AIM: To develop different machine learning models to train and test diplopia images and data generated by the computerized diplopia test. METHODS: Diplopia images and data generated by computerized diplopia tests, along with patient medical records, were retrospectively collected from 3244 cases. Diagnostic models were constructed using logistic regression (LR), decision tree (DT), support vector machine (SVM), extreme gradient boosting (XGBoost), and deep learning (DL) algorithms. A total of 2757 diplopia images were randomly selected as training data, while the test dataset contained 487 diplopia images. The optimal diagnostic model was evaluated using test set accuracy, confusion matrix, and precision-recall curve (P-R curve). RESULTS: The test set accuracy of the LR, SVM, DT, XGBoost, DL (64 categories), and DL (6 binary classifications) algorithms was 0.762, 0.811, 0.818, 0.812, 0.858 and 0.858, respectively. The accuracy in the training set was 0.785, 0.815, 0.998, 0.965, 0.968, and 0.967, respectively. The weighted precision of the LR, SVM, DT, XGBoost, DL (64 categories), and DL (6 binary classifications) algorithms was 0.74, 0.77, 0.83, 0.80, 0.85, and 0.85, respectively; weighted recall was 0.76, 0.81, 0.82, 0.81, 0.86, and 0.86, respectively; weighted F1 score was 0.74, 0.79, 0.82, 0.80, 0.85, and 0.85, respectively. CONCLUSION: In this study, the 7 machine learning algorithms all achieve automatic diagnosis of extraocular muscle palsy. The DL (64 categories) and DL (6 binary classifications) algorithms have a significant advantage over other machine learning algorithms regarding diagnostic accuracy on the test set, with a high level of consistency with clinical diagnoses made by physicians. Therefore, they can be used as a reference for diagnosis.
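The weighted precision, recall and F1 figures reported above are support-weighted averages of per-class scores. The sketch below shows that calculation on a made-up four-sample example; the labels are hypothetical and the function is an illustration of the standard definition, not the study's evaluation code.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all classes in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0   # per-class precision
        r_c = tp / n                                 # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n / total                           # class share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical labels, for illustration only.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, accuracy)
```

Support-weighted recall is algebraically equal to overall accuracy, which is consistent with the reported weighted recall values matching the test set accuracies after rounding.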
This study introduces a novel method for reconstructing the 3D model of aluminum foam using cross-sectional sequence images. Combining precision milling and image acquisition, high-quality cross-sectional images are obtained. Pore structures are segmented by the U-shaped network (U-Net) neural network integrated with the Canny edge detection operator, ensuring accurate pore delineation and edge extraction. The trained U-Net achieves 98.55% accuracy. The 2D data are superimposed and processed into 3D point clouds, enabling reconstruction of the pore structure and aluminum skeleton. Analysis of pore 01 shows the cross-sectional area initially increases, and then decreases with milling depth, with a uniform point distribution of 40 per layer. The reconstructed model exhibits a porosity of 77.5%, with section overlap rates between the 2D pore segmentation and the reconstructed model exceeding 96%, confirming high fidelity. Equivalent sphere diameters decrease with size, averaging 1.95 mm. Compression simulations reveal that the stress-strain curve of the 3D reconstruction model of aluminum foam exhibits fluctuations, and the stresses in the reconstruction model concentrate on thin cell walls, leading to localized deformations. This method accurately restores the aluminum foam's complex internal structure, improving reconstruction precision and simulation reliability. The approach offers a cost-efficient, high-precision technique for optimizing material performance in engineering applications.
Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance and management. However, challenges like small object detection, scale variation, and the presence of closely packed objects in these images hinder accurate detection. Additionally, the motion blur effect further complicates the identification of such objects. To address these issues, we propose an enhanced YOLOv9 with a transformer head (YOLOv9-TH). The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms. We further improve YOLOv9-TH using several strategies, including data augmentation, multi-scale testing, multi-model integration, and the introduction of an additional classifier. The cross-stage partial (CSP) method and the ghost convolution hierarchical graph (GCHG) are combined to improve detection accuracy by better utilizing feature maps, widening the receptive field, and precisely extracting multi-scale objects. Additionally, we incorporate the E-SimAM attention mechanism to address low-resolution feature loss. Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH, showing good improvement in mAP compared to the best existing methods. YOLOv9-TH-e achieved an mAP50 of 54.2% on the VisDrone2021 dataset and an mAP of 92.3% on the DIOR dataset. The results confirm the model's robustness and suitability for real-world applications, particularly for small object detection in remote sensing images.
When detecting objects in images taken by Unmanned Aerial Vehicles (UAVs), the large number of objects and the high proportion of small objects pose huge challenges for detection algorithms based on the You Only Look Once (YOLO) framework, making it difficult for them to handle tasks that demand high precision. To address these problems, this paper proposes a high-precision object detection algorithm based on YOLOv10s. Firstly, a Multi-branch Enhancement Coordinate Attention (MECA) module is proposed to enhance feature extraction capability. Secondly, a Multilayer Feature Reconstruction (MFR) mechanism is designed to fully exploit multilayer features, which can enrich object information as well as remove redundant information. Finally, an MFR Path Aggregation Network (MFR-Neck) is constructed, which integrates multi-scale features to improve the network's ability to perceive objects of varying sizes. The experimental results demonstrate that the proposed algorithm increases the average detection accuracy by 14.15% on the VisDrone dataset compared to YOLOv10s, effectively enhancing object detection precision in UAV-taken images.
This paper proposes a novel method for the automatic diagnosis of keratitis using feature vector quantization and self-attention mechanisms (ADK_FVQSAM). First, high-level features are extracted using the DenseNet121 backbone network, followed by adaptive average pooling to scale the features to a fixed length. Subsequently, product quantization with residuals (PQR) is applied to convert continuous feature vectors into discrete feature representations, preserving essential information insensitive to image quality variations. The quantized and original features are concatenated and fed into a self-attention mechanism to capture keratitis-related features. Finally, these enhanced features are classified through a fully connected layer. Experiments on clinical low-quality (LQ) images show that ADK_FVQSAM achieves accuracies of 87.7%, 81.9%, and 89.3% for keratitis, other corneal abnormalities, and normal corneas, respectively. Compared to DenseNet121, Swin transformer, and InceptionResNet, ADK_FVQSAM improves average accuracy by 3.1%, 11.3%, and 15.3%, respectively. These results demonstrate that ADK_FVQSAM significantly enhances the recognition performance of keratitis based on LQ slit-lamp images, offering a practical approach for clinical application.
Abstract: Stephen Crane was an outstanding American novelist, poet, and journalist. He achieved great success in his literary works during his brief career. Crane’s most well-known work, The Red Badge of Courage, is commonly believed to be the first great novel of the American Civil War, largely because of its vivid and detailed description of the experience of warfare. This paper analyzes the images of color, animals, and machines, which convey Crane’s view of war: war is full of chaos, brutality, and confusion, without any romantic elements or heroism.
Abstract: The application of deep learning to target detection in aerial images captured by Unmanned Aerial Vehicles (UAVs) has emerged as a prominent research focus. Due to the considerable distance between UAVs and the photographed objects, coupled with complex shooting environments, existing models often struggle to achieve accurate real-time target detection. In this paper, a You Only Look Once v8 (YOLOv8) model is modified in four aspects: the detection head, the up-sampling module, the feature extraction module, and the parameter optimization of positive sample screening. The resulting YOLO-S3DT model is proposed to improve the detection of small targets in aerial images. Experimental results show that all detection metrics of the proposed model improve significantly without increasing the number of model parameters and with only limited growth in computation. Moreover, the model performs best among the detection models compared, demonstrating its advancement within this category of tasks.
Funding: Supported by the National Natural Science Foundation of China (No. 82371933), the National Natural Science Foundation of Shandong Province of China (No. ZR2021MH120), the Taishan Scholars Project (No. tsqn202211378), and the Shandong Provincial Natural Science Foundation for Excellent Young Scholars (No. ZR2024YQ075).
Abstract: Objective: Early prediction of response before neoadjuvant chemotherapy (NAC) is crucial for personalized treatment plans for patients with locally advanced breast cancer. We aim to develop a multi-task model using multiscale whole slide image (WSI) features to predict the response to breast cancer NAC at a finer level. Methods: This work collected 1,670 whole slide images for the training and validation sets, internal testing sets, external testing sets, and prospective testing sets of a weakly supervised deep learning-based multi-task model (DLMM) for predicting treatment response and pathological complete response (pCR) to NAC. Our approach models pairwise feature interactions across scales through concatenation-based fusion of single-scale feature representations, and controls the expressiveness of each representation via a gating-based attention mechanism. Results: In the retrospective analysis, DLMM exhibited excellent performance in predicting treatment response, with areas under the receiver operating characteristic curve (AUCs) of 0.869 [95% confidence interval (95% CI): 0.806–0.933] in the internal testing set and 0.841 (95% CI: 0.814–0.867) in the external testing sets. For the pCR prediction task, DLMM reached AUCs of 0.865 (95% CI: 0.763–0.964) in the internal testing set and 0.821 (95% CI: 0.763–0.878) in the pooled external testing set. In the prospective testing study, DLMM also demonstrated favorable predictive performance, with AUCs of 0.829 (95% CI: 0.754–0.903) and 0.821 (95% CI: 0.692–0.949) for treatment response and pCR prediction, respectively. DLMM significantly outperformed the baseline models in all testing sets (P<0.05). Heatmaps were employed to interpret the basis of the model's decisions. Furthermore, exploration of the biological basis showed that high DLMM scores were associated with immune-related pathways and cells in the microenvironment. Conclusions: The DLMM is a valuable tool that helps clinicians select personalized treatment strategies for breast cancer patients.
Abstract: Breast cancer is one of the major causes of death in women. Early diagnosis is important for screening and for controlling the mortality rate, so a computer-aided diagnosis system is highly desirable for diagnosing breast cancer at an early stage. Ultrasound is an important examination technique for breast cancer diagnosis due to its low cost. Recently, many learning-based techniques have been introduced to classify breast cancer using breast ultrasound imaging (BUSI) datasets; however, manual handling is difficult and time-consuming. The authors propose an EfficientNet-integrated ResNet deep network and an explainable artificial intelligence (XAI)-based framework for accurately classifying breast cancer (malignant and benign). In the initial step, data augmentation is performed to increase the number of training samples. For this purpose, three pixel-flip mathematical equations are introduced: horizontal, vertical, and 90°. Next, two pretrained deep learning models are employed, with some layers skipped, and fine-tuned. Both fine-tuned models are then trained using a deep transfer learning process, and features are extracted from the deeper layer. XAI is used to analyse the performance of the trained models. After that, a new feature selection technique based on the cuckoo search algorithm, called cuckoo search controlled standard error mean, is proposed. This technique selects the best features and fuses them using a new parallel zero-padding maximum correlated coefficient technique. Finally, the selection algorithm is applied again to the fused feature vector, which is then classified using machine learning algorithms. The proposed framework is evaluated on the publicly available BUSI dataset and obtains 98.4% and 98% accuracy in two different experiments. A comparison with recent techniques also shows improved accuracy. In addition, the proposed framework executes in less time than the original deep learning models.
Abstract: Several socio-environmental needs (medicine, industry, engineering, orogenesis, genesis, etc.) require minerals to be more precisely defined and characterised. The identification of minerals plays a crucial role for researchers and is becoming an essential aspect of geological analysis. However, traditional methods rely heavily on expert knowledge and specialised equipment, a dependence that makes them labour-intensive, costly, and time-consuming. To address this issue, some researchers have opted for machine learning algorithms to quickly identify a single mineral in a microscopic image of rocks. However, this approach does not correspond to patterns of mineral distribution, where minerals are typically found in associations. These associations make it difficult to accurately identify minerals using conventional machine learning algorithms. This paper introduces a deep neural learning model based on multi-label classification, utilizing the problem adaptation method to analyse microscopic images of rock thin sections. The model is based on the ResNet50 architecture, is designed to analyse minerals, and generates the probability of each mineral's presence in an image. This method provides a solution to the dependence between associated minerals. Experiments on many test images achieved average precision, recall, and F1-score of 97.15%, 96.25%, and 96.69%, respectively. Visualisation of the class activation mapping using the Grad-CAM algorithm indicates that the model is likely to locate the identified minerals effectively. In this way, the importance of each pixel for the class of interest can be assessed using heat maps. The recorded results, in terms of both performance and pixel-level evaluation, demonstrate the promising potential of the model. It can therefore be considered for multi-label image classification, particularly for images of rock minerals. This approach serves as a valuable support tool for geological studies.
Abstract: Kinship verification is a key biometric recognition task that determines biological relationships based on physical features. Traditional methods predominantly use facial recognition, leveraging established techniques and extensive datasets. However, recent research has highlighted ear recognition as a promising alternative, offering advantages in robustness against variations in facial expressions, aging, and occlusions. Despite its potential, a significant challenge in ear-based kinship verification is the lack of large-scale datasets necessary for training deep learning models effectively. To address this challenge, we introduce the EarKinshipVN dataset, a novel and extensive collection of ear images designed specifically for kinship verification. This dataset consists of 4876 high-resolution color images from 157 multiracial families across different regions, forming 73,220 kinship pairs. EarKinshipVN, a diverse and large-scale dataset, advances kinship verification research using ear features. Furthermore, we propose the Mixer Attention Inception (MAI) model, an improved architecture that enhances feature extraction and classification accuracy. The MAI model fuses Inceptionv4 and MLP Mixer, integrating four attention mechanisms to enhance spatial and channel-wise feature representation. Experimental results demonstrate that MAI significantly outperforms traditional backbone architectures. It achieves an accuracy of 98.71%, surpassing Vision Transformer models while reducing computational complexity by up to 95% in terms of parameter usage. These findings suggest that ear-based kinship verification, combined with an optimized deep learning model and a comprehensive dataset, holds significant promise for biometric applications.
Abstract: Nuclei segmentation in histopathology images is a challenging task because of the small size of the objects, low contrast, touching boundaries, and the complex structure of nuclei. Segmenting and counting nuclei play an important role in cancer identification and grading. In this study, WaveSeg-UNet, a lightweight model, is introduced to segment cancerous nuclei with touching boundaries. Residual blocks are used for feature extraction, and only one feature extractor block is used at each level of the encoder and decoder. Normally, images degrade in quality and lose important information during down-sampling. To overcome this loss, the discrete wavelet transform (DWT) is used alongside max-pooling in the down-sampling process, and the inverse DWT is used to regenerate the original images during up-sampling. In the bottleneck of the proposed model, atrous spatial channel pyramid pooling (ASCPP) is used to extract effective high-level features. ASCPP is a modified pyramid pooling with atrous layers that enlarge the receptive field. Spatial and channel-based attention are used to focus on the location and class of the identified objects. Finally, the watershed transform is used as a post-processing technique to identify and refine the touching boundaries of nuclei. Nuclei are identified and counted to assist pathologists. Same-domain transfer learning is used to retrain the model for domain adaptability. The results of the proposed model are compared with those of state-of-the-art models, and it outperforms the existing studies.
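The wavelet down-sampling idea in this abstract can be illustrated with a minimal sketch of a single-level 2D Haar DWT and its inverse. This is not the WaveSeg-UNet implementation: the choice of the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the toy image are all assumptions made for illustration, and the max-pooling branch is omitted. The point it shows is that the four half-resolution sub-bands together retain enough information to rebuild the input exactly, which plain pooling does not.

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT of an image with even height and width.

    Returns four half-resolution sub-bands: a low-pass approximation
    and three detail bands (hypothetical names d1, d2, d3).
    """
    h, w = len(img) // 2, len(img[0]) // 2
    ll, d1, d2, d3 = ([[0.0] * w for _ in range(h)] for _ in range(4))
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # approximation (down-sampled image)
            d1[i][j] = (a - b + c - d) / 2  # difference between columns
            d2[i][j] = (a + b - c - d) / 2  # difference between rows
            d3[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, d1, d2, d3


def haar_idwt2(ll, d1, d2, d3):
    """Inverse of haar_dwt2: rebuild the full-resolution image."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], d1[i][j], d2[i][j], d3[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


# Toy 4x4 grayscale image (made-up values).
image = [
    [10, 12, 50, 52],
    [14, 16, 54, 56],
    [90, 92, 30, 32],
    [94, 96, 34, 36],
]
ll, d1, d2, d3 = haar_dwt2(image)
restored = haar_idwt2(ll, d1, d2, d3)
print(ll)
print(restored == image)
```

In a network, the approximation band would typically play the role of the down-sampled feature map, while the detail bands would be kept for the decoder; how WaveSeg-UNet combines them is not stated in the abstract.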
Abstract: In the evolving landscape of secure communication, steganography has become increasingly vital for securing the transmission of secret data over insecure public networks. Several steganographic algorithms have been proposed for digital images with the common objective of balancing the trade-off between payload size and the quality of the stego image. In existing steganographic work, noticeable distortion of the stego image persists when the payload size is increased, which makes many existing methods impractical for today's large data volumes. This paper introduces FuzzyStego, a novel approach designed to enhance the stego image’s quality by minimizing the effect of the payload size on it. Traditional methods such as Pixel Value Differencing (PVD), Transform Domain Techniques, and Least Significant Bit (LSB) insertion suffer from image quality degradation, vulnerability to processing attacks, and restricted capacity. To address these limitations, FuzzyStego uses fuzzy logic to categorize pixels into intensity levels: Low (L), Medium-Low (ML), Medium (M), Medium-High (MH), and High (H). This classification enables adaptive data embedding, minimizing detectability by adjusting the hidden bit count according to the intensity level. Experimental results show that FuzzyStego achieves an average Peak Signal-to-Noise Ratio (PSNR) of 58.638 decibels (dB) and a Structural Similarity Index Measure (SSIM) of almost 1.00, demonstrating its promising capability to preserve image quality while embedding data effectively.
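To make the PSNR figure in this abstract concrete, the following minimal sketch embeds bits with plain LSB insertion (one of the traditional baselines the abstract names, not FuzzyStego itself) and computes PSNR with the standard definition, 10·log10(MAX² / MSE). The function names `embed_lsb` and `psnr` and the pixel values are assumptions for illustration only.

```python
import math


def embed_lsb(pixels, bits):
    """Write one payload bit into the least significant bit of each pixel."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def psnr(cover, stego, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up 8-pixel grayscale cover and a 4-bit payload.
cover = [120, 121, 122, 123, 200, 201, 202, 203]
bits = [1, 0, 1, 0]
stego = embed_lsb(cover, bits)
print(stego)
print(round(psnr(cover, stego), 2))
```

Because each embedded bit changes a pixel by at most 1, PSNR stays high for small payloads; an adaptive scheme such as the one the abstract describes varies the number of hidden bits per pixel instead of using a fixed count.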
Funding: Funded by the Department of Robotics and Mechatronics Engineering, Kennesaw State University, Marietta, GA 30060, USA.
Abstract: Glaucoma, a chronic eye disease affecting millions worldwide, poses a substantial threat to eyesight and can result in permanent vision loss if left untreated. Manual identification of glaucoma is a complicated and time-consuming practice that requires specialized expertise, and its results may be subjective. To address these challenges, this research proposes a computer-aided diagnosis (CAD) approach using Artificial Intelligence (AI) techniques for binary and multiclass classification of glaucoma stages. The paper uses an ensemble fusion mechanism that combines the outputs of three pre-trained convolutional neural network (ConvNet) models: ResNet-50, VGG-16, and InceptionV3. This fusion technique enhances diagnostic accuracy and robustness by ensemble-averaging the predictions of the individual models, leveraging their complementary strengths. The objective of this work is to assess the model’s capability for early-stage glaucoma diagnosis. Classification is performed on a dataset collected from the Harvard Dataverse repository. For Normal vs. Advanced glaucoma classification, the proposed technique achieves a validation accuracy of 98.04% and a testing accuracy of 98.03%, with a specificity of 100%, outperforming state-of-the-art methods. For multiclass classification, the ensemble approach achieves a precision and sensitivity of 97%, and a specificity and testing accuracy of 98.57% and 96.82%, respectively. The proposed E-GlauNet model has significant potential to assist ophthalmologists in the screening and fast diagnosis of glaucoma, leading to more reliable, efficient, and timely diagnosis, particularly for early-stage detection and staging of the disease. While the proposed method demonstrates high accuracy and robustness, the study is limited by its evaluation on a single dataset. Future work will focus on external validation across diverse datasets and on enhancing interpretability using explainable AI techniques.
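The ensemble-averaging step described in this abstract can be sketched as follows. The snippet only illustrates averaging per-class probabilities across models and taking the most probable class; the probability values, the class names, and the function names `ensemble_average` and `predict` are hypothetical and do not come from the paper.

```python
def ensemble_average(prob_lists):
    """Average per-class probabilities over several models."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]


def predict(prob_lists, class_names):
    """Return the class with the highest averaged probability."""
    avg = ensemble_average(prob_lists)
    best = max(range(len(avg)), key=avg.__getitem__)
    return class_names[best], avg


# Hypothetical softmax outputs of three models for one fundus image.
class_names = ["normal", "early", "advanced"]  # assumed labels
model_outputs = [
    [0.6, 0.3, 0.1],  # stand-in for a ResNet-50 prediction
    [0.2, 0.5, 0.3],  # stand-in for a VGG-16 prediction
    [0.5, 0.4, 0.1],  # stand-in for an InceptionV3 prediction
]
label, averaged = predict(model_outputs, class_names)
print(label, [round(p, 3) for p in averaged])
```

In this toy case one model prefers "early", yet the averaged prediction is "normal", which shows how averaging lets the other two models outvote a single disagreeing one.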
Abstract: In the history of medical culture worldwide, the exchange and transmission of medical knowledge has formed an important part of mutual learning among different cultures, and it has increasingly shown unique academic value in the study of the history of knowledge. Traditional Eastern medicine (such as Chinese medicine, Indian ayurvedic medicine, Persian medicine, and Arabic medicine) and other medical systems of the ancient Western world (including Greek medicine and Roman medicine) have left precious literature and texts, cultural relics (for example, pills, preparations, and medical instruments), folklore, and legends. These faithfully record the process of learning, transplantation, fusion, and succession that followed encounters between different medical systems over at least the past two thousand years.
Funding: Financed by a grant from the Major Project of the National Social Science Fund of China (No. 20&ZD222).
Abstract: Edo-period historical records and documents preserve a substantial number of images, many of which are related to epidemic outbreaks. Through systematic collation and categorical analysis, this study uses the chronological and thematic characteristics of these images as a framework to examine the response mechanisms of the Japanese government and public during infectious disease pandemics in the Edo period, as well as the multidimensional impacts of epidemics on the social economy, culture, and customs. Illustrations of smallpox in medical texts reveal the developmental trajectory of Japan’s traditional medical knowledge system, while drawings in essays and diaries reflect public fear and non-medical cognitive patterns during cholera outbreaks. Epidemic-themed paintings not only document cholera treatment protocols used by the government and medical professionals, as well as grassroots prevention and treatment practices for measles, but also vividly depict social dynamics during crises. Epidemic-related images in advertising reflect the prosperity of the pharmaceutical industry in the Edo period, while depictions in folding screens, ukiyozoushi, and occupational illustrations demonstrate societal customs for epidemic response. Collectively, the Edo-period epidemic crises profoundly shaped Japan’s medical system, economic structure, cultural forms, folk traditions, and public psychology, prompting the government, medical professionals, and civilians to develop distinct, era-specific social coping mechanisms.
Abstract: A novel CNN-Mamba hybrid architecture is proposed to address intra-class variance and inter-class similarity in remote sensing imagery. The framework integrates (1) parallel CNN and visual state space (VSS) encoders, (2) multi-scale cross-attention feature fusion, and (3) a boundary-constrained decoder. This design overcomes the limited receptive fields of CNNs and the quadratic complexity of ViTs while efficiently capturing both local features and global dependencies. Evaluations on the LoveDA and ISPRS Vaihingen datasets demonstrate superior segmentation accuracy and boundary preservation compared with existing approaches, with the dual-branch structure maintaining computational efficiency throughout the process.
Funding: Researchers Supporting Project number (RSPD2025R1107), King Saud University, Riyadh, Saudi Arabia.
Abstract: Autism spectrum disorder (ASD) is a multifaceted neurological developmental condition that manifests in several ways. Nearly all autistic children remain undiagnosed before the age of three. Developmental problems affecting facial features are often associated with fundamental brain disorders, and the facial development of newborns with ASD differs markedly from that of typically developing children. Early recognition is very important for helping families and parents who face superstition and denial. Distinguishing the facial features of children with ASD from those of typically developing children is a direct way to detect children with ASD. Presently, artificial intelligence (AI) contributes significantly to the emerging computer-aided diagnosis (CAD) of autism and to the evolving interactive methods that aid in the treatment and reintegration of autistic patients. This study introduces an ensemble of deep learning models for autism spectrum disorder detection in facial images (EDLM-ASDDFI). The overarching goal of the EDLM-ASDDFI model is to distinguish facial images of individuals with ASD from those of normal controls. In the EDLM-ASDDFI method, data pre-processing is first performed using Gabor filtering (GF). The EDLM-ASDDFI technique then applies the MobileNetV2 model to learn complex features from the pre-processed data. For the ASD detection process, the EDLM-ASDDFI method uses an ensemble of classifiers comprising long short-term memory (LSTM), a deep belief network (DBN), and a hybrid kernel extreme learning machine (HKELM). Finally, the hyperparameters of the three deep learning (DL) models are selected using the crested porcupine optimizer (CPO) technique. An extensive experiment was conducted to demonstrate the improved ASD detection performance of the EDLM-ASDDFI method. The simulation outcomes indicated that the EDLM-ASDDFI technique improves on existing models across numerous performance measures.
Funding: Funded by the International School, Vietnam National University, Hanoi (VNU-IS), under project number CS.2023-10.
Abstract: Global climate change, along with rapid population growth, has put significant pressure on water security. A water reservoir is an effective solution for regulating and ensuring water supply. In particular, the reservoir water level is an essential physical indicator for a reservoir. Forecasting the reservoir water level effectively helps managers make decisions and plans related to reservoir management policies. In recent years, deep learning models have been widely applied to forecasting problems. In this study, we propose a novel hybrid deep learning model, YOLOv9_ConvLSTM, that integrates YOLOv9, ConvLSTM, and linear interpolation to predict reservoir water levels. It uses Sentinel-2 satellite data, combining the visible spectrum bands (Red-Blue-Green) to reconstruct true-color reservoir images. Adam is used as the optimization algorithm, with Mean Squared Error (MSE) as the loss function for evaluating the model’s error during training. We implemented and validated the proposed model using Sentinel-2 satellite imagery of the An Khe reservoir in Vietnam. To assess its performance, we also conducted comparative experiments with other related models, including SegNet_ConvLSTM and UNet_ConvLSTM, on the same dataset. Model performance was validated using k-fold cross-validation and ANOVA analysis. The experimental results demonstrate that the YOLOv9_ConvLSTM model outperforms the compared models. The proposed approach serves as a valuable tool for forecasting reservoir water levels from satellite imagery and contributes to effective water resource management.
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 226-2024-00038), China.
Abstract: The internal structures of cells, the basic units of life, are a major wonder of the microscopic world. Cellular images provide an intriguing window that helps us explore and understand the composition and function of these structures. Scientific imagery combined with artistic expression can further expand the potential of imaging in educational dissemination and interdisciplinary applications.
Funding: Supported by the National Natural Science Foundation of China (No. 82074524) and the Harbin Medical University Graduate Research and Practice Innovation Project (No. YJSCX2023-50HYD).
Abstract: AIM: To develop different machine learning models to train and test on diplopia images and data generated by the computerized diplopia test. METHODS: Diplopia images and data generated by computerized diplopia tests, along with patient medical records, were retrospectively collected from 3244 cases. Diagnostic models were constructed using logistic regression (LR), decision tree (DT), support vector machine (SVM), extreme gradient boosting (XGBoost), and deep learning (DL) algorithms. A total of 2757 diplopia images were randomly selected as training data, while the test dataset contained 487 diplopia images. The optimal diagnostic model was evaluated using test set accuracy, the confusion matrix, and the precision-recall curve (P-R curve). RESULTS: The test set accuracy of the LR, SVM, DT, XGBoost, DL (64 categories), and DL (6 binary classifications) algorithms was 0.762, 0.811, 0.818, 0.812, 0.858, and 0.858, respectively. The accuracy in the training set was 0.785, 0.815, 0.998, 0.965, 0.968, and 0.967, respectively. The weighted precision of the LR, SVM, DT, XGBoost, DL (64 categories), and DL (6 binary classifications) algorithms was 0.74, 0.77, 0.83, 0.80, 0.85, and 0.85, respectively; weighted recall was 0.76, 0.81, 0.82, 0.81, 0.86, and 0.86, respectively; and weighted F1 score was 0.74, 0.79, 0.82, 0.80, 0.85, and 0.85, respectively. CONCLUSION: In this study, the 7 machine learning algorithms all achieve automatic diagnosis of extraocular muscle palsy. The DL (64 categories) and DL (6 binary classifications) algorithms have a significant advantage over the other machine learning algorithms in diagnostic accuracy on the test set, with a high level of consistency with clinical diagnoses made by physicians. Therefore, they can be used as a reference for diagnosis.
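The weighted precision, recall, and F1 scores reported in this abstract follow the usual support-weighted definitions, which can be sketched from a confusion matrix as below. The function name `weighted_metrics` and the two-class matrix are made up for illustration and are not the study's data. With this definition, weighted recall is mathematically identical to overall accuracy, which is consistent with the closely matching accuracy and weighted-recall values quoted above.

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall, F1, and accuracy.

    confusion[t][p] is the number of samples of true class t
    that were predicted as class p.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision = recall = f1 = 0.0
    correct = 0
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                      # true samples of class k
        predicted = sum(confusion[t][k] for t in range(n))  # predicted as class k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                         # class share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f
        correct += tp
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up 2-class confusion matrix: rows are true classes, columns predictions.
confusion = [
    [8, 2],
    [1, 9],
]
metrics = weighted_metrics(confusion)
print({name: round(value, 3) for name, value in metrics.items()})
```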
Funding: Supported by the Key Research and Development Plan in Shanxi Province of China (No. 201803D421045) and the Natural Science Foundation of Shanxi Province (No. 2021-0302-123104).
Abstract: This study introduces a novel method for reconstructing the 3D model of aluminum foam from cross-sectional sequence images. By combining precision milling and image acquisition, high-quality cross-sectional images are obtained. Pore structures are segmented by a U-shaped network (U-Net) integrated with the Canny edge detection operator, ensuring accurate pore delineation and edge extraction. The trained U-Net achieves 98.55% accuracy. The 2D data are superimposed and processed into 3D point clouds, enabling reconstruction of the pore structure and the aluminum skeleton. Analysis of pore 01 shows that the cross-sectional area initially increases and then decreases with milling depth, with a uniform point distribution of 40 per layer. The reconstructed model exhibits a porosity of 77.5%, with section overlap rates between the 2D pore segmentation and the reconstructed model exceeding 96%, confirming high fidelity. Equivalent sphere diameters decrease with size, averaging 1.95 mm. Compression simulations reveal that the stress-strain curve of the 3D reconstruction model of aluminum foam exhibits fluctuations, and that stresses in the reconstruction model concentrate on thin cell walls, leading to localized deformations. This method accurately restores the aluminum foam’s complex internal structure, improving reconstruction precision and simulation reliability. The approach offers a cost-efficient, high-precision technique for optimizing material performance in engineering applications.
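Two of the quantities reported in this abstract, porosity and equivalent sphere diameter, have simple standard definitions that the sketch below illustrates. It assumes the segmented cross-sections are available as stacked binary masks (1 = pore, 0 = aluminum) with equally sized voxels; the function names and the tiny mask stack are hypothetical and are not the study's data or code. The equivalent sphere diameter is the diameter of a sphere with the same volume as the pore, d = (6V/π)^(1/3).

```python
import math


def porosity(mask_stack):
    """Fraction of voxels labeled as pore (1) in a stack of binary 2D masks."""
    pore = sum(v for layer in mask_stack for row in layer for v in row)
    total = sum(len(row) for layer in mask_stack for row in layer)
    return pore / total


def equivalent_sphere_diameter(volume):
    """Diameter of the sphere whose volume equals the given pore volume."""
    return (6 * volume / math.pi) ** (1 / 3)


# Two made-up 2x2 cross-sections: 4 pore voxels out of 8.
stack = [
    [[1, 1], [1, 0]],
    [[1, 0], [0, 0]],
]
print(porosity(stack))

# A pore of volume 4.0 mm^3 (made-up value) and its equivalent diameter in mm.
print(round(equivalent_sphere_diameter(4.0), 3))
```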
Abstract: Recent years have seen a surge of interest in object detection on remote sensing images for applications such as surveillance and management. However, challenges such as small object detection, scale variation, and the presence of closely packed objects in these images hinder accurate detection. Additionally, motion blur further complicates the identification of such objects. To address these issues, we propose an enhanced YOLOv9 with a transformer head (YOLOv9-TH). The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms. We further improve YOLOv9-TH using several strategies, including data augmentation, multi-scale testing, multi-model integration, and the introduction of an additional classifier. The cross-stage partial (CSP) method and the ghost convolution hierarchical graph (GCHG) are combined to improve detection accuracy by better utilizing feature maps, widening the receptive field, and precisely extracting multi-scale objects. Additionally, we incorporate the E-SimAM attention mechanism to address low-resolution feature loss. Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH, showing good improvement in mAP compared with the best existing methods. YOLOv9-TH-e achieved 54.2% mAP50 on the VisDrone2021 dataset and 92.3% mAP on the DIOR dataset. The results confirm the model’s robustness and suitability for real-world applications, particularly for small object detection in remote sensing images.
Funding: Co-supported by the National Natural Science Foundation of China (No. 62103190) and the Natural Science Foundation of Jiangsu Province, China (No. BK20230923).
Abstract: When detecting objects in images taken by Unmanned Aerial Vehicles (UAVs), the large number of objects and the high proportion of small objects pose huge challenges for detection algorithms based on the You Only Look Once (YOLO) framework, making it difficult for them to handle tasks that demand high precision. To address these problems, this paper proposes a high-precision object detection algorithm based on YOLOv10s. Firstly, a Multi-branch Enhancement Coordinate Attention (MECA) module is proposed to enhance feature extraction capability. Secondly, a Multilayer Feature Reconstruction (MFR) mechanism is designed to fully exploit multilayer features, which enriches object information and removes redundant information. Finally, an MFR Path Aggregation Network (MFR-Neck) is constructed, which integrates multi-scale features to improve the network's ability to perceive objects of varying sizes. The experimental results demonstrate that the proposed algorithm increases the average detection accuracy by 14.15% on the VisDrone dataset compared with YOLOv10s, effectively enhancing object detection precision in UAV-taken images.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62276210, 82201148 and 62376215), the Key Research and Development Project of Shaanxi Province (No. 2025CY-YBXM-044), the Natural Science Foundation of Zhejiang Province (No. LQ22H120002), the Medical Health Science and Technology Project of Zhejiang Province (Nos. 2022RC069 and 2023KY1140), the Natural Science Foundation of Ningbo (No. 2023J390), and the Ningbo Top Medical and Health Research Program (No. 2023030716).
Abstract: This paper proposes a novel method for the automatic diagnosis of keratitis using feature vector quantization and self-attention mechanisms (ADK_FVQSAM). First, high-level features are extracted using the DenseNet121 backbone network, followed by adaptive average pooling to scale the features to a fixed length. Subsequently, product quantization with residuals (PQR) is applied to convert continuous feature vectors into discrete feature representations, preserving essential information that is insensitive to image quality variations. The quantized and original features are concatenated and fed into a self-attention mechanism to capture keratitis-related features. Finally, these enhanced features are classified through a fully connected layer. Experiments on clinical low-quality (LQ) images show that ADK_FVQSAM achieves accuracies of 87.7%, 81.9%, and 89.3% for keratitis, other corneal abnormalities, and normal corneas, respectively. Compared to DenseNet121, Swin transformer, and InceptionResNet, ADK_FVQSAM improves average accuracy by 3.1%, 11.3%, and 15.3%, respectively. These results demonstrate that ADK_FVQSAM significantly enhances the recognition performance of keratitis based on LQ slit-lamp images, offering a practical approach for clinical application.