At present,research on multi-label image classification mainly focuses on exploring the correlation between labels to improve the classification accuracy of multi-label images.However,in existing methods,label correla...At present,research on multi-label image classification mainly focuses on exploring the correlation between labels to improve the classification accuracy of multi-label images.However,in existing methods,label correlation is calculated based on the statistical information of the data.This label correlation is global and depends on the dataset,not suitable for all samples.In the process of extracting image features,the characteristic information of small objects in the image is easily lost,resulting in a low classification accuracy of small objects.To this end,this paper proposes a multi-label image classification model based on multiscale fusion and adaptive label correlation.The main idea is:first,the feature maps of multiple scales are fused to enhance the feature information of small objects.Semantic guidance decomposes the fusion feature map into feature vectors of each category,then adaptively mines the correlation between categories in the image through the self-attention mechanism of graph attention network,and obtains feature vectors containing category-related information for the final classification.The mean average precision of the model on the two public datasets of VOC 2007 and MS COCO 2014 reached 95.6% and 83.6%,respectively,and most of the indicators are better than those of the existing latest methods.展开更多
To reduce the discrepancy between the source and target domains,a new multi-label adaptation network(ML-ANet)based on multiple kernel variants with maximum mean discrepancies is proposed in this paper.The hidden repre...To reduce the discrepancy between the source and target domains,a new multi-label adaptation network(ML-ANet)based on multiple kernel variants with maximum mean discrepancies is proposed in this paper.The hidden representations of the task-specific layers in ML-ANet are embedded in the reproducing kernel Hilbert space(RKHS)so that the mean-embeddings of specific features in different domains could be precisely matched.Multiple kernel functions are used to improve feature distribution efficiency for explicit mean embedding matching,which can further reduce domain discrepancy.Adverse weather and cross-camera adaptation examinations are conducted to verify the effectiveness of our proposed ML-ANet.The results show that our proposed ML-ANet achieves higher accuracies than the compared state-of-the-art methods for multi-label image classification in both the adverse weather adaptation and cross-camera adaptation experiments.These results indicate that ML-ANet can alleviate the reliance on fully labeled training data and improve the accuracy of multi-label image classification in various domain shift scenarios.展开更多
Multi-label image classification is recognized as an important task within the field of computer vision,a discipline that has experienced a significant escalation in research endeavors in recent years.The widespread a...Multi-label image classification is recognized as an important task within the field of computer vision,a discipline that has experienced a significant escalation in research endeavors in recent years.The widespread adoption of convolutional neural networks(CNNs)has catalyzed the remarkable success of architectures such as ResNet-101 within the domain of image classification.However,inmulti-label image classification tasks,it is crucial to consider the correlation between labels.In order to improve the accuracy and performance of multi-label classification and fully combine visual and semantic features,many existing studies use graph convolutional networks(GCN)for modeling.Object detection and multi-label image classification exhibit a degree of conceptual overlap;however,the integration of these two tasks within a unified framework has been relatively underexplored in the existing literature.In this paper,we come up with Object-GCN framework,a model combining object detection network YOLOv5 and graph convolutional network,and we carry out a thorough experimental analysis using a range of well-established public datasets.The designed framework Object-GCN achieves significantly better performance than existing studies in public datasets COCO2014,VOC2007,VOC2012.The final results achieved are 86.9%,96.7%,and 96.3%mean Average Precision(mAP)across the three datasets.展开更多
Alzheimer’s Disease(AD)is a progressive neurodegenerative disorder that significantly affects cognitive function,making early and accurate diagnosis essential.Traditional Deep Learning(DL)-based approaches often stru...Alzheimer’s Disease(AD)is a progressive neurodegenerative disorder that significantly affects cognitive function,making early and accurate diagnosis essential.Traditional Deep Learning(DL)-based approaches often struggle with low-contrast MRI images,class imbalance,and suboptimal feature extraction.This paper develops a Hybrid DL system that unites MobileNetV2 with adaptive classification methods to boost Alzheimer’s diagnosis by processing MRI scans.Image enhancement is done using Contrast-Limited Adaptive Histogram Equalization(CLAHE)and Enhanced Super-Resolution Generative Adversarial Networks(ESRGAN).A classification robustness enhancement system integrates class weighting techniques and a Matthews Correlation Coefficient(MCC)-based evaluation method into the design.The trained and validated model gives a 98.88%accuracy rate and 0.9614 MCC score.We also performed a 10-fold cross-validation experiment with an average accuracy of 96.52%(±1.51),a loss of 0.1671,and an MCC score of 0.9429 across folds.The proposed framework outperforms the state-of-the-art models with a 98%weighted F1-score while decreasing misdiagnosis results for every AD stage.The model demonstrates apparent separation abilities between AD progression stages according to the results of the confusion matrix analysis.These results validate the effectiveness of hybrid DL models with adaptive preprocessing for early and reliable Alzheimer’s diagnosis,contributing to improved computer-aided diagnosis(CAD)systems in clinical practice.展开更多
Graph neural networks(GNN)have shown strong performance in node classification tasks,yet most existing models rely on uniform or shared weight aggregation,lacking flexibility in modeling the varying strength of relati...Graph neural networks(GNN)have shown strong performance in node classification tasks,yet most existing models rely on uniform or shared weight aggregation,lacking flexibility in modeling the varying strength of relationships among nodes.This paper proposes a novel graph coupling convolutional model that introduces an adaptive weighting mechanism to assign distinct importance to neighboring nodes based on their similarity to the central node.Unlike traditional methods,the proposed coupling strategy enhances the interpretability of node interactions while maintaining competitive classification performance.The model operates in the spatial domain,utilizing adjacency list structures for efficient convolution and addressing the limitations of weight sharing through a coupling-based similarity computation.Extensive experiments are conducted on five graph-structured datasets,including Cora,Citeseer,PubMed,Reddit,and BlogCatalog,as well as a custom topology dataset constructed from the Open University Learning Analytics Dataset(OULAD)educational platform.Results demonstrate that the proposed model achieves good classification accuracy,while significantly reducing training time through direct second-order neighbor fusion and data preprocessing.Moreover,analysis of neighborhood order reveals that considering third-order neighbors offers limited accuracy gains but introduces considerable computational overhead,confirming the efficiency of first-and second-order convolution in practical applications.Overall,the proposed graph coupling model offers a lightweight,interpretable,and effective framework for multi-label node classification in complex networks.展开更多
The integration of image analysis through deep learning(DL)into rock classification represents a significant leap forward in geological research.While traditional methods remain invaluable for their expertise and hist...The integration of image analysis through deep learning(DL)into rock classification represents a significant leap forward in geological research.While traditional methods remain invaluable for their expertise and historical context,DL offers a powerful complement by enhancing the speed,objectivity,and precision of the classification process.This research explores the significance of image data augmentation techniques in optimizing the performance of convolutional neural networks(CNNs)for geological image analysis,particularly in the classification of igneous,metamorphic,and sedimentary rock types from rock thin section(RTS)images.This study primarily focuses on classic image augmentation techniques and evaluates their impact on model accuracy and precision.Results demonstrate that augmentation techniques like Equalize significantly enhance the model's classification capabilities,achieving an F1-Score of 0.9869 for igneous rocks,0.9884 for metamorphic rocks,and 0.9929 for sedimentary rocks,representing improvements compared to the baseline original results.Moreover,the weighted average F1-Score across all classes and techniques is 0.9886,indicating an enhancement.Conversely,methods like Distort lead to decreased accuracy and F1-Score,with an F1-Score of 0.949 for igneous rocks,0.954 for metamorphic rocks,and 0.9416 for sedimentary rocks,exacerbating the performance compared to the baseline.The study underscores the practicality of image data augmentation in geological image classification and advocates for the adoption of DL methods in this domain for automation and improved results.The findings of this study can benefit various fields,including remote sensing,mineral exploration,and environmental monitoring,by enhancing the accuracy of geological image analysis both for scientific research and industrial applications.展开更多
In this paper,we study the partial multi-label(PML)image classification problem,where each image is annotated with a candidate label set consisting of multiple relevant labels and other noisy labels.Existing PML metho...In this paper,we study the partial multi-label(PML)image classification problem,where each image is annotated with a candidate label set consisting of multiple relevant labels and other noisy labels.Existing PML methods typically design a disambiguation strategy to filter out noisy labels by utilizing prior knowledge with extra assumptions,which unfortunately is unavailable in many real tasks.Furthermore,because the objective function for disambiguation is usually elaborately designed on the whole training set,it can hardly be optimized in a deep model with stochastic gradient descent(SGD)on mini-batches.In this paper,for the first time,we propose a deep model for PML to enhance the representation and discrimination ability.On the one hand,we propose a novel curriculum-based disambiguation strategy to progressively identify ground-truth labels by incorporating the varied difficulties of different classes.On the other hand,consistency regularization is introduced for model training to balance fitting identified easy labels and exploiting potential relevant labels.Extensive experimental results on the commonly used benchmark datasets show that the proposed method significantlyoutperforms the SOTA methods.展开更多
Breast cancer is one of the major causes of deaths in women.However,the early diagnosis is important for screening and control the mortality rate.Thus for the diagnosis of breast cancer at the early stage,a computer-a...Breast cancer is one of the major causes of deaths in women.However,the early diagnosis is important for screening and control the mortality rate.Thus for the diagnosis of breast cancer at the early stage,a computer-aided diagnosis system is highly required.Ultrasound is an important examination technique for breast cancer diagnosis due to its low cost.Recently,many learning-based techniques have been introduced to classify breast cancer using breast ultrasound imaging dataset(BUSI)datasets;however,the manual handling is not an easy process and time consuming.The authors propose an EfficientNet-integrated ResNet deep network and XAI-based framework for accurately classifying breast cancer(malignant and benign).In the initial step,data augmentation is performed to increase the number of training samples.For this purpose,three-pixel flip mathematical equations are introduced:horizontal,vertical,and 90°.Later,two pretrained deep learning models were employed,skipped some layers,and fine-tuned.Both fine-tuned models are later trained using a deep transfer learning process and extracted features from the deeper layer.Explainable artificial intelligence-based analysed the performance of trained models.After that,a new feature selection technique is proposed based on the cuckoo search algorithm called cuckoo search controlled standard error mean.This technique selects the best features and fuses using a new parallel zeropadding maximum correlated coefficient features.In the end,the selection algorithm is applied again to the fused feature vector and classified using machine learning algorithms.The experimental process of the proposed framework is conducted on a publicly available BUSI and obtained 98.4%and 98%accuracy in two different experiments.Comparing the proposed framework is also conducted with recent techniques and shows improved accuracy.In addition,the proposed framework was executed less than the original deep learning models.展开更多
The multi-modal characteristics of mineral particles play a pivotal role in enhancing the classification accuracy,which is critical for obtaining a profound understanding of the Earth's composition and ensuring ef...The multi-modal characteristics of mineral particles play a pivotal role in enhancing the classification accuracy,which is critical for obtaining a profound understanding of the Earth's composition and ensuring effective exploitation utilization of its resources.However,the existing methods for classifying mineral particles do not fully utilize these multi-modal features,thereby limiting the classification accuracy.Furthermore,when conventional multi-modal image classification methods are applied to planepolarized and cross-polarized sequence images of mineral particles,they encounter issues such as information loss,misaligned features,and challenges in spatiotemporal feature extraction.To address these challenges,we propose a multi-modal mineral particle polarization image classification network(MMGC-Net)for precise mineral particle classification.Initially,MMGC-Net employs a two-dimensional(2D)backbone network with shared parameters to extract features from two types of polarized images to ensure feature alignment.Subsequently,a cross-polarized intra-modal feature fusion module is designed to refine the spatiotemporal features from the extracted features of the cross-polarized sequence images.Ultimately,the inter-modal feature fusion module integrates the two types of modal features to enhance the classification precision.Quantitative and qualitative experimental results indicate that when compared with the current state-of-the-art multi-modal image classification methods,MMGC-Net demonstrates marked superiority in terms of mineral particle multi-modal feature learning and four classification evaluation metrics.It also demonstrates better stability than the existing models.展开更多
Machine learning(ML)is increasingly applied for medical image processing with appropriate learning paradigms.These applications include analyzing images of various organs,such as the brain,lung,eye,etc.,to identify sp...Machine learning(ML)is increasingly applied for medical image processing with appropriate learning paradigms.These applications include analyzing images of various organs,such as the brain,lung,eye,etc.,to identify specific flaws/diseases for diagnosis.The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification.Most of the extracted image features are irrelevant and lead to an increase in computation time.Therefore,this article uses an analytical learning paradigm to design a Congruent Feature Selection Method to select the most relevant image features.This process trains the learning paradigm using similarity and correlation-based features over different textural intensities and pixel distributions.The similarity between the pixels over the various distribution patterns with high indexes is recommended for disease diagnosis.Later,the correlation based on intensity and distribution is analyzed to improve the feature selection congruency.Therefore,the more congruent pixels are sorted in the descending order of the selection,which identifies better regions than the distribution.Now,the learning paradigm is trained using intensity and region-based similarity to maximize the chances of selection.Therefore,the probability of feature selection,regardless of the textures and medical image patterns,is improved.This process enhances the performance of ML applications for different medical image processing.The proposed method improves the accuracy,precision,and training rate by 13.19%,10.69%,and 11.06%,respectively,compared to other models for the selected dataset.The mean error and selection time is also reduced by 12.56%and 13.56%,respectively,compared to the same models and dataset.展开更多
In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hi...In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.展开更多
Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feat...Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results validate the reliability and superiority of our approach for multi-label image classification.展开更多
Accurate cloud classification plays a crucial role in aviation safety,climate monitoring,and localized weather forecasting.Current research has been focusing on machine learning techniques,particularly deep learning b...Accurate cloud classification plays a crucial role in aviation safety,climate monitoring,and localized weather forecasting.Current research has been focusing on machine learning techniques,particularly deep learning based model,for the types identification.However,traditional approaches such as convolutional neural networks(CNNs)encounter difficulties in capturing global contextual information.In addition,they are computationally expensive,which restricts their usability in resource-limited environments.To tackle these issues,we present the Cloud Vision Transformer(CloudViT),a lightweight model that integrates CNNs with Transformers.The integration enables an effective balance between local and global feature extraction.To be specific,CloudViT comprises two innovative modules:Feature Extraction(E_Module)and Downsampling(D_Module).These modules are able to significantly reduce the number of model parameters and computational complexity while maintaining translation invariance and enhancing contextual comprehension.Overall,the CloudViT includes 0.93×10^(6)parameters,which decreases more than ten times compared to the SOTA(State-of-the-Art)model CloudNet.Comprehensive evaluations conducted on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT.It achieves classification accuracies of 98.45%and 100%,respectively.Moreover,the efficiency and scalability of CloudViT make it an ideal candidate for deployment inmobile cloud observation systems,enabling real-time cloud image classification.The proposed hybrid architecture of CloudViT offers a promising approach for advancing ground-based cloud image classification.It holds significant potential for both optimizing performance and facilitating practical deployment scenarios.展开更多
Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise,dependency on the operator,and the variation of image quality.This paper presents the UltraSegNet architecture that addres...Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise,dependency on the operator,and the variation of image quality.This paper presents the UltraSegNet architecture that addresses these challenges through three key technical innovations:This work adds three things:(1)a changed ResNet-50 backbone with sequential 3×3 convolutions to keep fine anatomical details that are needed for finding lesion boundaries;(2)a computationally efficient regional attention mechanism that works on high-resolution features without using a transformer’s extra memory;and(3)an adaptive feature fusion strategy that changes local and global featuresbasedonhowthe image isbeing used.Extensive evaluation on two distinct datasets demonstrates UltraSegNet’s superior performance:On the BUSI dataset,it obtains a precision of 0.915,a recall of 0.908,and an F1 score of 0.911.In the UDAIT dataset,it achieves robust performance across the board,with a precision of 0.901 and recall of 0.894.Importantly,these improvements are achieved at clinically feasible computation times,taking 235 ms per image on standard GPU hardware.Notably,UltraSegNet does amazingly well on difficult small lesions(less than 10 mm),achieving a detection accuracy of 0.891.This is a huge improvement over traditional methods that have a hard time with small-scale features,as standard models can only achieve 0.63–0.71 accuracy.This improvement in small lesion detection is particularly crucial for early-stage breast cancer identification.Results from this work demonstrate that UltraSegNet can be practically deployable in clinical workflows to improve breast cancer screening accuracy.展开更多
In recent years,camouflage technology has evolved from single-spectral-band applications to multifunctional and multispectral implementations.Hyperspectral imaging has emerged as a powerful technique for target identi...In recent years,camouflage technology has evolved from single-spectral-band applications to multifunctional and multispectral implementations.Hyperspectral imaging has emerged as a powerful technique for target identification due to its capacity to capture both spectral and spatial information.The advancement of imaging spectroscopy technology has significantly enhanced reconnaissance capabilities,offering substantial advantages in camouflaged target classification and detection.However,the increasing spectral similarity between camouflaged targets and their backgrounds has significantly compromised detection performance in specific scenarios.Conventional feature extraction methods are often limited to single,shallow spectral or spatial features,failing to extract deep features and consequently yielding suboptimal classification accuracy.To address these limitations,this study proposes an innovative 3D-2D convolutional neural networks architecture incorporating depthwise separable convolution(DSC)and attention mechanisms(AM).The framework first applies dimensionality reduction to hyperspectral images and extracts preliminary spectral-spatial features.It then employs an alternating combination of 3D and 2D convolutions for deep feature extraction.For target classification,the LogSoftmax function is implemented.The integration of depthwise separable convolution not only enhances classification accuracy but also substantially reduces model parameters.Furthermore,the attention mechanisms significantly improve the network's ability to represent multidimensional features.Extensive experiments were conducted on a custom land-based hyperspectral image dataset.The results demonstrate remarkable classification accuracy:98.74%for grassland camouflage,99.13%for dead leaf camouflage and 98.94%for wild grass camouflage.Comparative analysis shows that the proposed framework is outstanding in terms of classification accuracy and robustness for camouflage target classification.展开更多
Optical and hybrid convolutional neural networks(CNNs)recently have become of increasing interest to achieve low-latency,low-power image classification,and computer-vision tasks.However,implementing optical nonlineari...Optical and hybrid convolutional neural networks(CNNs)recently have become of increasing interest to achieve low-latency,low-power image classification,and computer-vision tasks.However,implementing optical nonlinearity is challenging,and omitting the nonlinear layers in a standard CNN comes with a significant reduction in accuracy.We use knowledge distillation to compress modified AlexNet to a single linear convolutional layer and an electronic backend(two fully connected layers).We obtain comparable performance with a purely electronic CNN with five convolutional layers and three fully connected layers.We implement the convolution optically via engineering the point spread function of an inverse-designed meta-optic.Using this hybrid approach,we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86 K in the hybrid compressed network enabled by the optical front end.This constitutes over 2 orders of magnitude of reduction in latency and power consumption.Furthermore,we experimentally demonstrate that the classification accuracy of the system exceeds 93%on the MNIST dataset of handwritten digits.展开更多
Ferroelectric thin film transistors(FeTFTs)have attracted great attention for in-memory computing appli-cations due to low power consumption and monolithic three-dimensional integration capability.Herein,we propose a ...Ferroelectric thin film transistors(FeTFTs)have attracted great attention for in-memory computing appli-cations due to low power consumption and monolithic three-dimensional integration capability.Herein,we propose a planar integrated highly-reliable metal-ferroelectric-metal-insulator-semiconductor FeTFTs device,in which the weak erase issue is suppressed by implanting a floating gate,and the interface de-fects are reduced by simplifying the fabrication process.These lead to significant improvements in device performance,including large memory window(4.3 V),high conductance dynamic range(1400),high en-durance(10^(12)),and low variation(cycle-to-cycle:2.5%/device-to-device:3.5%).Moreover,we fabricated a 16×16 FeTFTs pseudo-crossbar array for in-memory computing and experimentally demonstrated full hardware implementation of multi-layer perceptron for the classification of four fundamental arithmetic operation symbols.This work provides a potential hardware solution for implementing a highly-efficient in-memory computing system based on highly-reliable FeTFTs array.展开更多
In hyperspectral image classification(HSIC),accurately extracting spatial and spectral information from hyperspectral images(HSI)is crucial for achieving precise classification.However,due to low spatial resolution an...In hyperspectral image classification(HSIC),accurately extracting spatial and spectral information from hyperspectral images(HSI)is crucial for achieving precise classification.However,due to low spatial resolution and complex category boundary,mixed pixels containing features from multiple classes are inevitable in HSIs.Additionally,the spectral similarity among different classes challenge for extracting distinctive spectral features essential for HSIC.To address the impact of mixed pixels and spectral similarity for HSIC,we propose a central-pixel guiding sub-pixel and sub-channel convolution network(CP-SPSC)to extract more precise spatial and spectral features.Firstly,we designed spatial attention(CP-SPA)and spectral attention(CP-SPE)informed by the central pixel to effectively reduce spectral interference of irrelevant categories in the same patch.Furthermore,we use CP-SPA to guide 2D sub-pixel convolution(SPConv2d)to capture spatial features finer than the pixel level.Meanwhile,CP-SPE is also utilized to guide 1D sub-channel con-volution(SCConv1d)in selecting more precise spectral channels.For fusing spatial and spectral information at the feature-level,the spectral feature extension transformation module(SFET)adopts mirror-padding and snake permutation to transform 1D spectral information of the center pixel into 2D spectral features.Experiments on three popular datasets demonstrate that ours out-performs several state-of-the-art methods in accuracy.展开更多
Skin cancer is among the most common malignancies worldwide,but its mortality burden is largely driven by aggressive subtypes such as melanoma,with outcomes varying across regions and healthcare settings.These variati...Skin cancer is among the most common malignancies worldwide,but its mortality burden is largely driven by aggressive subtypes such as melanoma,with outcomes varying across regions and healthcare settings.These variations emphasize the importance of reliable diagnostic technologies that support clinicians in detecting skin malignancies with higher accuracy.Traditional diagnostic methods often rely on subjective visual assessments,which can lead to misdiagnosis.This study addresses these challenges by developing HybridFusionNet,a novel model that integrates Convolutional Neural Networks(CNN)with 1D feature extraction techniques to enhance diagnostic accuracy.Utilizing two extensive datasets,BCN20000 and HAM10000,the methodology includes data preprocessing,application of Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors(SMOTEENN)for data balancing,and optimization of feature selection using the Tree-based Pipeline Optimization Tool(TPOT).The results demonstrate significant performance improvements over traditional CNN models,achieving an accuracy of 0.9693 on the BCN20000 dataset and 0.9909 on the HAM10000 dataset.The HybridFusionNet model not only outperforms conventionalmethods but also effectively addresses class imbalance.To enhance transparency,it integrates post-hoc explanation techniques such as LIME,which highlight the features influencing predictions.These findings highlight the potential of HybridFusionNet to support real-world applications,including physician-assist systems,teledermatology,and large-scale skin cancer screening programs.By improving diagnostic efficiency and enabling access to expert-level analysis,the modelmay enhance patient outcomes and foster greater trust in artificial intelligence(AI)-assisted clinical decision-making.展开更多
In a context where urban satellite image processing technologies are undergoing rapid evolution,this article presents an innovative and rigorous approach to satellite image classification applied to urban planning.Thi...In a context where urban satellite image processing technologies are undergoing rapid evolution,this article presents an innovative and rigorous approach to satellite image classification applied to urban planning.This research proposes an integrated methodological framework,based on the principles of model-driven engineering(MDE),to transform a generic meta-model into a meta-model specifically dedicated to urban satellite image classification.We implemented this transformation using the Atlas Transformation Language(ATL),guaranteeing a smooth and consistent transition from platform-independent model(PIM)to platform-specific model(PSM),according to the principles of model-driven architecture(MDA).The application of this IDM methodology enables advanced structuring of satellite data for targeted urban planning analyses,making it possible to classify various urban zones such as built-up,cultivated,arid and water areas.The novelty of this approach lies in the automation and standardization of the classification process,which significantly reduces the need for manual intervention,and thus improves the reliability,reproducibility and efficiency of urban data analysis.By adopting this method,decision-makers and urban planners are provided with a powerful tool for systematically and consistently analyzing and interpreting satellite images,facilitating decision-making in critical areas such as urban space management,infrastructure planning and environmental preservation.展开更多
基金the National Natural Science Foundation of China(Nos.62167005 and 61966018)the Key Research Projects of Jiangxi Provincial Department of Education(No.GJJ200302)。
文摘At present,research on multi-label image classification mainly focuses on exploring the correlation between labels to improve the classification accuracy of multi-label images.However,in existing methods,label correlation is calculated based on the statistical information of the data.This label correlation is global and depends on the dataset,not suitable for all samples.In the process of extracting image features,the characteristic information of small objects in the image is easily lost,resulting in a low classification accuracy of small objects.To this end,this paper proposes a multi-label image classification model based on multiscale fusion and adaptive label correlation.The main idea is:first,the feature maps of multiple scales are fused to enhance the feature information of small objects.Semantic guidance decomposes the fusion feature map into feature vectors of each category,then adaptively mines the correlation between categories in the image through the self-attention mechanism of graph attention network,and obtains feature vectors containing category-related information for the final classification.The mean average precision of the model on the two public datasets of VOC 2007 and MS COCO 2014 reached 95.6% and 83.6%,respectively,and most of the indicators are better than those of the existing latest methods.
基金Supported by Shenzhen Fundamental Research Fund of China(Grant No.JCYJ20190808142613246)National Natural Science Foundation of China(Grant No.51805332),and Young Elite Scientists Sponsorship Program funded by the China Society of Automotive Engineers.
文摘To reduce the discrepancy between the source and target domains,a new multi-label adaptation network(ML-ANet)based on multiple kernel variants with maximum mean discrepancies is proposed in this paper.The hidden representations of the task-specific layers in ML-ANet are embedded in the reproducing kernel Hilbert space(RKHS)so that the mean-embeddings of specific features in different domains could be precisely matched.Multiple kernel functions are used to improve feature distribution efficiency for explicit mean embedding matching,which can further reduce domain discrepancy.Adverse weather and cross-camera adaptation examinations are conducted to verify the effectiveness of our proposed ML-ANet.The results show that our proposed ML-ANet achieves higher accuracies than the compared state-of-the-art methods for multi-label image classification in both the adverse weather adaptation and cross-camera adaptation experiments.These results indicate that ML-ANet can alleviate the reliance on fully labeled training data and improve the accuracy of multi-label image classification in various domain shift scenarios.
文摘Multi-label image classification is recognized as an important task within the field of computer vision,a discipline that has experienced a significant escalation in research endeavors in recent years.The widespread adoption of convolutional neural networks(CNNs)has catalyzed the remarkable success of architectures such as ResNet-101 within the domain of image classification.However,inmulti-label image classification tasks,it is crucial to consider the correlation between labels.In order to improve the accuracy and performance of multi-label classification and fully combine visual and semantic features,many existing studies use graph convolutional networks(GCN)for modeling.Object detection and multi-label image classification exhibit a degree of conceptual overlap;however,the integration of these two tasks within a unified framework has been relatively underexplored in the existing literature.In this paper,we come up with Object-GCN framework,a model combining object detection network YOLOv5 and graph convolutional network,and we carry out a thorough experimental analysis using a range of well-established public datasets.The designed framework Object-GCN achieves significantly better performance than existing studies in public datasets COCO2014,VOC2007,VOC2012.The final results achieved are 86.9%,96.7%,and 96.3%mean Average Precision(mAP)across the three datasets.
基金funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No.(DGSSR-2025-02-01295).
文摘Alzheimer’s Disease(AD)is a progressive neurodegenerative disorder that significantly affects cognitive function,making early and accurate diagnosis essential.Traditional Deep Learning(DL)-based approaches often struggle with low-contrast MRI images,class imbalance,and suboptimal feature extraction.This paper develops a Hybrid DL system that unites MobileNetV2 with adaptive classification methods to boost Alzheimer’s diagnosis by processing MRI scans.Image enhancement is done using Contrast-Limited Adaptive Histogram Equalization(CLAHE)and Enhanced Super-Resolution Generative Adversarial Networks(ESRGAN).A classification robustness enhancement system integrates class weighting techniques and a Matthews Correlation Coefficient(MCC)-based evaluation method into the design.The trained and validated model gives a 98.88%accuracy rate and 0.9614 MCC score.We also performed a 10-fold cross-validation experiment with an average accuracy of 96.52%(±1.51),a loss of 0.1671,and an MCC score of 0.9429 across folds.The proposed framework outperforms the state-of-the-art models with a 98%weighted F1-score while decreasing misdiagnosis results for every AD stage.The model demonstrates apparent separation abilities between AD progression stages according to the results of the confusion matrix analysis.These results validate the effectiveness of hybrid DL models with adaptive preprocessing for early and reliable Alzheimer’s diagnosis,contributing to improved computer-aided diagnosis(CAD)systems in clinical practice.
基金Support by Sichuan Science and Technology Program[2023YFSY0026,2023YFH0004]Guangzhou Huashang University[2024HSZD01,HS2023JYSZH01].
文摘Graph neural networks(GNN)have shown strong performance in node classification tasks,yet most existing models rely on uniform or shared weight aggregation,lacking flexibility in modeling the varying strength of relationships among nodes.This paper proposes a novel graph coupling convolutional model that introduces an adaptive weighting mechanism to assign distinct importance to neighboring nodes based on their similarity to the central node.Unlike traditional methods,the proposed coupling strategy enhances the interpretability of node interactions while maintaining competitive classification performance.The model operates in the spatial domain,utilizing adjacency list structures for efficient convolution and addressing the limitations of weight sharing through a coupling-based similarity computation.Extensive experiments are conducted on five graph-structured datasets,including Cora,Citeseer,PubMed,Reddit,and BlogCatalog,as well as a custom topology dataset constructed from the Open University Learning Analytics Dataset(OULAD)educational platform.Results demonstrate that the proposed model achieves good classification accuracy,while significantly reducing training time through direct second-order neighbor fusion and data preprocessing.Moreover,analysis of neighborhood order reveals that considering third-order neighbors offers limited accuracy gains but introduces considerable computational overhead,confirming the efficiency of first-and second-order convolution in practical applications.Overall,the proposed graph coupling model offers a lightweight,interpretable,and effective framework for multi-label node classification in complex networks.
文摘The integration of image analysis through deep learning(DL)into rock classification represents a significant leap forward in geological research.While traditional methods remain invaluable for their expertise and historical context,DL offers a powerful complement by enhancing the speed,objectivity,and precision of the classification process.This research explores the significance of image data augmentation techniques in optimizing the performance of convolutional neural networks(CNNs)for geological image analysis,particularly in the classification of igneous,metamorphic,and sedimentary rock types from rock thin section(RTS)images.This study primarily focuses on classic image augmentation techniques and evaluates their impact on model accuracy and precision.Results demonstrate that augmentation techniques like Equalize significantly enhance the model's classification capabilities,achieving an F1-Score of 0.9869 for igneous rocks,0.9884 for metamorphic rocks,and 0.9929 for sedimentary rocks,representing improvements compared to the baseline original results.Moreover,the weighted average F1-Score across all classes and techniques is 0.9886,indicating an enhancement.Conversely,methods like Distort lead to decreased accuracy and F1-Score,with an F1-Score of 0.949 for igneous rocks,0.954 for metamorphic rocks,and 0.9416 for sedimentary rocks,exacerbating the performance compared to the baseline.The study underscores the practicality of image data augmentation in geological image classification and advocates for the adoption of DL methods in this domain for automation and improved results.The findings of this study can benefit various fields,including remote sensing,mineral exploration,and environmental monitoring,by enhancing the accuracy of geological image analysis both for scientific research and industrial applications.
文摘In this paper,we study the partial multi-label(PML)image classification problem,where each image is annotated with a candidate label set consisting of multiple relevant labels and other noisy labels.Existing PML methods typically design a disambiguation strategy to filter out noisy labels by utilizing prior knowledge with extra assumptions,which unfortunately is unavailable in many real tasks.Furthermore,because the objective function for disambiguation is usually elaborately designed on the whole training set,it can hardly be optimized in a deep model with stochastic gradient descent(SGD)on mini-batches.In this paper,for the first time,we propose a deep model for PML to enhance the representation and discrimination ability.On the one hand,we propose a novel curriculum-based disambiguation strategy to progressively identify ground-truth labels by incorporating the varied difficulties of different classes.On the other hand,consistency regularization is introduced for model training to balance fitting identified easy labels and exploiting potential relevant labels.Extensive experimental results on the commonly used benchmark datasets show that the proposed method significantlyoutperforms the SOTA methods.
文摘Breast cancer is one of the major causes of deaths in women.However,the early diagnosis is important for screening and control the mortality rate.Thus for the diagnosis of breast cancer at the early stage,a computer-aided diagnosis system is highly required.Ultrasound is an important examination technique for breast cancer diagnosis due to its low cost.Recently,many learning-based techniques have been introduced to classify breast cancer using breast ultrasound imaging dataset(BUSI)datasets;however,the manual handling is not an easy process and time consuming.The authors propose an EfficientNet-integrated ResNet deep network and XAI-based framework for accurately classifying breast cancer(malignant and benign).In the initial step,data augmentation is performed to increase the number of training samples.For this purpose,three-pixel flip mathematical equations are introduced:horizontal,vertical,and 90°.Later,two pretrained deep learning models were employed,skipped some layers,and fine-tuned.Both fine-tuned models are later trained using a deep transfer learning process and extracted features from the deeper layer.Explainable artificial intelligence-based analysed the performance of trained models.After that,a new feature selection technique is proposed based on the cuckoo search algorithm called cuckoo search controlled standard error mean.This technique selects the best features and fuses using a new parallel zeropadding maximum correlated coefficient features.In the end,the selection algorithm is applied again to the fused feature vector and classified using machine learning algorithms.The experimental process of the proposed framework is conducted on a publicly available BUSI and obtained 98.4%and 98%accuracy in two different experiments.Comparing the proposed framework is also conducted with recent techniques and shows improved accuracy.In addition,the proposed framework was executed less than the original deep learning models.
基金supported by the National Natural Science Foundation of China(Grant Nos.62071315 and 62271336).
文摘The multi-modal characteristics of mineral particles play a pivotal role in enhancing the classification accuracy,which is critical for obtaining a profound understanding of the Earth's composition and ensuring effective exploitation utilization of its resources.However,the existing methods for classifying mineral particles do not fully utilize these multi-modal features,thereby limiting the classification accuracy.Furthermore,when conventional multi-modal image classification methods are applied to planepolarized and cross-polarized sequence images of mineral particles,they encounter issues such as information loss,misaligned features,and challenges in spatiotemporal feature extraction.To address these challenges,we propose a multi-modal mineral particle polarization image classification network(MMGC-Net)for precise mineral particle classification.Initially,MMGC-Net employs a two-dimensional(2D)backbone network with shared parameters to extract features from two types of polarized images to ensure feature alignment.Subsequently,a cross-polarized intra-modal feature fusion module is designed to refine the spatiotemporal features from the extracted features of the cross-polarized sequence images.Ultimately,the inter-modal feature fusion module integrates the two types of modal features to enhance the classification precision.Quantitative and qualitative experimental results indicate that when compared with the current state-of-the-art multi-modal image classification methods,MMGC-Net demonstrates marked superiority in terms of mineral particle multi-modal feature learning and four classification evaluation metrics.It also demonstrates better stability than the existing models.
基金the Deanship of Scientifc Research at King Khalid University for funding this work through large group Research Project under grant number RGP2/421/45supported via funding from Prince Sattam bin Abdulaziz University project number(PSAU/2024/R/1446)+1 种基金supported by theResearchers Supporting Project Number(UM-DSR-IG-2023-07)Almaarefa University,Riyadh,Saudi Arabia.supported by the Basic Science Research Program through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(No.2021R1F1A1055408).
文摘Machine learning(ML)is increasingly applied for medical image processing with appropriate learning paradigms.These applications include analyzing images of various organs,such as the brain,lung,eye,etc.,to identify specific flaws/diseases for diagnosis.The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification.Most of the extracted image features are irrelevant and lead to an increase in computation time.Therefore,this article uses an analytical learning paradigm to design a Congruent Feature Selection Method to select the most relevant image features.This process trains the learning paradigm using similarity and correlation-based features over different textural intensities and pixel distributions.The similarity between the pixels over the various distribution patterns with high indexes is recommended for disease diagnosis.Later,the correlation based on intensity and distribution is analyzed to improve the feature selection congruency.Therefore,the more congruent pixels are sorted in the descending order of the selection,which identifies better regions than the distribution.Now,the learning paradigm is trained using intensity and region-based similarity to maximize the chances of selection.Therefore,the probability of feature selection,regardless of the textures and medical image patterns,is improved.This process enhances the performance of ML applications for different medical image processing.The proposed method improves the accuracy,precision,and training rate by 13.19%,10.69%,and 11.06%,respectively,compared to other models for the selected dataset.The mean error and selection time is also reduced by 12.56%and 13.56%,respectively,compared to the same models and dataset.
基金Supported by the National Natural Science Foundation of China(61601176)。
文摘In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.
基金supported by the National Natural Science Foundation of China(62302167,62477013)Natural Science Foundation of Shanghai(No.24ZR1456100)+1 种基金Science and Technology Commission of Shanghai Municipality(No.24DZ2305900)the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries(2211106).
文摘Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results validate the reliability and superiority of our approach for multi-label image classification.
基金funded by Innovation and Development Special Project of China Meteorological Administration(CXFZ2022J038,CXFZ2024J035)Sichuan Science and Technology Program(No.2023YFQ0072)+1 种基金Key Laboratory of Smart Earth(No.KF2023YB03-07)Automatic Software Generation and Intelligent Service Key Laboratory of Sichuan Province(CUIT-SAG202210).
文摘Accurate cloud classification plays a crucial role in aviation safety,climate monitoring,and localized weather forecasting.Current research has been focusing on machine learning techniques,particularly deep learning based model,for the types identification.However,traditional approaches such as convolutional neural networks(CNNs)encounter difficulties in capturing global contextual information.In addition,they are computationally expensive,which restricts their usability in resource-limited environments.To tackle these issues,we present the Cloud Vision Transformer(CloudViT),a lightweight model that integrates CNNs with Transformers.The integration enables an effective balance between local and global feature extraction.To be specific,CloudViT comprises two innovative modules:Feature Extraction(E_Module)and Downsampling(D_Module).These modules are able to significantly reduce the number of model parameters and computational complexity while maintaining translation invariance and enhancing contextual comprehension.Overall,the CloudViT includes 0.93×10^(6)parameters,which decreases more than ten times compared to the SOTA(State-of-the-Art)model CloudNet.Comprehensive evaluations conducted on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT.It achieves classification accuracies of 98.45%and 100%,respectively.Moreover,the efficiency and scalability of CloudViT make it an ideal candidate for deployment inmobile cloud observation systems,enabling real-time cloud image classification.The proposed hybrid architecture of CloudViT offers a promising approach for advancing ground-based cloud image classification.It holds significant potential for both optimizing performance and facilitating practical deployment scenarios.
基金funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R435),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Segmenting a breast ultrasound image is still challenging due to the presence of speckle noise,dependency on the operator,and the variation of image quality.This paper presents the UltraSegNet architecture that addresses these challenges through three key technical innovations:This work adds three things:(1)a changed ResNet-50 backbone with sequential 3×3 convolutions to keep fine anatomical details that are needed for finding lesion boundaries;(2)a computationally efficient regional attention mechanism that works on high-resolution features without using a transformer’s extra memory;and(3)an adaptive feature fusion strategy that changes local and global featuresbasedonhowthe image isbeing used.Extensive evaluation on two distinct datasets demonstrates UltraSegNet’s superior performance:On the BUSI dataset,it obtains a precision of 0.915,a recall of 0.908,and an F1 score of 0.911.In the UDAIT dataset,it achieves robust performance across the board,with a precision of 0.901 and recall of 0.894.Importantly,these improvements are achieved at clinically feasible computation times,taking 235 ms per image on standard GPU hardware.Notably,UltraSegNet does amazingly well on difficult small lesions(less than 10 mm),achieving a detection accuracy of 0.891.This is a huge improvement over traditional methods that have a hard time with small-scale features,as standard models can only achieve 0.63–0.71 accuracy.This improvement in small lesion detection is particularly crucial for early-stage breast cancer identification.Results from this work demonstrate that UltraSegNet can be practically deployable in clinical workflows to improve breast cancer screening accuracy.
文摘In recent years,camouflage technology has evolved from single-spectral-band applications to multifunctional and multispectral implementations.Hyperspectral imaging has emerged as a powerful technique for target identification due to its capacity to capture both spectral and spatial information.The advancement of imaging spectroscopy technology has significantly enhanced reconnaissance capabilities,offering substantial advantages in camouflaged target classification and detection.However,the increasing spectral similarity between camouflaged targets and their backgrounds has significantly compromised detection performance in specific scenarios.Conventional feature extraction methods are often limited to single,shallow spectral or spatial features,failing to extract deep features and consequently yielding suboptimal classification accuracy.To address these limitations,this study proposes an innovative 3D-2D convolutional neural networks architecture incorporating depthwise separable convolution(DSC)and attention mechanisms(AM).The framework first applies dimensionality reduction to hyperspectral images and extracts preliminary spectral-spatial features.It then employs an alternating combination of 3D and 2D convolutions for deep feature extraction.For target classification,the LogSoftmax function is implemented.The integration of depthwise separable convolution not only enhances classification accuracy but also substantially reduces model parameters.Furthermore,the attention mechanisms significantly improve the network's ability to represent multidimensional features.Extensive experiments were conducted on a custom land-based hyperspectral image dataset.The results demonstrate remarkable classification accuracy:98.74%for grassland camouflage,99.13%for dead leaf camouflage and 98.94%for wild grass camouflage.Comparative analysis shows that the proposed framework is outstanding in terms of classification accuracy and robustness for camouflage target classification.
基金supported by the National Science Foundation(Grant Nos.NSF-ECCS-2127235 and EFRI-BRAID-2223495)Part of this work was conducted at the Washington Nanofabrication Facility/Molecular Analysis Facility,a National Nanotechnology Coordinated Infrastructure(NNCI)site at the University of Washington with partial support from the National Science Foundation(Grant Nos.NNCI-1542101 and NNCI-2025489).
文摘Optical and hybrid convolutional neural networks(CNNs)recently have become of increasing interest to achieve low-latency,low-power image classification,and computer-vision tasks.However,implementing optical nonlinearity is challenging,and omitting the nonlinear layers in a standard CNN comes with a significant reduction in accuracy.We use knowledge distillation to compress modified AlexNet to a single linear convolutional layer and an electronic backend(two fully connected layers).We obtain comparable performance with a purely electronic CNN with five convolutional layers and three fully connected layers.We implement the convolution optically via engineering the point spread function of an inverse-designed meta-optic.Using this hybrid approach,we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86 K in the hybrid compressed network enabled by the optical front end.This constitutes over 2 orders of magnitude of reduction in latency and power consumption.Furthermore,we experimentally demonstrate that the classification accuracy of the system exceeds 93%on the MNIST dataset of handwritten digits.
基金financially supported by the National Natural Science Foundation of China(Nos.62074166,62104256,62304254,and U23A20322)the Key Research and Development Plan of Hunan Province under Grant(No.2022WK2001).
文摘Ferroelectric thin film transistors(FeTFTs)have attracted great attention for in-memory computing appli-cations due to low power consumption and monolithic three-dimensional integration capability.Herein,we propose a planar integrated highly-reliable metal-ferroelectric-metal-insulator-semiconductor FeTFTs device,in which the weak erase issue is suppressed by implanting a floating gate,and the interface de-fects are reduced by simplifying the fabrication process.These lead to significant improvements in device performance,including large memory window(4.3 V),high conductance dynamic range(1400),high en-durance(10^(12)),and low variation(cycle-to-cycle:2.5%/device-to-device:3.5%).Moreover,we fabricated a 16×16 FeTFTs pseudo-crossbar array for in-memory computing and experimentally demonstrated full hardware implementation of multi-layer perceptron for the classification of four fundamental arithmetic operation symbols.This work provides a potential hardware solution for implementing a highly-efficient in-memory computing system based on highly-reliable FeTFTs array.
基金supported by the National Natural Science Foundation of China(No.62071323).
文摘In hyperspectral image classification(HSIC),accurately extracting spatial and spectral information from hyperspectral images(HSI)is crucial for achieving precise classification.However,due to low spatial resolution and complex category boundary,mixed pixels containing features from multiple classes are inevitable in HSIs.Additionally,the spectral similarity among different classes challenge for extracting distinctive spectral features essential for HSIC.To address the impact of mixed pixels and spectral similarity for HSIC,we propose a central-pixel guiding sub-pixel and sub-channel convolution network(CP-SPSC)to extract more precise spatial and spectral features.Firstly,we designed spatial attention(CP-SPA)and spectral attention(CP-SPE)informed by the central pixel to effectively reduce spectral interference of irrelevant categories in the same patch.Furthermore,we use CP-SPA to guide 2D sub-pixel convolution(SPConv2d)to capture spatial features finer than the pixel level.Meanwhile,CP-SPE is also utilized to guide 1D sub-channel con-volution(SCConv1d)in selecting more precise spectral channels.For fusing spatial and spectral information at the feature-level,the spectral feature extension transformation module(SFET)adopts mirror-padding and snake permutation to transform 1D spectral information of the center pixel into 2D spectral features.Experiments on three popular datasets demonstrate that ours out-performs several state-of-the-art methods in accuracy.
文摘Skin cancer is among the most common malignancies worldwide,but its mortality burden is largely driven by aggressive subtypes such as melanoma,with outcomes varying across regions and healthcare settings.These variations emphasize the importance of reliable diagnostic technologies that support clinicians in detecting skin malignancies with higher accuracy.Traditional diagnostic methods often rely on subjective visual assessments,which can lead to misdiagnosis.This study addresses these challenges by developing HybridFusionNet,a novel model that integrates Convolutional Neural Networks(CNN)with 1D feature extraction techniques to enhance diagnostic accuracy.Utilizing two extensive datasets,BCN20000 and HAM10000,the methodology includes data preprocessing,application of Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors(SMOTEENN)for data balancing,and optimization of feature selection using the Tree-based Pipeline Optimization Tool(TPOT).The results demonstrate significant performance improvements over traditional CNN models,achieving an accuracy of 0.9693 on the BCN20000 dataset and 0.9909 on the HAM10000 dataset.The HybridFusionNet model not only outperforms conventionalmethods but also effectively addresses class imbalance.To enhance transparency,it integrates post-hoc explanation techniques such as LIME,which highlight the features influencing predictions.These findings highlight the potential of HybridFusionNet to support real-world applications,including physician-assist systems,teledermatology,and large-scale skin cancer screening programs.By improving diagnostic efficiency and enabling access to expert-level analysis,the modelmay enhance patient outcomes and foster greater trust in artificial intelligence(AI)-assisted clinical decision-making.
文摘In a context where urban satellite image processing technologies are undergoing rapid evolution,this article presents an innovative and rigorous approach to satellite image classification applied to urban planning.This research proposes an integrated methodological framework,based on the principles of model-driven engineering(MDE),to transform a generic meta-model into a meta-model specifically dedicated to urban satellite image classification.We implemented this transformation using the Atlas Transformation Language(ATL),guaranteeing a smooth and consistent transition from platform-independent model(PIM)to platform-specific model(PSM),according to the principles of model-driven architecture(MDA).The application of this IDM methodology enables advanced structuring of satellite data for targeted urban planning analyses,making it possible to classify various urban zones such as built-up,cultivated,arid and water areas.The novelty of this approach lies in the automation and standardization of the classification process,which significantly reduces the need for manual intervention,and thus improves the reliability,reproducibility and efficiency of urban data analysis.By adopting this method,decision-makers and urban planners are provided with a powerful tool for systematically and consistently analyzing and interpreting satellite images,facilitating decision-making in critical areas such as urban space management,infrastructure planning and environmental preservation.