In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hi...In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.展开更多
Accurate cloud classification plays a crucial role in aviation safety,climate monitoring,and localized weather forecasting.Current research has been focusing on machine learning techniques,particularly deep learning b...Accurate cloud classification plays a crucial role in aviation safety,climate monitoring,and localized weather forecasting.Current research has been focusing on machine learning techniques,particularly deep learning based model,for the types identification.However,traditional approaches such as convolutional neural networks(CNNs)encounter difficulties in capturing global contextual information.In addition,they are computationally expensive,which restricts their usability in resource-limited environments.To tackle these issues,we present the Cloud Vision Transformer(CloudViT),a lightweight model that integrates CNNs with Transformers.The integration enables an effective balance between local and global feature extraction.To be specific,CloudViT comprises two innovative modules:Feature Extraction(E_Module)and Downsampling(D_Module).These modules are able to significantly reduce the number of model parameters and computational complexity while maintaining translation invariance and enhancing contextual comprehension.Overall,the CloudViT includes 0.93×10^(6)parameters,which decreases more than ten times compared to the SOTA(State-of-the-Art)model CloudNet.Comprehensive evaluations conducted on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT.It achieves classification accuracies of 98.45%and 100%,respectively.Moreover,the efficiency and scalability of CloudViT make it an ideal candidate for deployment inmobile cloud observation systems,enabling real-time cloud image classification.The proposed hybrid architecture of CloudViT offers a promising approach for advancing ground-based cloud image classification.It holds significant potential for both optimizing performance and facilitating practical deployment scenarios.展开更多
Optical and hybrid convolutional neural networks(CNNs)recently have become of increasing interest to achieve low-latency,low-power image classification,and computer-vision tasks.However,implementing optical nonlineari...Optical and hybrid convolutional neural networks(CNNs)recently have become of increasing interest to achieve low-latency,low-power image classification,and computer-vision tasks.However,implementing optical nonlinearity is challenging,and omitting the nonlinear layers in a standard CNN comes with a significant reduction in accuracy.We use knowledge distillation to compress modified AlexNet to a single linear convolutional layer and an electronic backend(two fully connected layers).We obtain comparable performance with a purely electronic CNN with five convolutional layers and three fully connected layers.We implement the convolution optically via engineering the point spread function of an inverse-designed meta-optic.Using this hybrid approach,we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86 K in the hybrid compressed network enabled by the optical front end.This constitutes over 2 orders of magnitude of reduction in latency and power consumption.Furthermore,we experimentally demonstrate that the classification accuracy of the system exceeds 93%on the MNIST dataset of handwritten digits.展开更多
Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feat...Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results 
validate the reliability and superiority of our approach for multi-label image classification.展开更多
At present,research on multi-label image classification mainly focuses on exploring the correlation between labels to improve the classification accuracy of multi-label images.However,in existing methods,label correla...At present,research on multi-label image classification mainly focuses on exploring the correlation between labels to improve the classification accuracy of multi-label images.However,in existing methods,label correlation is calculated based on the statistical information of the data.This label correlation is global and depends on the dataset,not suitable for all samples.In the process of extracting image features,the characteristic information of small objects in the image is easily lost,resulting in a low classification accuracy of small objects.To this end,this paper proposes a multi-label image classification model based on multiscale fusion and adaptive label correlation.The main idea is:first,the feature maps of multiple scales are fused to enhance the feature information of small objects.Semantic guidance decomposes the fusion feature map into feature vectors of each category,then adaptively mines the correlation between categories in the image through the self-attention mechanism of graph attention network,and obtains feature vectors containing category-related information for the final classification.The mean average precision of the model on the two public datasets of VOC 2007 and MS COCO 2014 reached 95.6% and 83.6%,respectively,and most of the indicators are better than those of the existing latest methods.展开更多
The emergence of adversarial examples has revealed the inadequacies in the robustness of image classification models based on Convolutional Neural Networks (CNNs). Particularly in recent years, the discovery of natura...The emergence of adversarial examples has revealed the inadequacies in the robustness of image classification models based on Convolutional Neural Networks (CNNs). Particularly in recent years, the discovery of natural adversarial examples has posed significant challenges, as traditional defense methods against adversarial attacks have proven to be largely ineffective against these natural adversarial examples. This paper explores defenses against these natural adversarial examples from three perspectives: adversarial examples, model architecture, and dataset. First, it employs Class Activation Mapping (CAM) to visualize how models classify natural adversarial examples, identifying several typical attack patterns. Next, various common CNN models are analyzed to evaluate their susceptibility to these attacks, revealing that different architectures exhibit varying defensive capabilities. The study finds that as the depth of a network increases, its defenses against natural adversarial examples strengthen. Lastly, Finally, the impact of dataset class distribution on the defense capability of models is examined, focusing on two aspects: the number of classes in the training set and the number of predicted classes. This study investigates how these factors influence the model’s ability to defend against natural adversarial examples. Results indicate that reducing the number of training classes enhances the model’s defense against natural adversarial examples. Additionally, under a fixed number of training classes, some CNN models show an optimal range of predicted classes for achieving the best defense performance against these adversarial examples.展开更多
In hyperspectral image classification(HSIC),accurately extracting spatial and spectral information from hyperspectral images(HSI)is crucial for achieving precise classification.However,due to low spatial resolution an...In hyperspectral image classification(HSIC),accurately extracting spatial and spectral information from hyperspectral images(HSI)is crucial for achieving precise classification.However,due to low spatial resolution and complex category boundary,mixed pixels containing features from multiple classes are inevitable in HSIs.Additionally,the spectral similarity among different classes challenge for extracting distinctive spectral features essential for HSIC.To address the impact of mixed pixels and spectral similarity for HSIC,we propose a central-pixel guiding sub-pixel and sub-channel convolution network(CP-SPSC)to extract more precise spatial and spectral features.Firstly,we designed spatial attention(CP-SPA)and spectral attention(CP-SPE)informed by the central pixel to effectively reduce spectral interference of irrelevant categories in the same patch.Furthermore,we use CP-SPA to guide 2D sub-pixel convolution(SPConv2d)to capture spatial features finer than the pixel level.Meanwhile,CP-SPE is also utilized to guide 1D sub-channel con-volution(SCConv1d)in selecting more precise spectral channels.For fusing spatial and spectral information at the feature-level,the spectral feature extension transformation module(SFET)adopts mirror-padding and snake permutation to transform 1D spectral information of the center pixel into 2D spectral features.Experiments on three popular datasets demonstrate that ours out-performs several state-of-the-art methods in accuracy.展开更多
In a context where urban satellite image processing technologies are undergoing rapid evolution,this article presents an innovative and rigorous approach to satellite image classification applied to urban planning.Thi...In a context where urban satellite image processing technologies are undergoing rapid evolution,this article presents an innovative and rigorous approach to satellite image classification applied to urban planning.This research proposes an integrated methodological framework,based on the principles of model-driven engineering(MDE),to transform a generic meta-model into a meta-model specifically dedicated to urban satellite image classification.We implemented this transformation using the Atlas Transformation Language(ATL),guaranteeing a smooth and consistent transition from platform-independent model(PIM)to platform-specific model(PSM),according to the principles of model-driven architecture(MDA).The application of this IDM methodology enables advanced structuring of satellite data for targeted urban planning analyses,making it possible to classify various urban zones such as built-up,cultivated,arid and water areas.The novelty of this approach lies in the automation and standardization of the classification process,which significantly reduces the need for manual intervention,and thus improves the reliability,reproducibility and efficiency of urban data analysis.By adopting this method,decision-makers and urban planners are provided with a powerful tool for systematically and consistently analyzing and interpreting satellite images,facilitating decision-making in critical areas such as urban space management,infrastructure planning and environmental preservation.展开更多
Medical image classification is crucial in disease diagnosis,treatment planning,and clinical decisionmaking.We introduced a novel medical image classification approach that integrates Bayesian Random Semantic Data Aug...Medical image classification is crucial in disease diagnosis,treatment planning,and clinical decisionmaking.We introduced a novel medical image classification approach that integrates Bayesian Random Semantic Data Augmentation(BSDA)with a Vision Mamba-based model for medical image classification(MedMamba),enhanced by residual connection blocks,we named the model BSDA-Mamba.BSDA augments medical image data semantically,enhancing the model’s generalization ability and classification performance.MedMamba,a deep learning-based state space model,excels in capturing long-range dependencies in medical images.By incorporating residual connections,BSDA-Mamba further improves feature extraction capabilities.Through comprehensive experiments on eight medical image datasets,we demonstrate that BSDA-Mamba outperforms existing models in accuracy,area under the curve,and F1-score.Our results highlight BSDA-Mamba’s potential as a reliable tool for medical image analysis,particularly in handling diverse imaging modalities from X-rays to MRI.The open-sourcing of our model’s code and datasets,will facilitate the reproduction and extension of our work.展开更多
Hyperspectral image(HSI)classification has been one of themost important tasks in the remote sensing community over the last few decades.Due to the presence of highly correlated bands and limited training samples in H...Hyperspectral image(HSI)classification has been one of themost important tasks in the remote sensing community over the last few decades.Due to the presence of highly correlated bands and limited training samples in HSI,discriminative feature extraction was challenging for traditional machine learning methods.Recently,deep learning based methods have been recognized as powerful feature extraction tool and have drawn a significant amount of attention in HSI classification.Among various deep learning models,convolutional neural networks(CNNs)have shown huge success and offered great potential to yield high performance in HSI classification.Motivated by this successful performance,this paper presents a systematic review of different CNN architectures for HSI classification and provides some future guidelines.To accomplish this,our study has taken a few important steps.First,we have focused on different CNN architectures,which are able to extract spectral,spatial,and joint spectral-spatial features.Then,many publications related to CNN based HSI classifications have been reviewed systematically.Further,a detailed comparative performance analysis has been presented between four CNN models namely 1D CNN,2D CNN,3D CNN,and feature fusion based CNN(FFCNN).Four benchmark HSI datasets have been used in our experiment for evaluating the performance.Finally,we concluded the paper with challenges on CNN based HSI classification and future guidelines that may help the researchers to work on HSI classification using CNN.展开更多
The conventional sparse representation-based image classification usually codes the samples independently,which will ignore the correlation information existed in the data.Hence,if we can explore the correlation infor...The conventional sparse representation-based image classification usually codes the samples independently,which will ignore the correlation information existed in the data.Hence,if we can explore the correlation information hidden in the data,the classification result will be improved significantly.To this end,in this paper,a novel weighted supervised spare coding method is proposed to address the image classification problem.The proposed method firstly explores the structural information sufficiently hidden in the data based on the low rank representation.And then,it introduced the extracted structural information to a novel weighted sparse representation model to code the samples in a supervised way.Experimental results show that the proposed method is superiority to many conventional image classification methods.展开更多
The evolving“Industry 4.0”domain encompasses a collection of future industrial developments with cyber-physical systems(CPS),Internet of things(IoT),big data,cloud computing,etc.Besides,the industrial Internet of th...The evolving“Industry 4.0”domain encompasses a collection of future industrial developments with cyber-physical systems(CPS),Internet of things(IoT),big data,cloud computing,etc.Besides,the industrial Internet of things(IIoT)directs data from systems for monitoring and controlling the physical world to the data processing system.A major novelty of the IIoT is the unmanned aerial vehicles(UAVs),which are treated as an efficient remote sensing technique to gather data from large regions.UAVs are commonly employed in the industrial sector to solve several issues and help decision making.But the strict regulations leading to data privacy possibly hinder data sharing across autonomous UAVs.Federated learning(FL)becomes a recent advancement of machine learning(ML)which aims to protect user data.In this aspect,this study designs federated learning with blockchain assisted image classification model for clustered UAV networks(FLBIC-CUAV)on IIoT environment.The proposed FLBIC-CUAV technique involves three major processes namely clustering,blockchain enabled secure communication and FL based image classification.For UAV cluster construction process,beetle swarm optimization(BSO)algorithm with three input parameters is designed to cluster the UAVs for effective communication.In addition,blockchain enabled secure data transmission process take place to transmit the data from UAVs to cloud servers.Finally,the cloud server uses an FL with Residual Network model to carry out the image classification process.A wide range of simulation analyses takes place for ensuring the betterment of the FLBIC-CUAV approach.The experimental outcomes portrayed the betterment of the FLBIC-CUAV approach over the recent state of art methods.展开更多
Indian agriculture is striving to achieve sustainable intensification,the system aiming to increase agricultural yield per unit area without harming natural resources and the ecosystem.Modern farming employs technolog...Indian agriculture is striving to achieve sustainable intensification,the system aiming to increase agricultural yield per unit area without harming natural resources and the ecosystem.Modern farming employs technology to improve productivity.Early and accurate analysis and diagnosis of plant disease is very helpful in reducing plant diseases and improving plant health and food crop productivity.Plant disease experts are not available in remote areas thus there is a requirement of automatic low-cost,approachable and reliable solutions to identify the plant diseases without the laboratory inspection and expert’s opinion.Deep learning-based computer vision techniques like Convolutional Neural Network(CNN)and traditional machine learning-based image classification approaches are being applied to identify plant diseases.In this paper,the CNN model is proposed for the classification of rice and potato plant leaf diseases.Rice leaves are diagnosed with bacterial blight,blast,brown spot and tungro diseases.Potato leaf images are classified into three classes:healthy leaves,early blight and late blight diseases.Rice leaf dataset with 5932 images and 1500 potato leaf images are used in the study.The proposed CNN model was able to learn hidden patterns from the raw images and classify rice images with 99.58%accuracy and potato leaves with 97.66%accuracy.The results demonstrate that the proposed CNN model performed better when compared with other machine learning image classifiers such as Support Vector Machine(SVM),K-Nearest Neighbors(KNN),Decision Tree and Random Forest.展开更多
With limited number of labeled samples,hyperspectral image(HSI)classification is a difficult Problem in current research.The graph neural network(GNN)has emerged as an approach to semi-supervised classification,and th...With limited number of labeled samples,hyperspectral image(HSI)classification is a difficult Problem in current research.The graph neural network(GNN)has emerged as an approach to semi-supervised classification,and the application of GNN to hyperspectral images has attracted much attention.However,in the existing GNN-based methods a single graph neural network or graph filter is mainly used to extract HSI features,which does not take full advantage of various graph neural networks(graph filters).Moreover,the traditional GNNs have the problem of oversmoothing.To alleviate these shortcomings,we introduce a deep hybrid multi-graph neural network(DHMG),where two different graph filters,i.e.,the spectral filter and the autoregressive moving average(ARMA)filter,are utilized in two branches.The former can well extract the spectral features of the nodes,and the latter has a good suppression effect on graph noise.The network realizes information interaction between the two branches and takes good advantage of different graph filters.In addition,to address the problem of oversmoothing,a dense network is proposed,where the local graph features are preserved.The dense structure satisfies the needs of different classification targets presenting different features.Finally,we introduce a GraphSAGEbased network to refine the graph features produced by the deep hybrid network.Extensive experiments on three public HSI datasets strongly demonstrate that the DHMG dramatically outperforms the state-ofthe-art models.展开更多
With the development of satellite technology,the satellite imagery of the earth’s surface and the whole surface makes it possible to survey surface resources and master the dynamic changes of the earth with high effi...With the development of satellite technology,the satellite imagery of the earth’s surface and the whole surface makes it possible to survey surface resources and master the dynamic changes of the earth with high efficiency and low consumption.As an important tool for satellite remote sensing image processing,remote sensing image classification has become a hot topic.According to the natural texture characteristics of remote sensing images,this paper combines different texture features with the Extreme Learning Machine,and proposes a new remote sensing image classification algorithm.The experimental tests are carried out through the standard test dataset SAT-4 and SAT-6.Our results show that the proposed method is a simpler and more efficient remote sensing image classification algorithm.It also achieves 99.434%recognition accuracy on SAT-4,which is 1.5%higher than the 97.95%accuracy achieved by DeepSat.At the same time,the recognition accuracy of SAT-6 reaches 99.5728%,which is 5.6%higher than DeepSat’s 93.9%.展开更多
Fruit classification is found to be one of the rising fields in computer and machine vision.Many deep learning-based procedures worked out so far to classify images may have some ill-posed issues.The performance of th...Fruit classification is found to be one of the rising fields in computer and machine vision.Many deep learning-based procedures worked out so far to classify images may have some ill-posed issues.The performance of the classification scheme depends on the range of captured images,the volume of features,types of characters,choice of features from extracted features,and type of classifiers used.This paper aims to propose a novel deep learning approach consisting of Convolution Neural Network(CNN),Recurrent Neural Network(RNN),and Long Short-TermMemory(LSTM)application to classify the fruit images.Classification accuracy depends on the extracted and selected optimal features.Deep learning applications CNN,RNN,and LSTM were collectively involved to classify the fruits.CNN is used to extract the image features.RNN is used to select the extracted optimal features and LSTM is used to classify the fruits based on extracted and selected images features by CNN and RNN.Empirical study shows the supremacy of proposed over existing Support Vector Machine(SVM),Feed-forwardNeural Network(FFNN),and Adaptive Neuro-Fuzzy Inference System(ANFIS)competitive techniques for fruit images classification.The accuracy rate of the proposed approach is quite better than the SVM,FFNN,and ANFIS schemes.It has been concluded that the proposed technique outperforms existing schemes.展开更多
Image classification based on bag-of-words(BOW)has a broad application prospect in pattern recognition field but the shortcomings such as single feature and low classification accuracy are apparent.To deal with this...Image classification based on bag-of-words(BOW)has a broad application prospect in pattern recognition field but the shortcomings such as single feature and low classification accuracy are apparent.To deal with this problem,this paper proposes to combine two ingredients:(i)Three features with functions of mutual complementation are adopted to describe the images,including pyramid histogram of words(PHOW),pyramid histogram of color(PHOC)and pyramid histogram of orientated gradients(PHOG).(ii)An adaptive feature-weight adjusted image categorization algorithm based on the SVM and the decision level fusion of multiple features are employed.Experiments are carried out on the Caltech101 database,which confirms the validity of the proposed approach.The experimental results show that the classification accuracy rate of the proposed method is improved by 7%-14%higher than that of the traditional BOW methods.With full utilization of global,local and spatial information,the algorithm is much more complete and flexible to describe the feature information of the image through the multi-feature fusion and the pyramid structure composed by image spatial multi-resolution decomposition.Significant improvements to the classification accuracy are achieved as the result.展开更多
This paper presents a new kind of back propagation neural network(BPNN)based on rough sets,called rough back propagation neural network(RBPNN).The architecture and training method of RBPNN are presented and the survey...This paper presents a new kind of back propagation neural network(BPNN)based on rough sets,called rough back propagation neural network(RBPNN).The architecture and training method of RBPNN are presented and the survey and analysis of RBPNN for the classification of remote sensing multi_spectral image is discussed.The successful application of RBPNN to a land cover classification illustrates the simple computation and high accuracy of the new neural network and the flexibility and practicality of this new approach.展开更多
Inclusion of textures in image classification has been shown beneficial.This paper studies an efficient use of semivariogram features for object-based high-resolution image classification.First,an input image is divid...Inclusion of textures in image classification has been shown beneficial.This paper studies an efficient use of semivariogram features for object-based high-resolution image classification.First,an input image is divided into segments,for each of which a semivariogram is then calculated.Second,candidate features are extracted as a number of key locations of the semivariogram functions.Then we use an improved Relief algorithm and the principal component analysis to select independent and significant features.Then the selected prominent semivariogram features and the conventional spectral features are combined to constitute a feature vector for a support vector machine classifier.The effect of such selected semivariogram features is compared with those of the gray-level co-occurrence matrix(GLCM)features and window-based semivariogram texture features(STFs).Tests with aerial and satellite images show that such selected semivariogram features are of a more beneficial supplement to spectral features.The described method in this paper yields a higher classification accuracy than the combination of spectral and GLCM features or STFs.展开更多
The performance of medical image classification has been enhanced by deep convolutional neural networks(CNNs),which are typically trained with cross-entropy(CE)loss.However,when the label presents an intrinsic ordinal...The performance of medical image classification has been enhanced by deep convolutional neural networks(CNNs),which are typically trained with cross-entropy(CE)loss.However,when the label presents an intrinsic ordinal property in nature,e.g.,the development from benign to malignant tumor,CE loss cannot take into account such ordinal information to allow for better generalization.To improve model generalization with ordinal information,we propose a novel meta ordinal regression forest(MORF)method for medical image classification with ordinal labels,which learns the ordinal relationship through the combination of convolutional neural network and differential forest in a meta-learning framework.The merits of the proposed MORF come from the following two components:A tree-wise weighting net(TWW-Net)and a grouped feature selection(GFS)module.First,the TWW-Net assigns each tree in the forest with a specific weight that is mapped from the classification loss of the corresponding tree.Hence,all the trees possess varying weights,which is helpful for alleviating the tree-wise prediction variance.Second,the GFS module enables a dynamic forest rather than a fixed one that was previously used,allowing for random feature perturbation.During training,we alternatively optimize the parameters of the CNN backbone and TWW-Net in the meta-learning framework through calculating the Hessian matrix.Experimental results on two medical image classification datasets with ordinal labels,i.e.,LIDC-IDRI and Breast Ultrasound datasets,demonstrate the superior performances of our MORF method over existing state-of-the-art methods.展开更多
基金Supported by the National Natural Science Foundation of China(61601176)。
文摘In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.
Funding: Funded by the Innovation and Development Special Project of the China Meteorological Administration (CXFZ2022J038, CXFZ2024J035), the Sichuan Science and Technology Program (No. 2023YFQ0072), the Key Laboratory of Smart Earth (No. KF2023YB03-07), and the Automatic Software Generation and Intelligent Service Key Laboratory of Sichuan Province (CUIT-SAG202210).
Abstract: Accurate cloud classification plays a crucial role in aviation safety, climate monitoring, and localized weather forecasting. Current research has focused on machine learning techniques, particularly deep learning-based models, for cloud type identification. However, traditional approaches such as convolutional neural networks (CNNs) have difficulty capturing global contextual information. In addition, they are computationally expensive, which restricts their usability in resource-limited environments. To tackle these issues, we present the Cloud Vision Transformer (CloudViT), a lightweight model that integrates CNNs with Transformers. The integration enables an effective balance between local and global feature extraction. Specifically, CloudViT comprises two novel modules: a feature extraction module (E_Module) and a downsampling module (D_Module). These modules significantly reduce the number of model parameters and the computational complexity while maintaining translation invariance and enhancing contextual comprehension. Overall, CloudViT has 0.93×10^6 parameters, more than a ten-fold reduction compared with the state-of-the-art (SOTA) model CloudNet. Comprehensive evaluations on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT, which achieves classification accuracies of 98.45% and 100%, respectively. Moreover, its efficiency and scalability make CloudViT an ideal candidate for deployment in mobile cloud observation systems, enabling real-time cloud image classification. The proposed hybrid architecture offers a promising approach for advancing ground-based cloud image classification and holds significant potential for both optimizing performance and facilitating practical deployment.
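The internals of E_Module and D_Module are not given in the abstract, but the general mechanism by which lightweight hybrid models cut parameter counts can be illustrated with a standard trick: factorizing a full convolution into a depthwise plus pointwise pair. The counters below are a generic sketch (the layer sizes are arbitrary examples, not CloudViT's):

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k 2-D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def separable_params(c_in, c_out, k, bias=True):
    """Depthwise k x k + pointwise 1 x 1 factorisation of the same mapping."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise
```

For a 64-to-128-channel 3x3 layer, the factorized form needs 8960 parameters against 73,856 for the full convolution, roughly an eight-fold saving per layer; stacking many such savings is how sub-million-parameter models become possible.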
Funding: Supported by the National Science Foundation (Grant Nos. NSF-ECCS-2127235 and EFRI-BRAID-2223495). Part of this work was conducted at the Washington Nanofabrication Facility/Molecular Analysis Facility, a National Nanotechnology Coordinated Infrastructure (NNCI) site at the University of Washington, with partial support from the National Science Foundation (Grant Nos. NNCI-1542101 and NNCI-2025489).
Abstract: Optical and hybrid convolutional neural networks (CNNs) have recently attracted increasing interest for low-latency, low-power image classification and computer-vision tasks. However, implementing optical nonlinearity is challenging, and omitting the nonlinear layers in a standard CNN comes with a significant reduction in accuracy. We use knowledge distillation to compress a modified AlexNet to a single linear convolutional layer and an electronic backend (two fully connected layers), obtaining performance comparable to a purely electronic CNN with five convolutional layers and three fully connected layers. We implement the convolution optically by engineering the point spread function of an inverse-designed meta-optic. Using this hybrid approach, we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86K in the hybrid compressed network enabled by the optical front end, constituting over two orders of magnitude of reduction in latency and power consumption. Furthermore, we experimentally demonstrate that the classification accuracy of the system exceeds 93% on the MNIST dataset of handwritten digits.
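The quoted 17M-to-86K figure is specific to the authors' architecture, but the bookkeeping behind such estimates is simple: multiply-accumulate (MAC) counts for convolutional and fully connected layers. A generic sketch (the example layer shapes are illustrative, not the paper's):

```python
def conv_macs(h, w, c_in, c_out, k, stride=1):
    """MAC count of one conv layer on an h x w input (padding kept, stride applied)."""
    oh, ow = h // stride, w // stride         # output spatial size
    return oh * ow * c_in * c_out * k * k     # one MAC per kernel tap per output pixel

def fc_macs(n_in, n_out):
    """MAC count of one fully connected layer."""
    return n_in * n_out
```

Moving the single convolution into the optical front end removes its `conv_macs` term from the electronic budget, leaving only the two fully connected layers, which is why the electronic MAC count collapses.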
Funding: Supported by the National Natural Science Foundation of China (62302167, 62477013), the Natural Science Foundation of Shanghai (No. 24ZR1456100), the Science and Technology Commission of Shanghai Municipality (No. 24DZ2305900), and the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries (2211106).
Abstract: Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images. Obtaining class-specific precise representations at different scales is a key aspect of feature representation. However, existing methods often rely on single-scale deep features, neglecting shallow and deeper layer features, which poses challenges when predicting objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales, improving the model's generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of class-specific precise feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of our proposed method. Our approach achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62167005 and 61966018) and the Key Research Projects of the Jiangxi Provincial Department of Education (No. GJJ200302).
Abstract: At present, research on multi-label image classification mainly focuses on exploiting the correlation between labels to improve classification accuracy. However, in existing methods, label correlation is calculated from the statistical information of the data; this correlation is global and dataset-dependent, and is not suitable for all samples. Moreover, in the process of extracting image features, the characteristic information of small objects is easily lost, resulting in low classification accuracy for small objects. To this end, this paper proposes a multi-label image classification model based on multi-scale fusion and adaptive label correlation. The main idea is as follows: first, feature maps at multiple scales are fused to enhance the feature information of small objects. Semantic guidance then decomposes the fused feature map into per-category feature vectors, after which the self-attention mechanism of a graph attention network adaptively mines the correlation between categories in the image, yielding feature vectors containing category-related information for the final classification. The mean average precision of the model on the two public datasets VOC 2007 and MS COCO 2014 reaches 95.6% and 83.6%, respectively, and most of the indicators are better than those of the latest existing methods.
Abstract: The emergence of adversarial examples has revealed the inadequate robustness of image classification models based on Convolutional Neural Networks (CNNs). In recent years in particular, the discovery of natural adversarial examples has posed significant challenges, as traditional defense methods against adversarial attacks have proven largely ineffective against them. This paper explores defenses against natural adversarial examples from three perspectives: the adversarial examples themselves, model architecture, and the dataset. First, it employs Class Activation Mapping (CAM) to visualize how models classify natural adversarial examples, identifying several typical attack patterns. Next, various common CNN models are analyzed to evaluate their susceptibility to these attacks, revealing that different architectures exhibit varying defensive capabilities; the study finds that as the depth of a network increases, its defense against natural adversarial examples strengthens. Finally, the impact of dataset class distribution on model defense capability is examined, focusing on two aspects: the number of classes in the training set and the number of predicted classes. Results indicate that reducing the number of training classes enhances the model's defense against natural adversarial examples. Additionally, under a fixed number of training classes, some CNN models show an optimal range of predicted classes for achieving the best defense performance against these adversarial examples.
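The CAM visualization mentioned above follows the standard formulation: the class map is a weighted sum of the final convolutional feature maps, with weights taken from the classification layer's row for the class of interest. A minimal numpy sketch (ReLU plus max-normalization are common post-processing choices, assumed here):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Class Activation Mapping (CAM).

    features: (C, H, W) activations before global average pooling.
    fc_weights: (num_classes, C) weights of the final linear classifier.
    Returns an (H, W) map normalised to [0, 1].
    """
    w = fc_weights[class_idx]                  # (C,) weights for the chosen class
    cam = np.tensordot(w, features, axes=1)    # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0)                   # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                  # normalise for visualisation
    return cam
```

Overlaying such maps on natural adversarial examples shows which regions drove the (mis)classification, which is how the attack patterns above are identified.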
Funding: Supported by the National Natural Science Foundation of China (No. 62071323).
Abstract: In hyperspectral image classification (HSIC), accurately extracting spatial and spectral information from hyperspectral images (HSIs) is crucial for achieving precise classification. However, due to low spatial resolution and complex category boundaries, mixed pixels containing features from multiple classes are inevitable in HSIs. Additionally, the spectral similarity among different classes makes it challenging to extract the distinctive spectral features essential for HSIC. To address the impact of mixed pixels and spectral similarity on HSIC, we propose a central-pixel-guided sub-pixel and sub-channel convolution network (CP-SPSC) to extract more precise spatial and spectral features. First, we design spatial attention (CP-SPA) and spectral attention (CP-SPE) informed by the central pixel to effectively reduce the spectral interference of irrelevant categories within the same patch. Furthermore, we use CP-SPA to guide 2D sub-pixel convolution (SPConv2d) to capture spatial features finer than the pixel level, while CP-SPE guides 1D sub-channel convolution (SCConv1d) in selecting more precise spectral channels. For fusing spatial and spectral information at the feature level, the spectral feature extension transformation module (SFET) adopts mirror-padding and snake permutation to transform the 1D spectral information of the center pixel into 2D spectral features. Experiments on three popular datasets demonstrate that our method outperforms several state-of-the-art methods in accuracy.
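The exact design of SPConv2d is not given in the abstract, but sub-pixel convolution is commonly built on a pixel-shuffle rearrangement: a convolution produces C*r^2 channels, which are then reorganized into an r-times-finer spatial grid. A numpy sketch of the rearrangement step (channel layout follows the common convention; this is an illustration, not the paper's implementation):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C*r^2, H, W) -> (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # interleave: (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)  # merge into the upscaled grid
```

Each group of r^2 channels at one coarse pixel becomes an r x r block of fine pixels, which is what "finer than the pixel level" refers to.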
Abstract: In a context where urban satellite image processing technologies are undergoing rapid evolution, this article presents an innovative and rigorous approach to satellite image classification applied to urban planning. This research proposes an integrated methodological framework, based on the principles of model-driven engineering (MDE), to transform a generic meta-model into a meta-model specifically dedicated to urban satellite image classification. We implemented this transformation using the Atlas Transformation Language (ATL), guaranteeing a smooth and consistent transition from a platform-independent model (PIM) to a platform-specific model (PSM), in accordance with the principles of model-driven architecture (MDA). The application of this MDE methodology enables advanced structuring of satellite data for targeted urban planning analyses, making it possible to classify various urban zones such as built-up, cultivated, arid, and water areas. The novelty of this approach lies in the automation and standardization of the classification process, which significantly reduces the need for manual intervention and thus improves the reliability, reproducibility, and efficiency of urban data analysis. By adopting this method, decision-makers and urban planners are provided with a powerful tool for systematically and consistently analyzing and interpreting satellite images, facilitating decision-making in critical areas such as urban space management, infrastructure planning, and environmental preservation.
Abstract: Medical image classification is crucial in disease diagnosis, treatment planning, and clinical decision-making. We introduce a novel medical image classification approach that integrates Bayesian Random Semantic Data Augmentation (BSDA) with a Vision Mamba-based model for medical image classification (MedMamba), enhanced by residual connection blocks; we name the resulting model BSDA-Mamba. BSDA augments medical image data semantically, enhancing the model's generalization ability and classification performance. MedMamba, a deep learning-based state space model, excels at capturing long-range dependencies in medical images. By incorporating residual connections, BSDA-Mamba further improves feature extraction capability. Through comprehensive experiments on eight medical image datasets, we demonstrate that BSDA-Mamba outperforms existing models in accuracy, area under the curve, and F1-score. Our results highlight BSDA-Mamba's potential as a reliable tool for medical image analysis, particularly in handling diverse imaging modalities from X-rays to MRI. The open-sourcing of our model's code and datasets will facilitate the reproduction and extension of our work.
Abstract: Hyperspectral image (HSI) classification has been one of the most important tasks in the remote sensing community over the last few decades. Due to the presence of highly correlated bands and limited training samples in HSIs, discriminative feature extraction has been challenging for traditional machine learning methods. Recently, deep learning-based methods have been recognized as powerful feature extraction tools and have drawn significant attention in HSI classification. Among various deep learning models, convolutional neural networks (CNNs) have shown huge success and offer great potential for high performance in HSI classification. Motivated by this success, this paper presents a systematic review of different CNN architectures for HSI classification and provides future guidelines. To accomplish this, our study takes a few important steps. First, we focus on different CNN architectures that extract spectral, spatial, and joint spectral-spatial features. Then, many publications related to CNN-based HSI classification are reviewed systematically. Further, a detailed comparative performance analysis is presented between four CNN models, namely 1D CNN, 2D CNN, 3D CNN, and feature fusion-based CNN (FFCNN). Four benchmark HSI datasets are used in our experiments for evaluating performance. Finally, we conclude the paper with the challenges of CNN-based HSI classification and future guidelines that may help researchers working on HSI classification using CNNs.
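One axis of the 1D/2D/3D comparison above is cost: the kernel parameter count grows with each added dimension. The counters below tally this directly; the example channel and kernel sizes are arbitrary illustrations, not figures from the reviewed models:

```python
def conv1d_params(c_in, c_out, k):
    """Spectral (1-D) convolution: length-k kernels, plus bias."""
    return c_in * c_out * k + c_out

def conv2d_params(c_in, c_out, k):
    """Spatial (2-D) convolution: k x k kernels, plus bias."""
    return c_in * c_out * k * k + c_out

def conv3d_params(c_in, c_out, k):
    """Joint spectral-spatial (3-D) convolution: k x k x k kernels, plus bias."""
    return c_in * c_out * k * k * k + c_out
```

For the same channel widths, a 3D layer carries k times the weights of a 2D layer and k^2 times those of a 1D layer, which is why joint spectral-spatial models are the most expressive but also the most parameter-hungry of the four compared families.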
Funding: This research is funded by the National Natural Science Foundation of China (61771154).
Abstract: Conventional sparse representation-based image classification usually codes the samples independently, ignoring the correlation information present in the data. Hence, if we can exploit the correlation information hidden in the data, the classification result can be improved significantly. To this end, this paper proposes a novel weighted supervised sparse coding method to address the image classification problem. The proposed method first fully explores the structural information hidden in the data based on low-rank representation. It then introduces the extracted structural information into a novel weighted sparse representation model to code the samples in a supervised way. Experimental results show that the proposed method is superior to many conventional image classification methods.
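The weighted supervised model above builds on plain l1-regularized sparse coding, which can be solved with the iterative shrinkage-thresholding algorithm (ISTA). The sketch below is the unweighted baseline only (the paper's weighting and low-rank steps are not reproduced):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_sparse_code(D, y, lam=0.1, n_iter=200):
    """ISTA for min_a 0.5*||y - D a||^2 + lam*||a||_1.

    D: (m, n) dictionary; y: (m,) sample to code; returns sparse code a.
    """
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (y - D @ a)        # negative gradient of the data term
        a = soft_threshold(a + grad / L, lam / L)
    return a
```

With an orthonormal dictionary the solution reduces to soft-thresholding the correlations `D.T @ y`, which makes the shrinkage effect of `lam` easy to check.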
Funding: We deeply acknowledge Taif University for supporting this research through Taif University Researchers Supporting Project Number (TURSP-2020/328), Taif University, Taif, Saudi Arabia.
Abstract: The evolving "Industry 4.0" domain encompasses a collection of future industrial developments involving cyber-physical systems (CPS), the Internet of Things (IoT), big data, cloud computing, etc. In addition, the Industrial Internet of Things (IIoT) directs data from systems that monitor and control the physical world to data processing systems. A major novelty of the IIoT is unmanned aerial vehicles (UAVs), which are treated as an efficient remote sensing technique for gathering data from large regions. UAVs are commonly employed in the industrial sector to solve several issues and support decision making. However, strict regulations concerning data privacy may hinder data sharing across autonomous UAVs. Federated learning (FL) is a recent advancement in machine learning (ML) that aims to protect user data. In this respect, this study designs a federated learning with blockchain-assisted image classification model for clustered UAV networks (FLBIC-CUAV) in IIoT environments. The proposed FLBIC-CUAV technique involves three major processes, namely clustering, blockchain-enabled secure communication, and FL-based image classification. For the UAV cluster construction process, a beetle swarm optimization (BSO) algorithm with three input parameters is designed to cluster the UAVs for effective communication. In addition, a blockchain-enabled secure data transmission process takes place to transmit the data from UAVs to cloud servers. Finally, the cloud server uses FL with a residual network model to carry out the image classification process. A wide range of simulation analyses was performed to confirm the benefits of the approach, and the experimental outcomes portray the superiority of FLBIC-CUAV over recent state-of-the-art methods.
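The abstract does not specify how the cloud server aggregates the clients' updates; the canonical FL aggregation rule is FedAvg, a size-weighted average of client parameters. A minimal sketch under that assumption (FedAvg itself is standard, but its use here is our illustration, not a claim about the paper):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average client parameters weighted by local data size.

    client_weights: list of parameter arrays, one per client (same shape).
    client_sizes: list of local sample counts.
    """
    total = sum(client_sizes)
    # each client contributes in proportion to how much data it trained on
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```

A client holding three times as much data pulls the global model three times as hard, while raw data never leaves the UAV, which is the privacy property motivating FL here.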
Funding: This research was supported by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia, under Grant Number KAU 2020/251.
Abstract: Indian agriculture is striving to achieve sustainable intensification, a system aiming to increase agricultural yield per unit area without harming natural resources and the ecosystem. Modern farming employs technology to improve productivity. Early and accurate analysis and diagnosis of plant disease is very helpful in reducing plant diseases and improving plant health and food crop productivity. Plant disease experts are not available in remote areas, so there is a need for automatic, low-cost, accessible, and reliable solutions that identify plant diseases without laboratory inspection or expert opinion. Deep learning-based computer vision techniques such as Convolutional Neural Networks (CNNs) and traditional machine learning-based image classification approaches are being applied to identify plant diseases. In this paper, a CNN model is proposed for the classification of rice and potato plant leaf diseases. Rice leaves are diagnosed with bacterial blight, blast, brown spot, and tungro diseases. Potato leaf images are classified into three classes: healthy leaves, early blight, and late blight diseases. A rice leaf dataset with 5932 images and 1500 potato leaf images are used in the study. The proposed CNN model was able to learn hidden patterns from the raw images and classify rice images with 99.58% accuracy and potato leaves with 97.66% accuracy. The results demonstrate that the proposed CNN model performed better than other machine learning image classifiers such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree, and Random Forest.
Abstract: With a limited number of labeled samples, hyperspectral image (HSI) classification is a difficult problem in current research. The graph neural network (GNN) has emerged as an approach to semi-supervised classification, and the application of GNNs to hyperspectral images has attracted much attention. However, existing GNN-based methods mainly use a single graph neural network or graph filter to extract HSI features, which does not take full advantage of various graph neural networks (graph filters). Moreover, traditional GNNs suffer from oversmoothing. To alleviate these shortcomings, we introduce a deep hybrid multi-graph neural network (DHMG), where two different graph filters, i.e., the spectral filter and the autoregressive moving average (ARMA) filter, are utilized in two branches. The former extracts the spectral features of the nodes well, and the latter has a good suppression effect on graph noise. The network realizes information interaction between the two branches and takes good advantage of different graph filters. In addition, to address the problem of oversmoothing, a dense network is proposed, in which local graph features are preserved; the dense structure satisfies the needs of different classification targets presenting different features. Finally, we introduce a GraphSAGE-based network to refine the graph features produced by the deep hybrid network. Extensive experiments on three public HSI datasets strongly demonstrate that DHMG dramatically outperforms state-of-the-art models.
Funding: This work was supported in part by the National Science Foundation Project of P.R. China under Grant No. 61701554, the State Language Commission Key Project (ZDl135-39), First-Class Courses (Digital Image Processing: KC2066), the MUC 111 Project, and Ministry of Education Collaborative Education Projects (201901056009, 201901160059, 201901238038).
Abstract: With the development of satellite technology, satellite imagery of the earth's surface makes it possible to survey surface resources and monitor the dynamic changes of the earth with high efficiency and low cost. As an important tool for satellite remote sensing image processing, remote sensing image classification has become a hot topic. Exploiting the natural texture characteristics of remote sensing images, this paper combines different texture features with the Extreme Learning Machine and proposes a new remote sensing image classification algorithm. Experimental tests are carried out on the standard test datasets SAT-4 and SAT-6. Our results show that the proposed method is a simpler and more efficient remote sensing image classification algorithm. It achieves 99.434% recognition accuracy on SAT-4, which is 1.5% higher than the 97.95% accuracy achieved by DeepSat. At the same time, the recognition accuracy on SAT-6 reaches 99.5728%, which is 5.6% higher than DeepSat's 93.9%.
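Part of the simplicity claimed above comes from the Extreme Learning Machine itself: the hidden layer is random and fixed, so training reduces to one least-squares solve for the output weights. A minimal numpy sketch on generic features (the texture feature extraction is out of scope here; hidden size and tanh activation are our choices):

```python
import numpy as np

def elm_train(X, Y, n_hidden=64, seed=0):
    """ELM training: random hidden layer + closed-form least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random input weights
    b = rng.standard_normal(n_hidden)                # fixed random biases
    H = np.tanh(X @ W + b)                           # random nonlinear features
    beta = np.linalg.pinv(H) @ Y                     # the only learned parameters
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because the only "training" is a pseudo-inverse, fitting is orders of magnitude faster than backpropagation, which is why ELMs pair well with hand-crafted texture features.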
Funding: This research is funded by Taif University, TURSP-2020/150.
Abstract: Fruit classification is one of the rising fields in computer and machine vision. Many deep learning-based procedures developed so far for image classification suffer from ill-posed issues. The performance of a classification scheme depends on the range of captured images, the volume of features, the types of characters, the choice of features from the extracted features, and the type of classifiers used. This paper proposes a novel deep learning approach combining a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) to classify fruit images. Classification accuracy depends on the extracted and selected optimal features. The three components are involved collectively: the CNN extracts the image features, the RNN selects the optimal extracted features, and the LSTM classifies the fruits based on the features extracted and selected by the CNN and RNN. An empirical study shows the supremacy of the proposed approach over the competing Support Vector Machine (SVM), Feed-Forward Neural Network (FFNN), and Adaptive Neuro-Fuzzy Inference System (ANFIS) techniques for fruit image classification, with a notably higher accuracy rate than the SVM, FFNN, and ANFIS schemes.
Funding: Supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (61321002), Projects of the Major International (Regional) Joint Research Program of NSFC (61120106010), the Beijing Education Committee Cooperation Building Foundation Project, and the Program for Changjiang Scholars and Innovative Research Teams in University (IRT1208).
Abstract: Image classification based on bag-of-words (BOW) has broad application prospects in the pattern recognition field, but shortcomings such as reliance on a single feature and low classification accuracy are apparent. To deal with this problem, this paper proposes to combine two ingredients: (i) three mutually complementary features are adopted to describe the images, namely the pyramid histogram of words (PHOW), pyramid histogram of color (PHOC), and pyramid histogram of oriented gradients (PHOG); (ii) an adaptive feature-weight-adjusted image categorization algorithm based on the SVM and decision-level fusion of multiple features is employed. Experiments carried out on the Caltech101 database confirm the validity of the proposed approach. The experimental results show that the classification accuracy of the proposed method is 7%-14% higher than that of traditional BOW methods. With full utilization of global, local, and spatial information, the algorithm describes the feature information of the image much more completely and flexibly through multi-feature fusion and the pyramid structure composed by spatial multi-resolution decomposition, achieving significant improvements in classification accuracy.
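Decision-level fusion as described above combines per-feature classifier scores with feature weights before the final decision. The paper's weights are adapted on the data; the sketch below fixes them by hand purely to show the mechanics (function name and score layout are our assumptions):

```python
import numpy as np

def fuse_decisions(score_list, weights):
    """Decision-level fusion: weighted sum of per-feature classifier scores.

    score_list: one (num_classes,) score vector per feature channel
                (e.g. PHOW, PHOC, PHOG SVM outputs).
    weights: one non-negative weight per feature channel.
    Returns the index of the winning class.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalise feature weights
    fused = sum(w * s for w, s in zip(weights, score_list))
    return int(np.argmax(fused))
```

Shifting weight between channels can flip the decision when the per-feature classifiers disagree, which is exactly the degree of freedom the adaptive weighting exploits.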
Funding: Project supported by the Fund for Scientific and Technological Development of Surveying and Mapping from the State Bureau of Surveying and Mapping (No. 2001_02_04).
Abstract: This paper presents a new kind of back propagation neural network (BPNN) based on rough sets, called the rough back propagation neural network (RBPNN). The architecture and training method of the RBPNN are presented, and its application to the classification of remote sensing multi-spectral images is surveyed and analyzed. The successful application of the RBPNN to a land cover classification illustrates the simple computation and high accuracy of the new neural network and the flexibility and practicality of this new approach.
Funding: This work was supported by the National Natural Science Foundation of China [grant number 41101410], the Comprehensive Transportation Applications of High-resolution Remote Sensing program [grant number 07-Y30B10-9001-14/16], the Key Laboratory of Surveying, Mapping and Geoinformation in Geographical Condition Monitoring [grant number 2014NGCM], and the Science and Technology Plan of the Sichuan Bureau of Surveying, Mapping and Geoinformation, China [grant number J2014ZC02].
Abstract: The inclusion of textures in image classification has been shown to be beneficial. This paper studies an efficient use of semivariogram features for object-based high-resolution image classification. First, an input image is divided into segments, for each of which a semivariogram is then calculated. Second, candidate features are extracted at a number of key locations of the semivariogram functions. An improved Relief algorithm and principal component analysis are then used to select independent and significant features. The selected prominent semivariogram features and the conventional spectral features are combined to constitute a feature vector for a support vector machine classifier. The effect of the selected semivariogram features is compared with those of gray-level co-occurrence matrix (GLCM) features and window-based semivariogram texture features (STFs). Tests with aerial and satellite images show that the selected semivariogram features provide a more beneficial supplement to spectral features, and the described method yields a higher classification accuracy than the combination of spectral and GLCM features or STFs.
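The semivariogram underlying the features above is the half-mean squared difference of pixel values separated by a lag h, gamma(h) = (1/2N) * sum (z_i - z_{i+h})^2. A minimal sketch for a 1-D transect of pixel values (the paper computes it per segment in 2-D; this reduction is for illustration):

```python
import numpy as np

def empirical_semivariogram(values, max_lag):
    """Empirical semivariogram of a 1-D transect.

    values: (N,) pixel values along a line.
    Returns gamma(h) for h = 1 .. max_lag.
    """
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = values[h:] - values[:-h]          # all pairs at lag h
        gammas.append(0.5 * np.mean(diffs ** 2))  # half the mean squared difference
    return np.array(gammas)
```

The shape of gamma(h) as h grows (its range, sill, and nugget) encodes texture; the "key locations" mentioned above are points sampled from this curve.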
Funding: This work was supported in part by the Natural Science Foundation of Shanghai (21ZR1403600), the National Natural Science Foundation of China (62176059), the Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), Zhang Jiang Laboratory, the Shanghai Sailing Program (21YF1402800), the Shanghai Municipal Science and Technology Project (20JC1419500), and the Shanghai Center for Brain Science and Brain-inspired Technology.
Abstract: The performance of medical image classification has been enhanced by deep convolutional neural networks (CNNs), which are typically trained with cross-entropy (CE) loss. However, when the label has an intrinsic ordinal property, e.g., the development from benign to malignant tumor, CE loss cannot take such ordinal information into account to allow for better generalization. To improve model generalization with ordinal information, we propose a novel meta ordinal regression forest (MORF) method for medical image classification with ordinal labels, which learns the ordinal relationship through the combination of a convolutional neural network and a differential forest in a meta-learning framework. The merits of the proposed MORF come from two components: a tree-wise weighting net (TWW-Net) and a grouped feature selection (GFS) module. First, the TWW-Net assigns each tree in the forest a specific weight mapped from the classification loss of the corresponding tree. Hence, all the trees possess varying weights, which helps alleviate the tree-wise prediction variance. Second, the GFS module enables a dynamic forest rather than the fixed one used previously, allowing for random feature perturbation. During training, we alternately optimize the parameters of the CNN backbone and TWW-Net in the meta-learning framework by calculating the Hessian matrix. Experimental results on two medical image classification datasets with ordinal labels, i.e., the LIDC-IDRI and Breast Ultrasound datasets, demonstrate the superior performance of our MORF method over existing state-of-the-art methods.
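In MORF the loss-to-weight mapping is learned by TWW-Net; as a stand-in, a softmax over negated per-tree losses reproduces the qualitative behavior (higher-loss trees get less say in the ensemble). The sketch below is therefore an illustrative simplification, not the paper's learned mapping:

```python
import numpy as np

def tree_weights(tree_losses):
    """Map per-tree losses to ensemble weights that sum to 1.

    Softmax over negative losses: trees with larger classification loss
    receive smaller weights, damping tree-wise prediction variance.
    """
    z = -np.asarray(tree_losses, dtype=float)
    z = z - z.max()                     # numerical stability
    e = np.exp(z)
    return e / e.sum()

def weighted_forest_prediction(tree_preds, tree_losses):
    """Combine per-tree predictions with loss-derived weights."""
    return float(np.dot(tree_weights(tree_losses), tree_preds))
```

With equal losses this reduces to the plain forest average; as one tree's loss grows, its prediction is progressively discounted.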