针对当前遥感农作物分类研究中深度学习模型对光谱时间和空间信息特征采样不足,农作物提取仍然存在边界模糊、漏提、误提的问题,提出了一种名为视觉Transformer-长短期记忆递归神经网络(Vision Transformer-long short term memory,ViTL...针对当前遥感农作物分类研究中深度学习模型对光谱时间和空间信息特征采样不足,农作物提取仍然存在边界模糊、漏提、误提的问题,提出了一种名为视觉Transformer-长短期记忆递归神经网络(Vision Transformer-long short term memory,ViTL)的深度学习模型,ViTL模型集成了双路Vision-Transformer特征提取、时空特征融合和长短期记忆递归神经网络(LSTM)时序分类等3个关键模块,双路Vision-Transformer特征提取模块用于捕获图像的时空特征相关性,一路提取空间分类特征,一路提取时间变化特征;时空特征融合模块用于将多时特征信息进行交叉融合;LSTM时序分类模块捕捉多时序的依赖关系并进行输出分类。综合利用基于多时序卫星影像的遥感技术理论和方法,对黑龙江省齐齐哈尔市讷河市作物信息进行提取,研究结果表明,ViTL模型表现出色,其总体准确率(Overall Accuracy,OA)、平均交并比(Mean Intersection over Union,MIoU)和F1分数分别达到0.8676、0.6987和0.8175,与其他广泛使用的深度学习方法相比,包括三维卷积神经网络(3-D CNN)、二维卷积神经网络(2-D CNN)和长短期记忆递归神经网络(LSTM),ViTL模型的F1分数提高了9%~12%,显示出显著的优越性。ViTL模型克服了面对多时序遥感影像的农作物分类任务中的时间和空间信息特征采样不足问题,为准确、高效地农作物分类提供了新思路。展开更多
Wheat is a critical crop,extensively consumed worldwide,and its production enhancement is essential to meet escalating demand.The presence of diseases like stem rust,leaf rust,yellow rust,and tan spot significantly di...Wheat is a critical crop,extensively consumed worldwide,and its production enhancement is essential to meet escalating demand.The presence of diseases like stem rust,leaf rust,yellow rust,and tan spot significantly diminishes wheat yield,making the early and precise identification of these diseases vital for effective disease management.With advancements in deep learning algorithms,researchers have proposed many methods for the automated detection of disease pathogens;however,accurately detectingmultiple disease pathogens simultaneously remains a challenge.This challenge arises due to the scarcity of RGB images for multiple diseases,class imbalance in existing public datasets,and the difficulty in extracting features that discriminate between multiple classes of disease pathogens.In this research,a novel method is proposed based on Transfer Generative Adversarial Networks for augmenting existing data,thereby overcoming the problems of class imbalance and data scarcity.This study proposes a customized architecture of Vision Transformers(ViT),where the feature vector is obtained by concatenating features extracted from the custom ViT and Graph Neural Networks.This paper also proposes a Model AgnosticMeta Learning(MAML)based ensemble classifier for accurate classification.The proposedmodel,validated on public datasets for wheat disease pathogen classification,achieved a test accuracy of 99.20%and an F1-score of 97.95%.Compared with existing state-of-the-art methods,this proposed model outperforms in terms of accuracy,F1-score,and the number of disease pathogens detection.In future,more diseases can be included for detection along with some other modalities like pests and weed.展开更多
Accurate plant species classification is essential for many applications,such as biodiversity conservation,ecological research,and sustainable agricultural practices.Traditional morphological classification methods ar...Accurate plant species classification is essential for many applications,such as biodiversity conservation,ecological research,and sustainable agricultural practices.Traditional morphological classification methods are inherently slow,labour-intensive,and prone to inaccuracies,especiallywhen distinguishing between species exhibiting visual similarities or high intra-species variability.To address these limitations and to overcome the constraints of imageonly approaches,we introduce a novel Artificial Intelligence-driven framework.This approach integrates robust Vision Transformer(ViT)models for advanced visual analysis with a multi-modal data fusion strategy,incorporating contextual metadata such as precise environmental conditions,geographic location,and phenological traits.This combination of visual and ecological cues significantly enhances classification accuracy and robustness,proving especially vital in complex,heterogeneous real-world environments.The proposedmodel achieves an impressive 97.27%of test accuracy,andMean Reciprocal Rank(MRR)of 0.9842 that demonstrates strong generalization capabilities.Furthermore,efficient utilization of high-performance GPU resources(RTX 3090,18 GB memory)ensures scalable processing of highdimensional data.Comparative analysis consistently confirms that ourmetadata fusion approach substantially improves classification performance,particularly formorphologically similar species,and through principled self-supervised and transfer learning from ImageNet,the model adapts efficiently to new species,ensuring enhanced generalization.This comprehensive approach holds profound practical implications for precise conservation initiatives,rigorous ecological monitoring,and advanced agricultural management.展开更多
Mango farming significantly contributes to the economy,particularly in developing countries.However,mango trees are susceptible to various diseases caused by fungi,viruses,and bacteria,and diagnosing these diseases at...Mango farming significantly contributes to the economy,particularly in developing countries.However,mango trees are susceptible to various diseases caused by fungi,viruses,and bacteria,and diagnosing these diseases at an early stage is crucial to prevent their spread,which can lead to substantial losses.The development of deep learning models for detecting crop diseases is an active area of research in smart agriculture.This study focuses on mango plant diseases and employs the ConvNeXt and Vision Transformer(ViT)architectures.Two datasets were used.The first,MangoLeafBD,contains data for mango leaf diseases such as anthracnose,bacterial canker,gall midge,and powdery mildew.The second,SenMangoFruitDDS,includes data for mango fruit diseases such as Alternaria,Anthracnose,Black Mould Rot,Healthy,and Stem and Rot.Both datasets were obtained from publicly available sources.The proposed model achieved an accuracy of 99.87%on the MangoLeafBD dataset and 98.40%on the MangoFruitDDS dataset.The results demonstrate that ConvNeXt and ViT models can effectively diagnose mango diseases,enabling farmers to identify these conditions more efficiently.The system contributes to increased mango production and minimizes economic losses by reducing the time and effort needed for manual diagnostics.Additionally,the proposed system is integrated into a mobile application that utilizes the model as a backend to detect mango diseases instantly.展开更多
The identification of ore grades is a critical step in mineral resource exploration and mining.Prompt gamma neutron activation analysis(PGNAA)technology employs gamma rays generated by the nuclear reactions between ne...The identification of ore grades is a critical step in mineral resource exploration and mining.Prompt gamma neutron activation analysis(PGNAA)technology employs gamma rays generated by the nuclear reactions between neutrons and samples to achieve the qualitative and quantitative detection of sample components.In this study,we present a novel method for identifying copper grade by combining the vision transformer(ViT)model with the PGNAA technique.First,a Monte Carlo simulation is employed to determine the optimal sizes of the neutron moderator,thermal neutron absorption material,and dimensions of the device.Subsequently,based on the parameters obtained through optimization,a PGNAA copper ore measurement model is established.The gamma spectrum of the copper ore is analyzed using the ViT model.The ViT model is optimized for hyperparameters using a grid search.To ensure the reliability of the identification results,the test results are obtained through five repeated tenfold cross-validations.Long short-term memory and convolutional neural network models are compared with the ViT method.These results indicate that the ViT method is efficient in identifying copper ore grades with average accuracy,precision,recall,F_(1)score,and F_(1)(-)score values of 0.9795,0.9637,0.9614,0.9625,and 0.9942,respectively.When identifying associated minerals,the ViT model can identify Pb,Zn,Fe,and Co minerals with identification accuracies of 0.9215,0.9396,0.9966,and 0.8311,respectively.展开更多
Age-related Macular Degeneration(AMD)and Diabetic Macular Edema(DME)are two com-mon retinal diseases for elder people that may ultimately cause irreversible blindness.Timely and accurate diagnosis is essential for the...Age-related Macular Degeneration(AMD)and Diabetic Macular Edema(DME)are two com-mon retinal diseases for elder people that may ultimately cause irreversible blindness.Timely and accurate diagnosis is essential for the treatment of these diseases.In recent years,computer-aided diagnosis(CAD)has been deeply investigated and effectively used for rapid and early diagnosis.In this paper,we proposed a method of CAD using vision transformer to analyze optical co-herence tomography(OCT)images and to automatically discriminate AMD,DME,and normal eyes.A classification accuracy of 99.69%was achieved.After the model pruning,the recognition time reached 0.010 s and the classification accuracy did not drop.Compared with the Con-volutional Neural Network(CNN)image classification models(VGG16,Resnet50,Densenet121,and EfficientNet),vision transformer after pruning exhibited better recognition ability.Results show that vision transformer is an improved alternative to diagnose retinal diseases more accurately.展开更多
Transformer models have emerged as dominant networks for various tasks in computer vision compared to Convolutional Neural Networks(CNNs).The transformers demonstrate the ability to model long-range dependencies by ut...Transformer models have emerged as dominant networks for various tasks in computer vision compared to Convolutional Neural Networks(CNNs).The transformers demonstrate the ability to model long-range dependencies by utilizing a self-attention mechanism.This study aims to provide a comprehensive survey of recent transformerbased approaches in image and video applications,as well as diffusion models.We begin by discussing existing surveys of vision transformers and comparing them to this work.Then,we review the main components of a vanilla transformer network,including the self-attention mechanism,feed-forward network,position encoding,etc.In the main part of this survey,we review recent transformer-based models in three categories:Transformer for downstream tasks,Vision Transformer for Generation,and Vision Transformer for Segmentation.We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models.We compare the performance of various hierarchical transformer networks for multiple tasks on popular benchmark datasets.Finally,we explore some future research directions to further improve the field.展开更多
文摘针对当前遥感农作物分类研究中深度学习模型对光谱时间和空间信息特征采样不足,农作物提取仍然存在边界模糊、漏提、误提的问题,提出了一种名为视觉Transformer-长短期记忆递归神经网络(Vision Transformer-long short term memory,ViTL)的深度学习模型,ViTL模型集成了双路Vision-Transformer特征提取、时空特征融合和长短期记忆递归神经网络(LSTM)时序分类等3个关键模块,双路Vision-Transformer特征提取模块用于捕获图像的时空特征相关性,一路提取空间分类特征,一路提取时间变化特征;时空特征融合模块用于将多时特征信息进行交叉融合;LSTM时序分类模块捕捉多时序的依赖关系并进行输出分类。综合利用基于多时序卫星影像的遥感技术理论和方法,对黑龙江省齐齐哈尔市讷河市作物信息进行提取,研究结果表明,ViTL模型表现出色,其总体准确率(Overall Accuracy,OA)、平均交并比(Mean Intersection over Union,MIoU)和F1分数分别达到0.8676、0.6987和0.8175,与其他广泛使用的深度学习方法相比,包括三维卷积神经网络(3-D CNN)、二维卷积神经网络(2-D CNN)和长短期记忆递归神经网络(LSTM),ViTL模型的F1分数提高了9%~12%,显示出显著的优越性。ViTL模型克服了面对多时序遥感影像的农作物分类任务中的时间和空间信息特征采样不足问题,为准确、高效地农作物分类提供了新思路。
基金Researchers Supporting Project Number(RSPD2024R 553),King Saud University,Riyadh,Saudi Arabia.
文摘Wheat is a critical crop,extensively consumed worldwide,and its production enhancement is essential to meet escalating demand.The presence of diseases like stem rust,leaf rust,yellow rust,and tan spot significantly diminishes wheat yield,making the early and precise identification of these diseases vital for effective disease management.With advancements in deep learning algorithms,researchers have proposed many methods for the automated detection of disease pathogens;however,accurately detectingmultiple disease pathogens simultaneously remains a challenge.This challenge arises due to the scarcity of RGB images for multiple diseases,class imbalance in existing public datasets,and the difficulty in extracting features that discriminate between multiple classes of disease pathogens.In this research,a novel method is proposed based on Transfer Generative Adversarial Networks for augmenting existing data,thereby overcoming the problems of class imbalance and data scarcity.This study proposes a customized architecture of Vision Transformers(ViT),where the feature vector is obtained by concatenating features extracted from the custom ViT and Graph Neural Networks.This paper also proposes a Model AgnosticMeta Learning(MAML)based ensemble classifier for accurate classification.The proposedmodel,validated on public datasets for wheat disease pathogen classification,achieved a test accuracy of 99.20%and an F1-score of 97.95%.Compared with existing state-of-the-art methods,this proposed model outperforms in terms of accuracy,F1-score,and the number of disease pathogens detection.In future,more diseases can be included for detection along with some other modalities like pests and weed.
文摘Accurate plant species classification is essential for many applications,such as biodiversity conservation,ecological research,and sustainable agricultural practices.Traditional morphological classification methods are inherently slow,labour-intensive,and prone to inaccuracies,especiallywhen distinguishing between species exhibiting visual similarities or high intra-species variability.To address these limitations and to overcome the constraints of imageonly approaches,we introduce a novel Artificial Intelligence-driven framework.This approach integrates robust Vision Transformer(ViT)models for advanced visual analysis with a multi-modal data fusion strategy,incorporating contextual metadata such as precise environmental conditions,geographic location,and phenological traits.This combination of visual and ecological cues significantly enhances classification accuracy and robustness,proving especially vital in complex,heterogeneous real-world environments.The proposedmodel achieves an impressive 97.27%of test accuracy,andMean Reciprocal Rank(MRR)of 0.9842 that demonstrates strong generalization capabilities.Furthermore,efficient utilization of high-performance GPU resources(RTX 3090,18 GB memory)ensures scalable processing of highdimensional data.Comparative analysis consistently confirms that ourmetadata fusion approach substantially improves classification performance,particularly formorphologically similar species,and through principled self-supervised and transfer learning from ImageNet,the model adapts efficiently to new species,ensuring enhanced generalization.This comprehensive approach holds profound practical implications for precise conservation initiatives,rigorous ecological monitoring,and advanced agricultural management.
基金funded by Princess Nourah bint Abdulrahman University and Researchers Supporting Project number(PNURSP2025R346)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Mango farming significantly contributes to the economy,particularly in developing countries.However,mango trees are susceptible to various diseases caused by fungi,viruses,and bacteria,and diagnosing these diseases at an early stage is crucial to prevent their spread,which can lead to substantial losses.The development of deep learning models for detecting crop diseases is an active area of research in smart agriculture.This study focuses on mango plant diseases and employs the ConvNeXt and Vision Transformer(ViT)architectures.Two datasets were used.The first,MangoLeafBD,contains data for mango leaf diseases such as anthracnose,bacterial canker,gall midge,and powdery mildew.The second,SenMangoFruitDDS,includes data for mango fruit diseases such as Alternaria,Anthracnose,Black Mould Rot,Healthy,and Stem and Rot.Both datasets were obtained from publicly available sources.The proposed model achieved an accuracy of 99.87%on the MangoLeafBD dataset and 98.40%on the MangoFruitDDS dataset.The results demonstrate that ConvNeXt and ViT models can effectively diagnose mango diseases,enabling farmers to identify these conditions more efficiently.The system contributes to increased mango production and minimizes economic losses by reducing the time and effort needed for manual diagnostics.Additionally,the proposed system is integrated into a mobile application that utilizes the model as a backend to detect mango diseases instantly.
基金supported by the National Natural Science Foundation of China(Nos.U2BB2077 and 42374226)the Natural Science Foundation of Jiangxi Province(20232BAB201043 and 20232BCJ23006)the Nuclear energy development project of the National Defense Science and Industry Bureau(Nos.20201192-01,20201192-03).
文摘The identification of ore grades is a critical step in mineral resource exploration and mining.Prompt gamma neutron activation analysis(PGNAA)technology employs gamma rays generated by the nuclear reactions between neutrons and samples to achieve the qualitative and quantitative detection of sample components.In this study,we present a novel method for identifying copper grade by combining the vision transformer(ViT)model with the PGNAA technique.First,a Monte Carlo simulation is employed to determine the optimal sizes of the neutron moderator,thermal neutron absorption material,and dimensions of the device.Subsequently,based on the parameters obtained through optimization,a PGNAA copper ore measurement model is established.The gamma spectrum of the copper ore is analyzed using the ViT model.The ViT model is optimized for hyperparameters using a grid search.To ensure the reliability of the identification results,the test results are obtained through five repeated tenfold cross-validations.Long short-term memory and convolutional neural network models are compared with the ViT method.These results indicate that the ViT method is efficient in identifying copper ore grades with average accuracy,precision,recall,F_(1)score,and F_(1)(-)score values of 0.9795,0.9637,0.9614,0.9625,and 0.9942,respectively.When identifying associated minerals,the ViT model can identify Pb,Zn,Fe,and Co minerals with identification accuracies of 0.9215,0.9396,0.9966,and 0.8311,respectively.
基金This work was supported by the Science and Technology innovation project of Shanghai Science and Technology Commission(19441905800)the Natural National Science Foundation of China(62175156,81827807,8210041176,82101177,61675134)+1 种基金the Project of State Key Laboratory of Ophthalmology,Optometry and Visual Science,Wenzhou Medical University(K181002)the Key R&D Program Projects in Zhejiang Province(2019C03045).
文摘Age-related Macular Degeneration(AMD)and Diabetic Macular Edema(DME)are two com-mon retinal diseases for elder people that may ultimately cause irreversible blindness.Timely and accurate diagnosis is essential for the treatment of these diseases.In recent years,computer-aided diagnosis(CAD)has been deeply investigated and effectively used for rapid and early diagnosis.In this paper,we proposed a method of CAD using vision transformer to analyze optical co-herence tomography(OCT)images and to automatically discriminate AMD,DME,and normal eyes.A classification accuracy of 99.69%was achieved.After the model pruning,the recognition time reached 0.010 s and the classification accuracy did not drop.Compared with the Con-volutional Neural Network(CNN)image classification models(VGG16,Resnet50,Densenet121,and EfficientNet),vision transformer after pruning exhibited better recognition ability.Results show that vision transformer is an improved alternative to diagnose retinal diseases more accurately.
基金supported in part by the National Natural Science Foundation of China under Grants 61502162,61702175,and 61772184in part by the Fund of the State Key Laboratory of Geo-information Engineering under Grant SKLGIE2016-M-4-2+4 种基金in part by the Hunan Natural Science Foundation of China under Grant 2018JJ2059in part by the Key R&D Project of Hunan Province of China under Grant 2018GK2014in part by the Open Fund of the State Key Laboratory of Integrated Services Networks under Grant ISN17-14Chinese Scholarship Council(CSC)through College of Computer Science and Electronic Engineering,Changsha,410082Hunan University with Grant CSC No.2018GXZ020784.
文摘Transformer models have emerged as dominant networks for various tasks in computer vision compared to Convolutional Neural Networks(CNNs).The transformers demonstrate the ability to model long-range dependencies by utilizing a self-attention mechanism.This study aims to provide a comprehensive survey of recent transformerbased approaches in image and video applications,as well as diffusion models.We begin by discussing existing surveys of vision transformers and comparing them to this work.Then,we review the main components of a vanilla transformer network,including the self-attention mechanism,feed-forward network,position encoding,etc.In the main part of this survey,we review recent transformer-based models in three categories:Transformer for downstream tasks,Vision Transformer for Generation,and Vision Transformer for Segmentation.We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models.We compare the performance of various hierarchical transformer networks for multiple tasks on popular benchmark datasets.Finally,we explore some future research directions to further improve the field.