A Comprehensive Review of Pill Image Recognition
1
Authors: Linh Nguyen Thi My, Viet-Tuan Le, Tham Vo, Vinh Truong Hoang. Journal: Computers, Materials & Continua, 2025, Issue 3, pp. 3693-3740 (48 pages)
Pill image recognition is an important field in computer vision. It has become a vital technology in healthcare and pharmaceuticals due to the necessity for precise medication identification to prevent errors and ensure patient safety. This survey examines the current state of pill image recognition, focusing on advancements, methodologies, and the challenges that remain unresolved. It provides a comprehensive overview of traditional image processing-based, machine learning-based, deep learning-based, and hybrid-based methods, and aims to explore the ongoing difficulties in the field. We summarize and classify the methods used in each article, compare the strengths and weaknesses of these four families of methods, and review benchmark datasets for pill image recognition. Additionally, we compare the performance of proposed methods on popular benchmark datasets. This survey draws on recent advancements, such as Transformer models and cutting-edge technologies like Augmented Reality (AR), to discuss potential research directions and conclude the review. By offering a holistic perspective, this paper aims to serve as a valuable resource for researchers and practitioners striving to advance the field of pill image recognition.
Keywords: Pill image recognition; pill image identification; pill recognition; pill identification; pill image retrieval; pill retrieval; computer vision
Swiftly accessible retinomorphic hardware for in-sensor image preprocessing and recognition: IGZO-based neuro-inspired optical image sensor arrays with metallic sensitization island
2
Authors: Kyungmoon Kwak, Kyungho Park, Jae Seong Han, Byung Ha Kang, Dong Hyun Choi, Kunho Moon, Seok Min Hong, Gwan In Kim, Ju Hyun Lee, Hyun Jae Kim. Journal: International Journal of Extreme Manufacturing, 2025, Issue 6, pp. 494-510 (17 pages)
In-optical-sensor computing architectures based on neuro-inspired optical sensor arrays have become key milestones for in-sensor artificial intelligence (AI) technology, enabling intelligent vision sensing and extensive data processing. These architectures must demonstrate potential advantages in terms of mass production and complementary metal oxide semiconductor compatibility. Here, we introduce a visible-light-driven neuromorphic vision system that integrates front-end retinomorphic photosensors with a back-end artificial neural network (ANN), employing a single neuro-inspired indium-gallium-zinc-oxide phototransistor (NIP) featuring an aluminum sensitization layer (ASL). By methodically adjusting the ASL coverage on IGZO phototransistors, a fast-switching response type and a synaptic response type of IGZO phototransistor are successfully developed. Notably, the fabricated NIP shows a remarkable retina-like photoinduced synaptic plasticity under wavelengths up to 635 nm, with over 256 states, weight update nonlinearity below 0.1, and a dynamic range of 64.01. Owing to this technology, a 6×6 neuro-inspired optical image sensor array with the NIP can perform highly integrated sensing, memory, and preprocessing functions, including contrast enhancement and handwritten digit image recognition. The demonstrated prototype highlights the potential for efficient hardware implementations in in-sensor AI technologies.
Keywords: retinomorphic hardware; in-sensor preprocessing; image recognition; neuro-inspired optical sensors; indium-gallium-zinc-oxide; metallic sensitization layer
Fusion method for water depth data from multiple sources based on image recognition
3
Authors: Huiyu HAN, Feng ZHOU. Journal: Journal of Oceanology and Limnology, 2025, Issue 4, pp. 1093-1105 (13 pages)
Considering the difficulty of integrating the depth points of nautical charts of the East China Sea into a global high-precision Grid Digital Elevation Model (Grid-DEM), we proposed a “Fusion based on Image Recognition (FIR)” method for multi-sourced depth data fusion, and used it to merge the electronic nautical chart dataset (referred to as Chart2014 in this paper) with the global digital elevation dataset (referred to as Globalbath2002 in this paper). Compared to the traditional fusion of two datasets by direct combination and interpolation, the new Grid-DEM formed by FIR can better represent the data characteristics of Chart2014, reduce the calculation difficulty, and be more intuitive. The choice of different interpolation methods in FIR and the influence of the “exclusion radius R” parameter were also discussed. FIR avoids complex calculations of spatial distances among points from different sources, and instead uses a spatial exclusion map to perform one-step screening based on the exclusion radius R, which greatly improved the fusion status of a reliable dataset. The fusion results of different experiments were analyzed statistically with root mean square error and mean relative error, showing that the interpolation methods based on Delaunay triangulation are more suitable for the fusion of nautical chart depth of China, and that factors such as the point density distribution of multiple source data, accuracy, interpolation method, and various terrain conditions should be fully considered when selecting the exclusion radius R.
Keywords: water depth; fusion method; Grid Digital Elevation Model (Grid-DEM); image recognition; Delaunay triangulation
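The one-step screening with an exclusion radius R described in the abstract above might be pictured roughly as follows. This is only a sketch under stated assumptions, not the authors' FIR implementation: the function names, the regular cell grid, and the `(x, y, depth)` tuples are hypothetical, and the interpolation step is left out entirely.

```python
import math

def build_exclusion_map(chart_points, radius, cell, width, height):
    """Boolean grid; True marks cells whose centre lies within `radius` of a chart point."""
    mask = [[False] * width for _ in range(height)]
    reach = int(math.ceil(radius / cell)) + 1  # how many cells to scan around each point
    for x, y, _depth in chart_points:
        cx, cy = int(x // cell), int(y // cell)
        for j in range(max(0, cy - reach), min(height, cy + reach + 1)):
            for i in range(max(0, cx - reach), min(width, cx + reach + 1)):
                centre_x, centre_y = (i + 0.5) * cell, (j + 0.5) * cell
                if math.hypot(centre_x - x, centre_y - y) <= radius:
                    mask[j][i] = True
    return mask

def fuse(chart_points, global_points, radius, cell, width, height):
    """Keep all chart points; keep a global point only if its cell is not excluded."""
    mask = build_exclusion_map(chart_points, radius, cell, width, height)
    kept = []
    for x, y, depth in global_points:
        i, j = int(x // cell), int(y // cell)
        inside = 0 <= i < width and 0 <= j < height
        if not (inside and mask[j][i]):
            kept.append((x, y, depth))
    return list(chart_points) + kept

# Hypothetical data: one chart sounding and two global-grid depths.
chart = [(5.0, 5.0, 10.0)]
global_pts = [(5.5, 5.5, 20.0), (9.5, 9.5, 30.0)]
fused = fuse(chart, global_pts, radius=2.0, cell=1.0, width=10, height=10)
```

Here the screening is a single mask lookup per global point, so no pairwise distances between the two sources are computed at fusion time; the global point close to the chart sounding is dropped and the distant one is kept.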
A teacher-student based attention network for fine-grained image recognition
4
Authors: Ang Li, Xueyi Zhang, Peilin Li, Bin Kang. Journal: Digital Communications and Networks, 2025, Issue 1, pp. 52-59 (8 pages)
The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. In order to highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that the subtle visual difference in challenging scenarios can be highlighted by exploiting the spatial relationship among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR, and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly by a convolutional neural network, referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline, and propose a type of boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph convolution based Global Attention Network (GA-Net), which can combine extracted local attention maps from MA-Net with non-local techniques to explore the spatial relationship among sub-regions. Finally, we develop a collaborative teacher-student strategy to adaptively determine the attended roles and optimization modes, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars, and FGVC Aircraft datasets illustrate the promising performance of our framework.
Keywords: Fine-grained image recognition; Collaborative teacher-student strategy; Multi-attention; Global attention
Multi-Stage-Based Siamese Neural Network for Seal Image Recognition
5
Authors: Jianfeng Lu, Xiangye Huang, Caijin Li, Renlin Xin, Shanqing Zhang, Mahmoud Emam. Journal: Computer Modeling in Engineering & Sciences (indexed: SCIE, EI), 2025, Issue 1, pp. 405-423 (19 pages)
Seal authentication is an important task for verifying the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seal inspection is commonly audited manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive due to human errors, inconsistent placement, and completeness of the seal. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural network-based method for seal image recognition. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, the fixed training data categories make handling new categories challenging. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. Firstly, the seal image is pre-processed by applying an image rotation correction module based on Histogram of Oriented Gradients (HOG). Secondly, the similarity between input seal image pairs is measured by utilizing a similarity comparison module based on the Siamese network. Finally, we compare the results with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in industries where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
Keywords: Seal recognition; seal authentication; document tampering; siamese network; spatial transformer network; similarity comparison network
Research on the balance optimization algorithm of image recognition accuracy and speed based on autocollimator measurement
6
Authors: LI Renpu, MA Long, CUI Jiwen, GUO Junqi, Andrei KULIKOV, WEN Dandan. Journal: Optoelectronics Letters, 2025, Issue 2, pp. 121-128 (8 pages)
The autocollimator is an important device for achieving precise, small-angle, non-contact measurements. It primarily obtains angular parameters of a plane target mirror indirectly by detecting the position of the imaging spot. There are few reports on the core algorithmic techniques in current commercial products and recent scientific research. This paper addresses the performance requirements of coordinate reading accuracy and operational speed in autocollimator image positioning. It proposes a cross-image center recognition scheme based on the Hough transform and another based on Zernike moments and the least squares method. Through experimental evaluation of the accuracy and speed of both schemes, the optimal image recognition scheme balancing measurement accuracy and speed for the autocollimator is determined. Of the two, the center recognition method based on Zernike moments and the least squares method offers higher measurement accuracy and stability, while the Hough transform-based method provides faster measurement speed.
Keywords: image optimization recognition
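The least-squares step of the second scheme above can be illustrated with a small example. This is a hedged sketch, not the paper's algorithm: it assumes edge points on the two arms of the cross have already been extracted (the abstract attributes that step to Zernike moments, which are not reproduced here), and the names `fit_line` and `cross_centre` are hypothetical.

```python
def fit_line(us, vs):
    """Ordinary least-squares fit of v = m * u + k; returns (m, k)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    m = (n * suv - su * sv) / (n * suu - su * su)
    k = (sv - m * su) / n
    return m, k

def cross_centre(horizontal_pts, vertical_pts):
    """Intersect the two fitted arms of a cross; points are (x, y) tuples."""
    # Near-horizontal arm, modelled as y = a * x + b.
    a, b = fit_line([p[0] for p in horizontal_pts], [p[1] for p in horizontal_pts])
    # Near-vertical arm, modelled as x = c * y + d to avoid an infinite slope.
    c, d = fit_line([p[1] for p in vertical_pts], [p[0] for p in vertical_pts])
    # Substitute y = a * x + b into x = c * y + d and solve for x.
    x = (c * b + d) / (1.0 - a * c)
    y = a * x + b
    return x, y

# Synthetic, noise-free arm points (assumed inputs for illustration only).
horizontal = [(float(x), 0.1 * x + 2.0) for x in range(5)]
vertical = [(-0.1 * y + 3.0, float(y)) for y in range(5)]
cx, cy = cross_centre(horizontal, vertical)
```

Fitting the near-vertical arm as x in terms of y is a design choice that keeps both fits well conditioned when the cross is only slightly rotated.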
Transformer-Based Fusion of Infrared and Visible Imagery for Smoke Recognition in Commercial Areas
7
Authors: Chongyang Wang, Qiongyan Li, Shu Liu, Pengle Cheng, Ying Huang. Journal: Computers, Materials & Continua, 2025, Issue 9, pp. 5157-5176 (20 pages)
With rapid urbanization, fires pose significant challenges in urban governance. Traditional fire detection methods often struggle to detect smoke in complex urban scenes due to environmental interferences and variations in viewing angles. This study proposes a novel multimodal smoke detection method that fuses infrared and visible imagery using a transformer-based deep learning model. By capturing both thermal and visual cues, our approach significantly enhances the accuracy and robustness of smoke detection in business park scenes. We first established a dual-view dataset comprising infrared and visible light videos, implemented an innovative image feature fusion strategy, and designed a deep learning model based on the transformer architecture and attention mechanism for smoke classification. Experimental results demonstrate that our method outperforms existing methods. Under multi-view input, it achieves an accuracy of 90.88%, a precision of 98.38%, and a recall of 92.41%, with false positive and false negative rates both below 5%, underlining the effectiveness of the proposed multimodal and multi-view fusion approach. The attention mechanism plays a crucial role in improving detection performance, particularly in identifying subtle smoke features.
Keywords: Multimodal image processing; smoke recognition; urban safety; environmental monitoring
A novel coal-rock recognition method in coal mining face based on fusing laser point cloud and images
8
Authors: Yang Liu, Lei Si, Zhongbin Wang, Miao Chen, Xin Li, Dong Wei, Jinheng Gu. Journal: International Journal of Mining Science and Technology, 2025, Issue 7, pp. 1057-1071 (15 pages)
Rapid and accurate recognition of coal and rock is an important prerequisite for safe and efficient coal mining. In this paper, a novel coal-rock recognition method is proposed based on fusing laser point cloud and images, named Multi-Modal Frustum PointNet (MMFP). Firstly, MobileNetV3 is used as the backbone network of Mask R-CNN to reduce the network parameters and compress the model volume. The dilated convolutional block attention mechanism (Dilated CBAM) and inception structure are combined with MobileNetV3 to further enhance the detection accuracy. Subsequently, the 2D target candidate box is calculated through the improved Mask R-CNN, and the frustum point cloud in the 2D target candidate box is extracted to reduce the calculation scale and spatial search range. Then, the self-attention PointNet is constructed to segment the fused point cloud within the frustum range, and the bounding box regression network is used to predict the bounding box parameters. Finally, an experimental platform of shearer coal wall cutting is established, and multiple comparative experiments are conducted. Experimental results indicate that the proposed coal-rock recognition method is superior to other advanced models.
Keywords: Coal mining face; Coal-rock recognition; Deep learning; Laser point cloud and images fusion; Multi-Modal Frustum PointNet (MMFP)
Waterbird image recognition using lightweight deep learning in wetland environment
9
Authors: Qingquan Huang, Changchun Zhang, Chunhe Hu, Jiangjian Xie, Yuan Wang, Junguo Zhang. Journal: Avian Research, 2025, Issue 4, pp. 832-845 (14 pages)
Wetland waterbirds serve as key ecological indicators for assessing habitat quality and biodiversity. Accurate identification of waterbird species is a cornerstone of long-term ecological monitoring. The resulting data are critical for assessing wetland ecosystem health and biodiversity. However, prevailing recognition approaches often prioritize detection accuracy at the expense of computational efficiency. They are also hindered by complex background heterogeneity and interspecies visual similarity. These limitations hinder the scalability and practical deployment of such methods for on-site ecological monitoring within wetland ecosystems. To address these challenges, this study proposes an optimized end-to-end framework, ShuffleNetV2-iRMB-ShapeIoU-YOLO (SIS-YOLO), designed for robust recognition of wetland waterbirds in complex environments. Specifically, the proposed framework integrates ShuffleNetV2 with inverted Residual Mobile Blocks (iRMB) to improve computational efficiency while maintaining robust feature representation. This design further enables deployment on resource-constrained mobile and embedded platforms. Additionally, ShapeIoU, a refined bounding box similarity metric, is introduced to jointly optimize overlap and shape consistency, effectively mitigating misclassification among visually similar species. Experimental results on the IC-Beijing dataset show that SIS-YOLO achieves 91.1% precision and 79.1% mAP@0.5:0.95 with only 2.9 million parameters. Compared with the lightweight baseline YOLOv8n, it improves precision by 2% and mAP@0.5:0.95 by 1.2%, while requiring fewer parameters and offering higher computational efficiency.
Keywords: iRMB; ShapeIoU; ShuffleNetV2; SIS-YOLO; Wetland waterbird recognition
Automatic Digital Inclinometer Calibration System Based on Image Recognition
10
Authors: FENG Zheming, CHEN Gang, NAN Zhuojiang, TAO Wei. Journal: Journal of Shanghai Jiaotong University (Science), 2025, Issue 2, pp. 280-290 (11 pages)
The traditional calibration method for the digital inclinometer relies on manual inspection, which makes the process complicated, inefficient, and prone to human error. To improve both the calibration accuracy and efficiency of the digital inclinometer, an automatic digital inclinometer calibration system was developed in this study, and a new display tube recognition algorithm was proposed. First, a high-precision automatic turntable was taken as the reference to calculate the indication error of the inclinometer. Then, the automatic inclinometer calibration control process and the digital inclinometer zero-setting function were formulated. For display tube recognition, a new algorithm combining the threading method and the feature extraction method was proposed. Finally, the calibration system was calibrated by a photoelectric autocollimator and a regular polygon mirror, and the calibration system error and repeatability were calculated via a series of experiments. The experimental results showed that the indication error of the proposed calibration system was less than 4", and the repeatability was 3.9". A digital inclinometer with a resolution of 0.1° was taken as a testing example; within the calibration point range of [-90°, 90°], the repeatability of the testing was 0.085°, and the whole testing process took less than 90 s. According to the uncertainty evaluation, the digital inclinometer indication error is mainly introduced by the digital inclinometer resolution.
Keywords: digital inclinometer; automatic calibration; high-precision turntable; number recognition
Application of scattering image wavelet transform in cave recognition: A case study on a bedrock buried hill reservoir in Bongor Basin, Chad
11
Authors: Xiao-Yu Jiang, Tao Song, Li-Deng Gan, Yan Zhang, Wen-Hui Du, Xing-Yan Fan, Xiao-Feng Dai. Journal: Applied Geophysics, 2025, Issue 2, pp. 535-545, 561 (12 pages)
Caves located in the buried hill reservoir of granite bedrock in Bongor Basin, Chad, are excessively small and cannot be identified in conventional reflection wave imaging profiles because their reflection characteristics are suppressed by the strong reflection of the weathering crust at the top of the buried hill. In contrast to reflection wave imaging, which captures the reflection characteristics of continuous interfaces, scattered wave imaging captures those of discontinuous geological bodies. Scattering waves can be produced in the presence of discontinuous points, such as karst caves, fractures, and stratum vanishing points. Scattering imaging can accurately provide the location of discontinuous abnormal bodies, highlight the seismic reflection characteristics of caves with weak reflections, and eliminate continuous strong reflections. This strengthens the ability of seismic data to distinguish discontinuous geological bodies and overcomes the inability of conventional poststack reflection wave imaging to identify small caves in buried hills. Three-parameter wavelet spectral decomposition technology is used to depict the boundary of caves accurately in accordance with the strong energy spectral characteristics of caves in the section of the scattering imaging seismic data of the granite bedrock buried hill reservoir. Compared with the attributes extracted from conventional reflection wave poststack seismic data, those acquired from scattering imaging bodies are more reliable and consistent with the actual location of caves on boreholes and have higher resolution. For connected wells, the attributes extracted from the conventional poststack seismic data can only predict whether caves are developed, whereas those calculated from scattering imaging can not only predict whether caves are present but also reflect the degree of cave development. On the plane, the attributes obtained from scattering imaging calculation are more consistent with the geological law of cave development. On the basis of this finding and in accordance with the results of the three-parameter wavelet spectral decomposition of scattering imaging seismic data, the degree of cave development is classified, and the favorable location for reservoir development in the study area is identified. This solution provides an effective way to improve the exploration accuracy of cave-type granite buried hill reservoirs.
Keywords: Angle domain imaging; Scattering imaging; Granite bedrock buried hill; Three-parameter wavelet; Cave
TSMS-InceptionNeXt: A Framework for Image-Based Combustion State Recognition in Counterflow Burners via Feature Extraction Optimization
12
Authors: Huiling Yu, Xibei Jia, Yongfeng Niu, Yizhuo Zhang. Journal: Computers, Materials & Continua, 2025, Issue 6, pp. 4329-4352 (24 pages)
The counterflow burner is a combustion device used for research on combustion. Using deep convolutional models to identify the combustion state of a counterflow burner from visible flame images facilitates the optimization of the combustion process and enhances combustion efficiency. Among existing deep convolutional models, InceptionNeXt is a deep learning architecture that integrates the ideas of the Inception series and ConvNeXt. It has garnered significant attention for its computational efficiency, remarkable model accuracy, and exceptional feature extraction capabilities. However, since this model still has limitations in the combustion state recognition task, we propose a Triple-Scale Multi-Stage InceptionNeXt (TSMS-InceptionNeXt) combustion state recognition method based on feature extraction optimization. First, to address the InceptionNeXt model’s limited ability to capture dynamic features in flame images, we introduce Triplet Attention, which applies attention to the width, height, and Red Green Blue (RGB) dimensions of the flame images to enhance its ability to model dynamic features. Secondly, to address the issue of key information loss in the Inception deep convolution layers, we propose a Similarity-based Feature Concentration (SimC) mechanism to enhance the model’s capability to concentrate on critical features. Next, to address the insufficient receptive field of the model, we propose a Multi-Scale Dilated Channel Parallel Integration (MDCPI) mechanism to enhance the model’s ability to extract multi-scale contextual information. Finally, to address the issue of the model’s Multi-Layer Perceptron Head (MlpHead) neglecting channel interactions, we propose a Channel Shuffle-Guided Channel-Spatial Attention (ShuffleCS) mechanism, which integrates information from different channels to further enhance the representational power of the input features. To validate the effectiveness of the method, experiments are conducted on the counterflow burner flame visible light image dataset. The experimental results show that the TSMS-InceptionNeXt model achieved an accuracy of 85.71% on the dataset, improving by 2.38% over the baseline model. It achieved accuracy improvements of 10.47%, 4.76%, 11.19%, and 9.28% compared to the Reparameterized Visual Geometry Group (RepVGG), Squeeze-enhanced Axial Transformer (SeaFormer), Simplified Graph Transformers (SGFormer), and VanillaNet models, respectively, effectively enhancing the recognition performance for combustion states in counterflow burners.
Keywords: Counterflow burner; combustion state recognition; InceptionNeXt; dilated convolution; channel shuffling
DFNet: A Differential Feature-Incorporated Residual Network for Image Recognition
13
Authors: Pengxing Cai, Yu Zhang, Houtian He, Zhenyu Lei, Shangce Gao. Journal: Journal of Bionic Engineering, 2025, Issue 2, pp. 931-944 (14 pages)
Residual neural network (ResNet) is a powerful neural network architecture that has proven to be excellent in extracting spatial and channel-wise information of images. ResNet employs a residual learning strategy that maps inputs directly to outputs, making it less difficult to optimize. In this paper, we incorporate differential information into the original residual block to improve the representative ability of the ResNet, allowing the modified network to capture more complex and metaphysical features. The proposed DFNet preserves the features after each convolutional operation in the residual block, and combines the feature maps of different levels of abstraction through the differential information. To verify the effectiveness of DFNet on image recognition, we select six distinct classification datasets. The experimental results show that our proposed DFNet has better performance and generalization ability than other state-of-the-art variants of ResNet in terms of classification accuracy and other statistical analysis.
Keywords: Deep learning; Residual neural network; Pattern recognition; Residual block; Differential feature
Research on Crop Image Classification and Recognition Based on Improved HRNet
14
Authors: Min Ji, Shucheng Yang. Journal: Computers, Materials & Continua, 2025, Issue 8, pp. 3075-3103 (29 pages)
In agricultural production, crop images are commonly used for the classification and identification of various crops. However, several challenges arise, including low image clarity, elevated noise levels, low accuracy, and poor robustness of existing classification models. To address these issues, this research proposes an innovative crop image classification model named Lap-FEHRNet, which integrates a Laplacian Pyramid Super Resolution Network (LapSRN) with a feature enhancement high-resolution network based on attention mechanisms (FEHRNet). To mitigate noise interference, this research incorporates the LapSRN network, which utilizes a Laplacian pyramid structure to extract multi-level feature details from low-resolution images through a systematic layer-by-layer amplification and pixel detail superposition process. This gradual reconstruction enhances the high-frequency information of the image, enabling super-resolution reconstruction of low-quality images. To obtain a broader range of comprehensive and diverse features, this research employs the FEHRNet model for both deep and shallow feature extraction. This approach results in features that encapsulate multi-scale information and integrate both deep and shallow insights. To effectively fuse these complementary features, this research introduces an attention mechanism during the feature enhancement stage. This mechanism highlights important regions within the image, assigning greater weights to salient features and resulting in a more comprehensive and effective image feature representation. Consequently, the accuracy of image classification is significantly improved. Experimental results demonstrate that the Lap-FEHRNet model achieves classification accuracies of 98.8% on the crop classification dataset and 98.57% on the rice leaf disease dataset, underscoring the model’s accuracy, robustness, and generalization capability.
Keywords: image reconstruction; deep and shallow features; feature enhancement; LapSRN; HRNet
Image Enhancement Combined with LLM Collaboration for Low-Contrast Image Character Recognition
15
作者 Qin Qin Xuan Jiang +3 位作者 Jinhua Jiang Dongfang Zhao Zimei Tu Zhiwei Shen 《Computers, Materials & Continua》 2025年第12期4849-4867,共19页
The effectiveness of industrial character recognition on cast steel is often compromised by factors such as corrosion,surface defects,and low contrast,which hinder the extraction of reliable visual information.The pro... The effectiveness of industrial character recognition on cast steel is often compromised by factors such as corrosion,surface defects,and low contrast,which hinder the extraction of reliable visual information.The problem is further compounded by the scarcity of large-scale annotated datasets and complex noise patterns in real-world factory environments.This makes conventional OCR techniques and standard deep learning models unreliable.To address these limitations,this study proposes a unified framework that integrates adaptive image preprocessing with collaborative reasoning among LLMs.A Biorthogonal 4.4(bior4.4)wavelet transform is adaptively tuned using DE to enhance character edge clarity,suppress background noise,and retain morphological structure,thereby improving input quality for subsequent recognition.A structured three-round debate mechanism is further introduced within a multi-agent architecture,employing GPT-4o and Gemini-2.0-flash as role-specialized agents to perform complementary inference and achieve consensus.The proposed system is evaluated on a proprietary dataset of 48 high-resolution images collected under diverse industrial conditions.Experimental results show that the combination of DE-based enhancement and multi-agent collaboration consistently outperforms traditional baselines and ablated models,achieving an F1-score of 94.93%and an LCS accuracy of 93.30%.These results demonstrate the effectiveness of integrating signal processing with multi-agent LLM reasoning to achieve robust and interpretable OCR in visually complex and data-scarce industrial environments. 展开更多
Keywords: Low-contrast images; differential evolution (DE); wavelet transform; multi-agent systems; large language models (LLMs)
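The abstract reports an LCS accuracy but does not define it. The sketch below shows one plausible reading, assuming the metric is the length of the longest common subsequence between the recognized string and the ground truth, divided by the ground-truth length. The function names and that normalization are assumptions for illustration, not the authors' definition.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the subsequence that ended just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_accuracy(predicted: str, truth: str) -> float:
    """Assumed definition: LCS length divided by the ground-truth length."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return lcs_length(predicted, truth) / len(truth)
```

Under this reading, a prediction that misreads one character out of six scores 5/6, and inserted or dropped characters lower the score without requiring the two strings to be aligned position by position.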
Dendritic Learning-Incorporated Vision Transformer for Image Recognition (Cited by: 3)
16
Authors: Zhiming Zhang, Zhenyu Lei, Masaaki Omura, Hideyuki Hasegawa, Shangce Gao. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 2, pp. 539-541 (3 pages)
Dear Editor, This letter proposes to integrate dendritic learnable network architecture with Vision Transformer to improve the accuracy of image recognition. In this study, based on the theory of dendritic neurons in neuroscience, we design a network that is more practical for engineering to classify visual features. Based on this, we propose a dendritic learning-incorporated vision Transformer (DVT), which outperforms other state-of-the-art methods on three image recognition benchmarks.
Keywords: image; network; image
Modeling load distribution for rural photovoltaic grid areas using image recognition
17
Authors: Ning Zhou, Bowen Shang, Jinshuai Zhang, Mingming Xu. Global Energy Interconnection (EI, CSCD), 2024, No. 3, pp. 270-283 (14 pages)
Expanding photovoltaic (PV) resources in rural-grid areas is an essential means to augment the share of solar energy in the energy landscape, aligning with the "carbon peaking and carbon neutrality" objectives. However, rural power grids often lack digitalization; thus, the load distribution within these areas is not fully known. This hinders the calculation of the available PV capacity and deduction of node voltages. This study proposes a load-distribution modeling approach based on remote-sensing image recognition in pursuit of a scientific framework for developing distributed PV resources in rural grid areas. First, houses in remote-sensing images are accurately recognized using deep-learning techniques based on the YOLOv5 model. The distribution of the houses is then used to estimate the load distribution in the grid area. Next, equally spaced and clustered distribution models are used to adaptively determine the location of the nodes and load power in the distribution lines. Finally, by calculating the connectivity matrix of the nodes, a minimum spanning tree is extracted, the topology of the network is constructed, and the node parameters of the load-distribution model are calculated. The proposed scheme is implemented in a software package and its efficacy is demonstrated by analyzing typical remote-sensing images of rural grid areas. The results underscore the ability of the proposed approach to effectively discern the distribution-line structure and compute the node parameters, thereby offering vital support for determining PV access capability.
Keywords: Deep learning; Remote sensing image recognition; Photovoltaic development; Load distribution modeling; Power flow calculation
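The minimum-spanning-tree step described in the abstract can be sketched as follows. This is a minimal illustration that assumes node positions are planar coordinates and that the connectivity matrix holds Euclidean distances between every pair of nodes. The abstract does not say which algorithm is used, so Prim's algorithm is swapped in here, and the function names and sample coordinates are hypothetical.

```python
import math


def distance_matrix(points):
    """Dense matrix of Euclidean distances between (x, y) node positions."""
    return [[math.dist(p, q) for q in points] for p in points]


def prim_mst(weights):
    """Return MST edges (parent, child, weight) of a complete weighted graph."""
    n = len(weights)
    if n == 0:
        return []
    in_tree = [False] * n
    best_cost = [math.inf] * n   # cheapest known link from the tree to each node
    best_parent = [-1] * n       # tree node that offers that cheapest link
    best_cost[0] = 0.0           # grow the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the cheapest node that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_cost[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, weights[best_parent[u]][u]))
        # Relax the links from the newly added node to the remaining nodes.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best_cost[v]:
                best_cost[v] = weights[u][v]
                best_parent[v] = u
    return edges


# Hypothetical house-cluster coordinates, for illustration only.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
tree = prim_mst(distance_matrix(nodes))
```

The resulting edge list gives a loop-free topology connecting all nodes with minimum total line length, which is the property a radial distribution feeder model would rely on.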
Phenotypic Image Recognition of Asparagus Stem Blight Based on Improved YOLOv8
18
Authors: Shunshun Ji, Jiajun Sun, Chao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 9, pp. 4017-4029 (13 pages)
Asparagus stem blight, also known as "asparagus cancer", is a serious plant disease with a regional distribution. The widespread occurrence of the disease has had a negative impact on the yield and quality of asparagus and has become one of the main problems threatening asparagus production. To improve the ability to accurately identify and localize phenotypic lesions of stem blight in asparagus and to enhance the accuracy of the test, a YOLOv8-CBAM detection algorithm for asparagus stem blight based on YOLOv8 was proposed. The algorithm aims to achieve rapid detection of phenotypic images of asparagus stem blight and to provide effective assistance in the control of asparagus stem blight. To enhance the model's capacity to capture subtle lesion features, the Convolutional Block Attention Module (CBAM) is added after C2f in the head. Simultaneously, the original CIoU loss function in YOLOv8 was replaced with the Focal-EIoU loss function, ensuring that the updated loss function emphasizes higher-quality bounding boxes. The YOLOv8-CBAM algorithm can effectively detect asparagus stem blight phenotypic images with a mean average precision (mAP) of 95.51%, which is 0.22%, 14.99%, 1.77%, and 5.71% higher than the YOLOv5, YOLOv7, YOLOv8, and Mask R-CNN models, respectively. This greatly enhances the efficiency of asparagus growers in identifying asparagus stem blight, aids in improving the prevention and control of asparagus stem blight, and is crucial for the application of computer vision in agriculture.
Keywords: YOLOv8; asparagus stem blight; image recognition; PEST
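The abstract names the Focal-EIoU loss without giving its formula. The sketch below follows the commonly cited formulation, in which the EIoU loss adds center-distance, width, and height penalties to 1 - IoU, and the focal variant multiplies the result by IoU raised to a power gamma. The function name, the (x1, y1, x2, y2) box format, and gamma = 0.5 are assumptions for illustration and are not taken from the paper.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for one pair of boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Penalties: center distance, width difference, height difference.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    eiou = 1.0 - iou + center_term + width_term + height_term
    # Focal weighting: boxes with higher IoU contribute more to the loss.
    return (iou ** gamma) * eiou
```

In this formulation the IoU-based weight is what makes the loss emphasize higher-quality boxes, as the abstract puts it. One consequence worth noting is that non-overlapping boxes receive zero weight, so practical implementations may treat that case separately.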
Squeeze and Excitation Convolution with Shortcut for Complex Plasma Image Recognition
19
Authors: Baoxia Li, Wenzhuo Chen, Xiaojiang Tang, Shaohuang Bian, Yang Liu, Junwei Guo, Dan Zhang, Feng Huang. Computers, Materials & Continua (SCIE, EI), 2024, No. 8, pp. 2221-2236 (16 pages)
Complex plasma is widespread in industrial plasma processes such as thin film deposition, material surface modification, and waste gas treatment. During complex plasma discharge, the configuration, distribution, and size of particles, as well as the discharge glow, strongly depend on discharge parameters. However, traditional manual diagnosis methods for recognizing discharge parameters from discharge images are complicated to operate, inaccurate, time-consuming, and demanding of instrumentation. To solve these problems, a network of squeeze and excitation convolution with shortcut (SECS) is proposed for complex plasma image recognition. It combines two mechanisms: an attention mechanism, which strengthens the extraction of channel features, and a shortcut connection, which lets input information pass directly to deeper layers and avoids vanishing or exploding gradients. The results show that the accuracy, precision, recall, and F1-score of the model are superior to those of other models in complex plasma image recognition, with a recognition accuracy of 97.38%. Moreover, the recognition accuracy on the publicly available Flowers and Chest X-ray data sets reaches 97.85% and 98.65%, respectively, indicating that the model is robust. This study shows that the proposed model provides a new method for the diagnosis of complex plasma images and also provides technical support for the application of plasma in industrial production.
Keywords: image recognition; complex plasmas; deep learning
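The two mechanisms named in the abstract, channel attention and a shortcut connection, can be illustrated with a generic squeeze-and-excitation gate followed by an identity shortcut. This is a minimal dependency-free sketch, not the SECS architecture itself: the function name, the weight shapes, the omission of biases, and the placement of the shortcut are all assumptions.

```python
import math


def se_with_shortcut(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation gating, then add the input back.

    feature_map: list of C channels, each an H x W list of lists of floats.
    w_reduce:    R x C weights of the bottleneck layer (biases omitted).
    w_expand:    C x R weights of the expansion layer (biases omitted).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck with ReLU, then expansion with sigmoid gates.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale each channel by its gate, then add the shortcut (identity) path.
    return [
        [[x * g + x for x in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]


# Hypothetical 2-channel, 2 x 2 input with all-zero weights (every gate = 0.5).
x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
y = se_with_shortcut(x, w_reduce=[[0.0, 0.0]], w_expand=[[0.0], [0.0]])
```

Because the input is added back unchanged, the block passes information through even when the gates are small, which is the role the abstract assigns to the shortcut.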
Deep learning-based recognition of stained tongue coating images
20
Authors: ZHONG Liqin, XIN Guojiang, PENG Qinghua, CUI Ji, ZHU Lei, LIANG Hao. Digital Chinese Medicine (CAS, CSCD), 2024, No. 2, pp. 129-136 (8 pages)
Objective: To build a dataset encompassing a large number of stained tongue coating images and process it using deep learning to automatically recognize stained tongue coating images. Methods: A total of 1001 images of stained tongue coating from healthy students at Hunan University of Chinese Medicine and 1007 images of pathological (non-stained) tongue coating from hospitalized patients at The First Hospital of Hunan University of Chinese Medicine with lung cancer, diabetes, and hypertension were collected. The tongue images were randomized into the training, validation, and testing datasets in a 7:2:1 ratio. A deep learning model was constructed using the ResNet50 for recognizing stained tongue coating in the training and validation datasets. The training period was 90 epochs. The model's performance was evaluated by its accuracy, loss curve, recall, F1 score, confusion matrix, receiver operating characteristic (ROC) curve, and precision-recall (PR) curve in the tasks of predicting stained tongue coating images in the testing dataset. The accuracy of the deep learning model was compared with that of attending physicians of traditional Chinese medicine (TCM). Results: The training results showed that after 90 epochs, the model presented an excellent classification performance. The loss curve and accuracy were stable, showing no signs of overfitting. The model achieved an accuracy, recall, and F1 score of 92%, 91%, and 92%, respectively. The confusion matrix revealed an accuracy of 92% for the model and 69% for TCM practitioners. The areas under the ROC and PR curves were 0.97 and 0.95, respectively. Conclusion: The deep learning model constructed using ResNet50 can effectively recognize stained coating images with greater accuracy than visual inspection of TCM practitioners. This model has the potential to assist doctors in identifying false tongue coating and preventing misdiagnosis.
Keywords: Deep learning; Tongue coating; Stained coating; image recognition; Traditional Chinese medicine (TCM); Intelligent diagnosis
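The 7:2:1 randomization described in the Methods can be sketched as a seeded shuffle followed by slicing. The function name, the seed, and the rounding rule (floor for training and validation, remainder to testing) are assumptions. The abstract does not state the exact subset sizes or whether the split was stratified by class.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle items reproducibly and split them 7:2:1 into train/val/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# 1001 stained + 1007 non-stained images, represented here by integer IDs.
train, val, test = split_7_2_1(range(1001 + 1007))
```

With 2008 images this rounding rule yields 1405, 401, and 202 items. A stratified variant would apply the same split to each class separately so that both classes keep the 7:2:1 proportion.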