Pill image recognition is an important field in computer vision. It has become a vital technology in healthcare and pharmaceuticals because precise medication identification is needed to prevent errors and ensure patient safety. This survey examines the current state of pill image recognition, focusing on advancements, methodologies, and the challenges that remain unresolved. It provides a comprehensive overview of traditional image processing-based, machine learning-based, deep learning-based, and hybrid methods, and explores the ongoing difficulties in the field. We summarize and classify the methods used in each article, compare the strengths and weaknesses of these four categories of methods, and review benchmark datasets for pill image recognition. Additionally, we compare the performance of proposed methods on popular benchmark datasets. Drawing on recent advancements such as Transformer models and emerging technologies like Augmented Reality (AR), the survey discusses potential research directions and concludes the review. By offering a holistic perspective, this paper aims to serve as a valuable resource for researchers and practitioners striving to advance the field of pill image recognition.
The autocollimator is an important device for precise, small-angle, non-contact measurement. It primarily obtains the angular parameters of a plane target mirror indirectly, by detecting the position of the imaging spot. Few reports describe the core algorithms used in current commercial products or in recent scientific research. This paper addresses the performance requirements of coordinate-reading accuracy and operational speed in autocollimator image positioning. It proposes two center recognition schemes for the cross image: one based on the Hough transform, and another based on Zernike moments and the least squares method. By experimentally evaluating the accuracy and speed of both schemes, the paper determines the image recognition scheme that best balances measurement accuracy and speed for the autocollimator. The method based on Zernike moments and the least squares method offers higher measurement accuracy and stability, while the Hough transform-based method provides faster measurement.
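The least-squares half of the second scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes sub-pixel edge points of the horizontal and vertical arms of the cross have already been extracted (for example by a Zernike-moment edge detector, not shown), fits each arm with an orthogonal (total) least-squares line, and intersects the two lines. The names `fit_line` and `cross_center` are hypothetical.

```python
import numpy as np


def fit_line(points):
    """Orthogonal least-squares line fit.

    Returns (normal, offset) such that normal @ p == offset for points p on the
    line. Unlike y = kx + b, this form also handles a vertical arm.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right-singular vector with the smallest singular value is the
    # direction of least spread, i.e. the line normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return normal, float(normal @ centroid)


def cross_center(arm_a_points, arm_b_points):
    """Intersect the two fitted arm lines to obtain the cross (spot) center."""
    n1, c1 = fit_line(arm_a_points)
    n2, c2 = fit_line(arm_b_points)
    # Solve [n1; n2] @ center = [c1, c2]; singular only if the arms are parallel.
    return np.linalg.solve(np.vstack([n1, n2]), np.array([c1, c2]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = np.linspace(100, 200, 50)
    ys = np.linspace(50, 150, 50)
    # Simulated noisy edge points of a cross centered near (150.3, 99.7).
    horizontal = np.column_stack([xs, 99.7 + rng.normal(0, 0.05, xs.size)])
    vertical = np.column_stack([150.3 + rng.normal(0, 0.05, ys.size), ys])
    print(cross_center(horizontal, vertical))
```

A Hough-based variant would instead vote for lines in a quantized (rho, theta) accumulator, which is typically fast but limits precision to the accumulator resolution unless refined afterwards.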
The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. To highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that subtle visual differences in challenging scenarios can be highlighted by exploiting the spatial relationships among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly with convolutional neural networks and is referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline and propose a boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph-convolution-based Global Attention Network (GA-Net), which combines the local attention maps extracted by MA-Net with non-local techniques to explore spatial relationships among sub-regions. Finally, we develop a collaborative teacher-student strategy that adaptively determines the attended roles and optimization modes, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars, and FGVC Aircraft datasets illustrate the promising performance of our framework.
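To make the global-information step more concrete, here is a rough NumPy sketch of one way local attention maps can feed a graph convolution over sub-regions. It is an assumption-laden illustration, not GA-Net itself: each attention map pools the backbone features into one node descriptor, a non-local-style softmax affinity between nodes serves as the adjacency matrix, and a single graph-convolution layer with a ReLU mixes information across regions. The shapes and the helper names `attention_pool` and `graph_conv` are hypothetical.

```python
import numpy as np


def attention_pool(features, attn_maps, eps=1e-8):
    """Pool a (C, H, W) feature map with K attention maps into (K, C) region nodes."""
    c = features.shape[0]
    k = attn_maps.shape[0]
    f = features.reshape(c, -1)                    # (C, H*W)
    a = attn_maps.reshape(k, -1)                   # (K, H*W)
    a = a / (a.sum(axis=1, keepdims=True) + eps)   # each map now sums to ~1
    return a @ f.T                                 # (K, C) attention-weighted means


def graph_conv(nodes, weight):
    """One graph-convolution layer whose adjacency is a non-local
    (softmax dot-product) affinity between region nodes."""
    sim = nodes @ nodes.T / np.sqrt(nodes.shape[1])   # (K, K) pairwise affinity
    sim = sim - sim.max(axis=1, keepdims=True)        # stabilise the exponent
    adj = np.exp(sim)
    adj = adj / adj.sum(axis=1, keepdims=True)        # row-normalised adjacency
    return np.maximum(adj @ nodes @ weight, 0.0)      # aggregate, project, ReLU
```

Building the adjacency from feature similarity, rather than fixing it in advance, lets any region exchange information with any other, which is the intuition behind combining attention maps with non-local techniques.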
With rapid urbanization, fires pose significant challenges to urban governance. Traditional fire detection methods often struggle to detect smoke in complex urban scenes because of environmental interference and variations in viewing angle. This study proposes a novel multimodal smoke detection method that fuses infrared and visible imagery using a transformer-based deep learning model. By capturing both thermal and visual cues, our approach significantly enhances the accuracy and robustness of smoke detection in business-park scenes. We first established a dual-view dataset comprising infrared and visible-light videos, implemented an innovative image feature fusion strategy, and designed a deep learning model based on the transformer architecture and an attention mechanism for smoke classification. Experimental results demonstrate that our method outperforms existing methods: with multi-view input, it achieves an accuracy of 90.88%, a precision of 98.38%, a recall of 92.41%, and false positive and false negative rates both below 5%, underlining the effectiveness of the proposed multimodal, multi-view fusion approach. The attention mechanism plays a crucial role in improving detection performance, particularly in identifying subtle smoke features.
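The abstract does not specify the fusion strategy, so the following is only a generic sketch of transformer-style infrared/visible fusion, assuming each modality has already been encoded into a set of token vectors of equal width. Each stream attends to the other with scaled dot-product cross-attention, a residual connection preserves the original stream, and the two pooled streams are concatenated into one feature vector for a downstream smoke classifier (not shown). The names `cross_attention`, `fuse`, and the `"v2i"`/`"i2v"` weight keys are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, wq, wk, wv):
    """Scaled dot-product attention: tokens of one modality query the other."""
    q = query_tokens @ wq
    k = context_tokens @ wk
    v = context_tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def fuse(visible_tokens, infrared_tokens, weights):
    """Bidirectional cross-attention with residuals, then mean-pool each stream
    and concatenate into one clip-level feature vector."""
    vis = visible_tokens + cross_attention(visible_tokens, infrared_tokens, *weights["v2i"])
    ir = infrared_tokens + cross_attention(infrared_tokens, visible_tokens, *weights["i2v"])
    return np.concatenate([vis.mean(axis=0), ir.mean(axis=0)])
```

Letting each modality query the other is one way to exploit complementary cues, for example thermal contrast where visible smoke is faint.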
Because the depth points of nautical charts of the East China Sea are difficult to integrate into a global high-precision Grid Digital Elevation Model (Grid-DEM), we proposed a “Fusion based on Image Recognition (FIR)” method for multi-source depth data fusion and used it to merge an electronic nautical chart dataset (referred to in this paper as Chart2014) with a global digital elevation dataset (referred to as Globalbath2002). Compared with the traditional fusion of two datasets by direct combination and interpolation, the new Grid-DEM formed by FIR better represents the data characteristics of Chart2014, reduces computational difficulty, and is more intuitive. The choice of interpolation method in FIR and the influence of the “exclusion radius R” parameter were also discussed. FIR avoids complex calculations of spatial distances among points from different sources; instead, it uses a spatial exclusion map to perform one-step screening based on the exclusion radius R, which greatly improves the fusion of the reliable dataset. The fusion results of different experiments were analyzed statistically with root mean square error and mean relative error, showing that interpolation methods based on Delaunay triangulation are more suitable for fusing the nautical chart depths of China, and that factors such as the point density distribution of the source data, accuracy, interpolation method, and terrain conditions should be fully considered when selecting the exclusion radius R.
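The one-step screening idea can be sketched as below. This is a simplified illustration under assumed conventions, not the paper's code: the grid origin `(x0, y0)`, square cell size `cell`, and the rule "a grid cell is excluded if its center lies within R of any chart point" are all assumptions. Global-dataset points falling in excluded cells are dropped, so the more reliable chart soundings dominate nearby, and no point-to-point distances between the two sources are ever computed.

```python
import numpy as np


def exclusion_map(chart_xy, x0, y0, cell, shape, radius):
    """Boolean (ny, nx) grid: True where a cell center lies within `radius`
    of at least one chart sounding."""
    ny, nx = shape
    xs = x0 + (np.arange(nx) + 0.5) * cell       # cell-center x coordinates
    ys = y0 + (np.arange(ny) + 0.5) * cell       # cell-center y coordinates
    gx, gy = np.meshgrid(xs, ys)                 # both (ny, nx); row index = y
    mask = np.zeros(shape, dtype=bool)
    for px, py in chart_xy:
        mask |= (gx - px) ** 2 + (gy - py) ** 2 <= radius ** 2
    return mask


def screen_global_points(global_xy, mask, x0, y0, cell):
    """Keep flags for global-dataset points: False if the point's cell is excluded.
    Points outside the grid are kept."""
    cols = np.floor((global_xy[:, 0] - x0) / cell).astype(int)
    rows = np.floor((global_xy[:, 1] - y0) / cell).astype(int)
    ny, nx = mask.shape
    inside = (rows >= 0) & (rows < ny) & (cols >= 0) & (cols < nx)
    keep = np.ones(len(global_xy), dtype=bool)
    keep[inside] = ~mask[rows[inside], cols[inside]]
    return keep
```

The retained global points plus all chart points would then be interpolated onto the grid; for the Delaunay-based option, `scipy.interpolate.griddata(..., method="linear")` performs linear interpolation on a Delaunay triangulation.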
Rapid and accurate recognition of coal and rock is an important prerequisite for safe and efficient coal mining. In this paper, a novel coal-rock recognition method based on fusing laser point clouds and images, named Multi-Modal Frustum PointNet (MMFP), is proposed. First, MobileNetV3 is used as the backbone network of Mask R-CNN to reduce the number of network parameters and compress the model size. A dilated convolutional block attention mechanism (Dilated CBAM) and an inception structure are combined with MobileNetV3 to further enhance detection accuracy. Next, the 2D target candidate box is calculated by the improved Mask R-CNN, and the frustum point cloud within the 2D candidate box is extracted to reduce the computational scale and spatial search range. Then, a self-attention PointNet is constructed to segment the fused point cloud within the frustum, and a bounding box regression network is used to predict the bounding box parameters. Finally, an experimental platform for shearer coal wall cutting is established, and multiple comparative experiments are conducted. Experimental results indicate that the proposed coal-rock recognition method outperforms other advanced models.
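The frustum-extraction step can be illustrated with a standard pinhole projection. This sketch assumes the laser points have already been transformed into the camera frame (the extrinsic calibration is not shown) and that `K` is a 3×3 intrinsic matrix; both are assumptions, since the abstract gives no calibration details. A point belongs to the frustum if it lies in front of the camera and projects inside the 2D candidate box.

```python
import numpy as np


def frustum_points(points_cam, K, box):
    """Keep 3D points whose pinhole projection falls inside a 2D box.

    points_cam: (N, 3) points in camera coordinates (z pointing forward).
    K: (3, 3) camera intrinsic matrix.
    box: (x_min, y_min, x_max, y_max) in pixels, e.g. from the 2D detector.
    """
    x_min, y_min, x_max, y_max = box
    uvw = points_cam @ K.T                     # homogeneous pixel coordinates
    in_front = uvw[:, 2] > 0                   # discard points behind the camera
    with np.errstate(divide="ignore", invalid="ignore"):
        u = uvw[:, 0] / uvw[:, 2]
        v = uvw[:, 1] / uvw[:, 2]
    inside = in_front & (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return points_cam[inside]
```

Only the returned subset would be passed to the point-cloud segmentation network, which is where the reduction in computation and search range comes from.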
Asparagus stem blight, also known as “asparagus cancer”, is a serious plant disease with a regional distribution. Its widespread occurrence has reduced the yield and quality of asparagus, and it has become one of the main problems threatening asparagus production. To improve the accurate identification and localization of phenotypic stem blight lesions on asparagus and to enhance detection accuracy, a YOLOv8-CBAM detection algorithm for asparagus stem blight, based on YOLOv8, was proposed. The algorithm aims to achieve rapid detection of phenotypic images of asparagus stem blight and to provide effective assistance in controlling the disease. To enhance the model’s capacity to capture subtle lesion features, the Convolutional Block Attention Module (CBAM) is added after the C2f module in the head. In addition, the original CIoU loss function in YOLOv8 was replaced with the Focal-EIoU loss function, so that the updated loss emphasizes higher-quality bounding boxes. The YOLOv8-CBAM algorithm effectively detects asparagus stem blight in phenotypic images with a mean average precision (mAP) of 95.51%, which is 0.22%, 14.99%, 1.77%, and 5.71% higher than that of the YOLOv5, YOLOv7, YOLOv8, and Mask R-CNN models, respectively. This greatly improves the efficiency with which asparagus growers identify stem blight, aids in its prevention and control, and is crucial for the application of computer vision in agriculture.
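For reference, the Focal-EIoU loss as it is commonly formulated can be written for a single pair of boxes in `(x1, y1, x2, y2)` form. The EIoU term adds to `1 - IoU` a center-distance penalty normalised by the enclosing box diagonal, plus separate width and height penalties normalised by the enclosing box width and height; the focal version multiplies by `IoU ** gamma`, which down-weights low-overlap boxes. This is a plain-Python sketch; `gamma=0.5` is an assumed default, and a training implementation would be vectorised in the detector's framework.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for one predicted and one target box (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    iou = inter / (pw * ph + tw * th - inter + eps)
    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    # Squared distance between box centers.
    center_dist2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + \
                   ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    eiou = (1.0 - iou
            + center_dist2 / (cw ** 2 + ch ** 2 + eps)   # center-distance penalty
            + (pw - tw) ** 2 / (cw ** 2 + eps)           # width penalty
            + (ph - th) ** 2 / (ch ** 2 + eps))          # height penalty
    return iou ** gamma * eiou                           # focal re-weighting
```

Note that a box with zero overlap gets zero loss under this weighting, which is why the focal term is said to emphasise higher-quality boxes.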
Complex plasmas are widely found in industrial plasma processes such as thin film deposition, material surface modification, and waste gas treatment. During complex plasma discharge, the configuration, distribution, and size of the particles, as well as the discharge glow, depend strongly on the discharge parameters. However, traditional manual methods for recognizing discharge parameters from discharge images are complicated to operate, time-consuming, low in accuracy, and demanding in terms of instrumentation. To solve these problems, a squeeze-and-excitation convolution with shortcut (SECS) network for complex plasma image recognition is proposed. It combines two mechanisms, an attention mechanism (which strengthens channel feature extraction) and a shortcut connection (which passes input information directly to deep layers and avoids vanishing or exploding gradients), to effectively improve model performance. The results show that the accuracy, precision, recall, and F1-score of our model are superior to those of other models in complex plasma image recognition, with a recognition accuracy of 97.38%. Moreover, the recognition accuracy on the publicly available Flowers and Chest X-ray datasets reaches 97.85% and 98.65%, respectively, indicating that the model is robust. This study shows that the proposed model provides a new method for diagnosing complex plasma images and also provides technical support for the application of plasma in industrial production.
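The two mechanisms named above can be combined in a few lines. This NumPy sketch shows a standard squeeze-and-excitation recalibration of a convolutional branch output followed by an identity shortcut; it is not the SECS architecture itself. It assumes the branch output `features` already has the same shape as the input `x`, and the weight shapes (reduction ratio `r`) are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_shortcut(x, features, w1, w2):
    """Squeeze-and-excitation recalibration of `features` plus an identity shortcut.

    x, features: (C, H, W) block input and conv-branch output (same shape assumed).
    w1: (C // r, C) and w2: (C, C // r) excitation weights, r = reduction ratio.
    """
    s = features.mean(axis=(1, 2))             # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)                # excitation FC + ReLU -> (C // r,)
    gate = sigmoid(w2 @ z)                     # FC + sigmoid -> channel weights (C,)
    return x + features * gate[:, None, None]  # reweight channels, add shortcut
```

The gate rescales whole channels (the attention part), while the `x +` term gives gradients a direct path around the block (the shortcut part).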
Expanding photovoltaic (PV) resources in rural-grid areas is an essential means of increasing the share of solar energy in the energy mix, in line with the “carbon peaking and carbon neutrality” objectives. However, rural power grids often lack digitalization, so the load distribution within these areas is not fully known. This hinders the calculation of available PV capacity and the deduction of node voltages. This study proposes a load-distribution modeling approach based on remote-sensing image recognition, with the aim of providing a scientific framework for developing distributed PV resources in rural grid areas. First, houses in remote-sensing images are accurately recognized using deep-learning techniques based on the YOLOv5 model. The distribution of the houses is then used to estimate the load distribution in the grid area. Next, equally spaced and clustered distribution models are used to adaptively determine the locations of the nodes and the load power along the distribution lines. Finally, by calculating the connectivity matrix of the nodes, a minimum spanning tree is extracted, the network topology is constructed, and the node parameters of the load-distribution model are calculated. The proposed scheme is implemented in a software package, and its efficacy is demonstrated by analyzing typical remote-sensing images of rural grid areas. The results underscore the ability of the proposed approach to effectively discern the distribution-line structure and compute the node parameters, thereby offering vital support for determining PV access capability.
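The topology step can be illustrated with Prim's algorithm. This sketch assumes the connectivity matrix is a complete graph weighted by Euclidean distance between node coordinates, which is an assumption, since the abstract does not say how edge weights are defined. `minimum_spanning_tree` is a hypothetical helper name.

```python
import numpy as np


def minimum_spanning_tree(coords):
    """Prim's algorithm on a complete Euclidean graph.

    coords: (N, 2) node positions. Returns a list of (parent, child, length) edges.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                      # grow the tree from node 0
    best = dist[0].copy()                  # cheapest known link into the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))     # closest node not yet in the tree
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = dist[j] < best            # does node j offer shorter links?
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges
```

A spanning tree is a natural fit for radial rural distribution lines, and minimising total length approximates how lines are typically routed between neighbouring loads.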
Objective: To build a dataset containing a large number of stained tongue coating images and to process it with deep learning to automatically recognize stained tongue coating images. Methods: A total of 1001 images of stained tongue coating from healthy students at Hunan University of Chinese Medicine and 1007 images of pathological (non-stained) tongue coating from patients hospitalized at The First Hospital of Hunan University of Chinese Medicine with lung cancer, diabetes, and hypertension were collected. The tongue images were randomized into training, validation, and testing datasets in a 7:2:1 ratio. A deep learning model for recognizing stained tongue coating was constructed using ResNet50 and trained on the training and validation datasets for 90 epochs. The model’s performance in predicting stained tongue coating images in the testing dataset was evaluated by its accuracy, loss curve, recall, F1 score, confusion matrix, receiver operating characteristic (ROC) curve, and precision-recall (PR) curve. The accuracy of the deep learning model was compared with that of attending physicians of traditional Chinese medicine (TCM). Results: After 90 epochs of training, the model showed excellent classification performance. The loss curve and accuracy were stable, with no signs of overfitting. The model achieved an accuracy, recall, and F1 score of 92%, 91%, and 92%, respectively. The confusion matrix revealed an accuracy of 92% for the model versus 69% for TCM practitioners. The areas under the ROC and PR curves were 0.97 and 0.95, respectively. Conclusion: The deep learning model constructed with ResNet50 can effectively recognize stained tongue coating images, with greater accuracy than visual inspection by TCM practitioners. This model has the potential to assist doctors in identifying false tongue coating and preventing misdiagnosis.
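The data split and the headline metrics can be sketched as follows. This is an illustrative sketch, not the study's code: the seed, the rounding rule (remainder goes to testing), and the helper names are assumptions. Applied separately to each class (stained and non-stained), the split would keep the classes balanced; the abstract does not say whether the split was stratified.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle and split into training/validation/testing lists in a 7:2:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])   # remainder goes to testing


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = stained)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

For the 2008 images in total, this rule gives 1405 training, 401 validation, and 202 testing images.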
Automated tea-picking technology is an important part of the development of smart agriculture and, to a certain extent, affects the development of the tea industry. Tea leaf recognition and the robotic tea-picking end-effector are the key technologies for automated tea picking. This paper proposes a set of algorithms for tea leaf differentiation and recognition based on the principle of colour difference and, building on these algorithms, designs a tea-picking end-effector. Experiments show that the designed end-effector has good recognition ability and a high picking speed.
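The paper's exact colour-difference rule is not given, so this sketch uses the widely used excess-green index ExG = 2G − R − B as a stand-in to show how a colour-difference threshold turns an RGB image into a foreground mask. The threshold value is an assumption to be tuned on real images, and separating tender buds from older leaves would likely need a different index or threshold.

```python
import numpy as np


def excess_green_mask(rgb, threshold):
    """Foreground mask from the excess-green index ExG = 2G - R - B.

    rgb: (H, W, 3) uint8 image in RGB order. Values are cast to float first so
    that 2G - R - B cannot wrap around as uint8 arithmetic would.
    """
    img = rgb.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (2.0 * g - r - b) > threshold
```

The casting step matters: on raw `uint8` arrays, `2 * g - r - b` silently wraps modulo 256 and produces a meaningless mask.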
Segmentation-based offline handwritten character recognition algorithms suffer from the difficulty of segmenting interleaved and touching characters in handwritten manuscripts. To tackle this problem, this paper proposes a segmentation-free recognition algorithm based on a deep learning network. The network consists of four layers: an input layer for image preprocessing, a convolutional neural network (CNN) layer for feature extraction, a bidirectional long short-term memory (BDLSTM) layer for sequence prediction, and a connectionist temporal classification (CTC) layer for text sequence alignment and classification. In addition, a novel data processing method is used to equalize data length. On this basis, groups of experiments on six typical databases are carried out, evaluating the character correct rate, training time cost, storage space cost, and testing time cost. The experimental results show that the proposed algorithm outperforms other classical algorithms in both accuracy and efficiency.
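The CTC layer is what removes the need for segmentation: the network emits one label distribution per frame, including a special blank, and decoding collapses these into a character string. Below is a minimal sketch of best-path (greedy) CTC decoding, assuming the blank has index 0; beam-search decoding, which the paper may or may not use, is not shown.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs: per-frame lists of class scores (T x num_classes).
    alphabet: label strings indexed like the scores; alphabet[blank] is unused.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    prev = None
    for k in best:
        # A label is emitted only when it differs from the previous frame's
        # label, so "a a" collapses to "a" but "a - a" keeps both.
        if k != prev and k != blank:
            decoded.append(alphabet[k])
        prev = k
    return "".join(decoded)
```

For example, per-frame argmax labels `a a - a b b` decode to `"aab"`: the repeated `a` merges, the blank separates the second `a`, and the repeated `b` merges.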
This paper introduces a deep-learning-based intelligent image recognition system integrated into a wheelchair for use in cold environments, aiming to improve the convenience and safety of disabled individuals. The system uses image recognition to monitor road conditions in real time through a camera, detecting foreign objects on the road and measuring their distance. The detection results are visualized on the wheelchair screen to help the user avoid obstacles and travel more safely. The wheelchair also provides crawler tracks, seat heating, and snow and rain protection, among other functions. It has broad application prospects and development potential and is expected to be widely used in the future, providing a strong guarantee for the safe travel of disabled individuals in China.
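The abstract does not explain how the distance to a detected object is measured. One common monocular approximation, shown here purely as an assumed example, is the pinhole similar-triangles relation Z = f·H/h: a calibrated focal length f in pixels, a known real object height H, and the detected bounding-box height h in pixels. It assumes the object is roughly upright and facing the camera.

```python
def estimate_distance(focal_px, object_height_m, pixel_height):
    """Pinhole-camera range estimate: Z = f * H / h (result in metres)."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * object_height_m / pixel_height
```

With f = 800 px, a 0.5 m obstacle whose box is 100 px tall is estimated to be 800 × 0.5 / 100 = 4.0 m away. Stereo or depth sensors avoid the known-height assumption.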
Traditional synthetic aperture radar (SAR) image recognition techniques focus on electromagnetic (EM) scattering centers and ignore the important role of shadow information in SAR image recognition. Classifying targets by shadow information alone is difficult because the shadow shape depends on the radar aspect angle, the depression angle, and the resolution; moreover, the shadow shapes of different targets are similar. When multiple SAR images of one target from different aspects are available, target recognition performance can be improved. To address this problem, a multi-aspect SAR image recognition technique based on shadow information is developed. It extracts shadow profiles from SAR images and uses chain codes as the feature vectors of targets. Feature vectors from multiple aspects of the same target are then combined into feature sequences, and a hidden Markov model (HMM) is applied to the feature sequences for target recognition. Simulation results show the effectiveness of the method.
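As an illustration of the feature-extraction step, the following computes an 8-direction Freeman chain code from an ordered, 8-connected shadow contour. The direction numbering (0 = east, counter-clockwise on screen, image y axis pointing down) is one common convention and is assumed here; the contour tracing itself and the HMM stage are not shown.

```python
# 8-direction Freeman codes in image coordinates (x right, y down):
# 0 = east, then counter-clockwise as seen on screen, so 2 = north (y - 1).
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour, closed=True):
    """Freeman chain code of an ordered, 8-connected contour [(x, y), ...]."""
    pts = list(contour)
    if closed:
        pts.append(pts[0])  # return to the start point
    codes = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes
```

Because the code depends on where tracing starts and on the target's orientation, multi-aspect sequences of such codes are a reasonable input for a sequence model like an HMM.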
In computer vision and artificial intelligence, automatic recognition of human emotion from facial expressions has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance video, depression therapy, patient monitoring, anxiety, and others, have highlighted its significant academic and commercial importance. This study emphasizes research that uses only facial images for facial expression recognition (FER), because facial expressions are a basic way in which people convey meaning to each other. The immense success of deep learning has led to the growing use of its many architectures to improve efficiency. This review covers the preprocessing, augmentation, and feature extraction techniques, including those for the temporal properties of successive frames, used by machine learning, deep learning, and hybrid methods. It then briefly summarizes publicly available assessment criteria and compares them with benchmark results, the most trustworthy way to assess FER research statistically. This synopsis may benefit both newcomers to FER and seasoned scholars seeking fruitful avenues for further investigation. It conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Clinical gastrointestinal endoscopy has advanced significantly owing to machine learning techniques, which have produced novel instruments and approaches for early-stage disease diagnosis, categorization, and therapy. This minireview examines machine learning applications in gastrointestinal endoscopy, such as image identification, lesion detection, pathological categorization, and surgical aid. By evaluating previous research, we examine the potential of machine learning to improve treatment regimens, lower misdiagnosis rates, and increase diagnostic accuracy. In addition, this study discusses current issues such as clinical applicability, model generalization, and data privacy, and suggests future research directions to help clinicians and researchers in the field of gastrointestinal endoscopy.
Recent research on adversarial attacks has focused primarily on white-box attack techniques, with limited exploration of black-box methods. Furthermore, many black-box studies assume that the output label and probability distribution can be observed, and they impose no constraint on the number of attack attempts. This disregard for the real-world practicality of attacks, particularly their detectability by humans, has left a gap in the research landscape. Considering these limitations, our study uses a similar-color attack method, assumes access only to the output label, limits the number of attack attempts to 100, and subjects the attacks to human perceptibility testing. With this approach, we demonstrated the effectiveness of black-box attack techniques in deceiving models and achieved a success rate of 82.68% in deceiving humans. This study emphasizes the significance of research that addresses the challenge of deceiving both humans and models, highlighting the importance of real-world applicability.
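The threat model (label-only access, at most 100 queries) can be made concrete with a simple random-search loop. This is a hypothetical stand-in, not the study's similar-color attack: each query adds one random per-channel colour offset within `±eps` to the whole image, producing a slight global colour cast, and stops as soon as the predicted label changes. The function name and the `eps` value are assumptions.

```python
import numpy as np


def label_only_color_attack(image, classify, eps=0.05, budget=100, seed=0):
    """Random-search colour-shift attack that only observes predicted labels.

    image: (H, W, 3) float array in [0, 1]. classify: callable returning a label.
    The initial query on the clean image is not counted against `budget`.
    Returns (adversarial_image, queries_used), or (None, budget) on failure.
    """
    rng = np.random.default_rng(seed)
    original = classify(image)
    for query in range(1, budget + 1):
        # One offset per colour channel, applied uniformly to every pixel.
        shift = rng.uniform(-eps, eps, size=image.shape[-1])
        candidate = np.clip(image + shift, 0.0, 1.0)
        if classify(candidate) != original:
            return candidate, query
    return None, budget
```

Keeping the change to a small, uniform colour shift is one way to trade attack strength for low human perceptibility, which is the balance the study evaluates.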
Human activity recognition is a significant area of artificial intelligence research, with applications in surveillance, healthcare, sports, and human-computer interaction. This article benchmarks the performance of a You Only Look Once version 11 (YOLOv11)-based architecture for multi-class human activity recognition. The dataset consists of 14,186 images across 19 activity classes, ranging from dynamic activities such as running and swimming to static activities such as sitting and sleeping. Preprocessing included resizing all images to 512×512 pixels, annotating them in YOLO’s bounding box format, and applying data augmentation methods such as flipping, rotation, and cropping to improve model generalization. The proposed model was trained for 100 epochs with adaptive learning rate methods and hyperparameter optimization, reaching a mAP@0.5 of 74.93% and a mAP@0.5-0.95 of 64.11% and outperforming previous YOLO versions (v10, v9, and v8) as well as general-purpose architectures such as ResNet50 and EfficientNet. It showed improved precision and recall for all activity classes, with precision values of 0.76 for running, 0.79 for swimming, 0.80 for sitting, and 0.81 for sleeping. Tested for real-time deployment, it required an inference time of 8.9 ms per image, making it computationally light. The improvements of the proposed YOLOv11 model are attributed to architectural advances such as a more elaborate feature extraction process, better attention modules, and an anchor-free detection mechanism. YOLOv10 was very stable in static activity recognition, and YOLOv9 performed well in dynamic environments but suffered from overfitting, while YOLOv8, though a decent baseline, failed to differentiate between overlapping static activities. The experimental results identify the proposed YOLOv11 as the most appropriate model, providing an ideal balance between accuracy, computational efficiency, and robustness for real-world deployment. Nevertheless, certain issues remain to be addressed, particularly discriminating between visually similar activities and the reliance on publicly available datasets. Future research will incorporate 3D data and multimodal sensor inputs, such as depth and motion information, to improve recognition accuracy and generalizability in challenging real-world environments.
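The reported mAP figures are built from per-class average precision. The sketch below computes all-point interpolated AP for one class at a single IoU threshold, assuming detections have already been matched to ground truth (each detection is marked true or false positive). mAP@0.5 averages this over classes at IoU 0.5, and mAP@0.5-0.95 additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. COCO-style tooling samples precision at 101 recall points instead of all points, so its numbers can differ slightly.

```python
def average_precision(detections, num_gt):
    """All-point interpolated AP for one class at a fixed IoU threshold.

    detections: (confidence, is_true_positive) pairs, where is_true_positive was
    decided by matching against ground truth at the chosen IoU (e.g. 0.5).
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        tp += int(is_tp)
        fp += int(not is_tp)
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision monotonically non-increasing (the interpolation envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p   # area of each recall step
        prev_recall = r
    return ap
```

For example, three detections ranked (TP, FP, TP) against two ground-truth boxes give AP = 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.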
Enhancing website security is crucial to combat malicious activities, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has become a key method of distinguishing humans from bots. Although text-based CAPTCHAs are designed to challenge machines while remaining human-readable, recent advances in deep learning have enabled models to recognize them with remarkable efficiency. In this regard, we propose a novel two-layer visual attention framework for CAPTCHA recognition that builds on traditional attention mechanisms by incorporating Guided Visual Attention (GVA), which sharpens focus on relevant visual features. We specifically adapted the well-established image captioning task to address this need. Our approach uses the first-level attention module as guidance for the second-level attention component and incorporates two LSTM (Long Short-Term Memory) layers to enhance CAPTCHA recognition. Our extensive evaluation across four diverse datasets, namely Weibo, BoC (Bank of China), Gregwar, and Captcha 0.3, shows the adaptability and efficacy of our method. The approach achieved accuracies of 96.70% on BoC and 95.92% on Weibo. These results underscore the effectiveness of our method in accurately recognizing and processing CAPTCHA datasets, showcasing its robustness, reliability, and ability to handle varied challenges in CAPTCHA recognition.
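A minimal reading of "first-level attention as guidance for the second level" is sketched below, with explicit assumptions: dot-product attention over N image-region features, a decoder hidden state as the first query, and a hypothetical projection `w_guide` that maps the first-level context into the second-level query. The LSTM layers that would produce `hidden` and consume the output are omitted; this is not the GVA implementation.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(query, features):
    """Dot-product attention of one query vector (D,) over N features (N, D)."""
    weights = softmax(features @ query / np.sqrt(query.size))  # (N,)
    return weights @ features, weights                          # context (D,), weights


def guided_attention(hidden, features, w_guide):
    """Two-level attention: the first-level context, projected by w_guide (D, D),
    is added to the decoder state to form the second-level query."""
    first_context, _ = attend(hidden, features)
    guided_query = hidden + first_context @ w_guide
    return attend(guided_query, features)
```

The second pass can sharpen focus because its query already encodes where the first pass looked, rather than relying on the decoder state alone.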
To achieve automatic recognition and classification of cracks of different depths, this study employed several deep convolutional neural networks, including AlexNet, ResNet, and DenseNet, to identify and classify cracks at different depths and in various materials. An analysis process for the automatic classification of crack damage is presented. The image dataset used for model training was obtained from scanning experiments on aluminum and titanium alloy plates using an ultrasonic phased-array flaw detector. All models were trained and validated on this dataset and compared using classification precision and loss values. The results show that crack depth can be automatically recognized and classified by analyzing ultrasonic phased-array images with deep learning, and that DenseNet achieves the highest classification precision. This addresses the reliance of ultrasonic damage identification on manual experience.
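Comparing models by classification precision across several crack-depth classes is usually done from a confusion matrix. The sketch below shows that computation; the class indices (for example, 0, 1, 2 for three hypothetical depth ranges) and the helper names are illustrative assumptions.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_class_precision(cm):
    """Precision of class j = correct predictions of j / all predictions of j."""
    predicted = cm.sum(axis=0)
    diag = np.diag(cm).astype(float)
    return np.divide(diag, predicted, out=np.zeros_like(diag), where=predicted > 0)
```

`np.add.at` is used instead of `cm[y_true, y_pred] += 1` because the latter does not accumulate repeated index pairs.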
Funding: Supported by the National Natural Science Foundation of China (No. 62375031) and the Natural Science Foundation of Chongqing Municipality (No. 2024NSCQ-LZX0041).
Abstract: The autocollimator is an important device for precise, small-angle, non-contact measurement. It obtains the angular parameters of a plane target mirror indirectly by detecting the position of the imaging spot. Few reports describe the core algorithms used in current commercial products or in recent scientific research. This paper addresses the requirements for coordinate-reading accuracy and operating speed in autocollimator image positioning. It proposes two schemes for recognizing the center of the cross image: one based on the Hough transform, and one based on Zernike moments and the least-squares method. By experimentally evaluating the accuracy and speed of both schemes, the paper determines the image recognition scheme that best balances measurement accuracy and speed for the autocollimator. The center recognition method based on Zernike moments and least squares offers higher measurement accuracy and stability, whereas the Hough transform-based method is faster.
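The least-squares step of the second scheme can be illustrated with a short sketch. It assumes that edge or center-line points of the horizontal and vertical arms of the cross have already been extracted (for example, by a Zernike-moment sub-pixel edge detector, which is not shown); the function names `fit_line` and `cross_center` are hypothetical. Each arm is fitted with an ordinary least-squares line, and the cross center is taken as their intersection.

```python
def fit_line(u, v):
    """Ordinary least-squares fit of v = slope * u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return slope, mean_v - slope * mean_u


def cross_center(h_points, v_points):
    """Intersect least-squares lines fitted to the two arms of a cross.

    h_points: (x, y) samples on the horizontal arm, fitted as y = a1 * x + b1.
    v_points: (x, y) samples on the vertical arm, fitted as x = a2 * y + b2
    (x as a function of y, so a near-vertical arm does not give an infinite slope).
    """
    a1, b1 = fit_line([x for x, _ in h_points], [y for _, y in h_points])
    a2, b2 = fit_line([y for _, y in v_points], [x for x, _ in v_points])
    # Substitute x = a2 * y + b2 into y = a1 * x + b1 and solve for y.
    y = (a1 * b2 + b1) / (1 - a1 * a2)
    x = a2 * y + b2
    return x, y


if __name__ == "__main__":
    # Hypothetical, noise-free arm samples
    horizontal = [(x, 0.01 * x + 50.0) for x in range(0, 101, 5)]
    vertical = [(-0.02 * y + 60.0, y) for y in range(0, 101, 5)]
    print(cross_center(horizontal, vertical))
```

Fitting the vertical arm as x(y) rather than y(x) keeps both fits well conditioned whatever the small rotation of the cross.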
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62171232) and the Priority Academic Program Development of Jiangsu Higher Education Institutions, China.
Abstract: The Fine-Grained Image Recognition (FGIR) task aims to distinguish similar sub-categories that belong to the same super-category, such as bird species and car types. To highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay little attention to global context information. They neglect the fact that, in challenging scenarios, subtle visual differences can be highlighted by exploiting the spatial relationships among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly with convolutional neural networks and is referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline and propose a boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by the fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph convolution-based Global Attention Network (GA-Net), which combines the local attention maps extracted by MA-Net with non-local techniques to explore the spatial relationships among sub-regions. Finally, we develop a collaborative teacher-student strategy that adaptively determines the roles and optimization modes of the two networks, so as to enhance the mutual reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars, and FGVC Aircraft datasets demonstrate the promising performance of our framework.
Funding: Supported by the National Natural Science Foundation of China (32171797) and the Chunhui Project Foundation of the Education Department of China (HZKY20220026).
Abstract: With rapid urbanization, fires pose significant challenges to urban governance. Traditional fire detection methods often struggle to detect smoke in complex urban scenes because of environmental interference and variations in viewing angle. This study proposes a novel multimodal smoke detection method that fuses infrared and visible imagery using a Transformer-based deep learning model. By capturing both thermal and visual cues, our approach significantly improves the accuracy and robustness of smoke detection in business-park scenes. We first built a dual-view dataset of infrared and visible-light videos, implemented an innovative image feature fusion strategy, and designed a deep learning model based on the Transformer architecture and an attention mechanism for smoke classification. Experimental results demonstrate that our method outperforms existing methods: with multi-view input, it achieves an accuracy of 90.88%, a precision of 98.38%, a recall of 92.41%, and false-positive and false-negative rates both below 5%, underlining the effectiveness of the proposed multimodal, multi-view fusion approach. The attention mechanism plays a crucial role in improving detection performance, particularly in identifying subtle smoke features.
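The reported figures are confusion-matrix metrics. The sketch below computes them from hypothetical true/false positive and negative counts using textbook definitions. The abstract does not state how its false-positive and false-negative rates are normalized, so they may follow a different convention; under the one used here, the false-negative rate is simply 1 minus recall.

```python
def detection_metrics(tp, fp, tn, fn):
    """Binary smoke/no-smoke metrics from confusion-matrix counts (smoke = positive)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        # Share of actual non-smoke samples that were flagged as smoke
        "false_positive_rate": fp / (fp + tn),
        # Share of actual smoke samples that were missed
        "false_negative_rate": fn / (fn + tp),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper
    for name, value in detection_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.4f}")
```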
Funding: Supported by the National Key R&D Program of China (No. 2023YFC3008100) and the National Natural Science Foundation of China (No. U23A2033).
Abstract: Because the depth points of nautical charts of the East China Sea are difficult to integrate into a global high-precision Grid Digital Elevation Model (Grid-DEM), we propose a “Fusion based on Image Recognition (FIR)” method for multi-source depth data fusion and use it to merge an electronic nautical chart dataset (referred to in this paper as Chart2014) with a global digital elevation dataset (referred to as Globalbath2002). Compared with the traditional fusion of two datasets by direct combination and interpolation, the new Grid-DEM produced by FIR better represents the characteristics of Chart2014, reduces computational difficulty, and is more intuitive. We also discuss the choice of interpolation method in FIR and the influence of the “exclusion radius R” parameter. Instead of computing the spatial distances among points from different sources, FIR uses a spatial exclusion map to screen points in a single step based on the exclusion radius R, which greatly improves the fusion of the reliable dataset. The fusion results of the different experiments were analyzed statistically using the root mean square error and the mean relative error. The results show that interpolation methods based on Delaunay triangulation are better suited to fusing China's nautical chart depths, and that the point density distribution of the source datasets, their accuracy, the interpolation method, and the terrain conditions should all be fully considered when selecting the exclusion radius R.
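A minimal sketch of the one-step screening idea, under stated assumptions: points are `(x, y, depth)` tuples in a planar coordinate system, and the exclusion map is a hypothetical regular grid whose cells are marked when their center lies within the exclusion radius R of any chart point. All chart points are kept and global points that fall in marked cells are dropped, so no pairwise distances between the two sources are computed. Because the test is done per cell, a dropped global point can lie up to half a cell diagonal beyond R. The paper's exact rasterization and the subsequent (for example, Delaunay-based) interpolation are not shown.

```python
import math


def build_exclusion_map(chart_points, radius, cell):
    """Return the set of grid cells (i, j) whose center lies within `radius` of a chart point."""
    excluded = set()
    reach = math.ceil(radius / cell) + 1  # enough neighboring cells to cover the radius
    for x, y, _depth in chart_points:
        ci, cj = math.floor(x / cell), math.floor(y / cell)
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                center_x, center_y = (i + 0.5) * cell, (j + 0.5) * cell
                if math.hypot(center_x - x, center_y - y) <= radius:
                    excluded.add((i, j))
    return excluded


def fuse_with_exclusion(chart_points, global_points, radius, cell):
    """Keep all chart points; keep only global points outside the exclusion map."""
    excluded = build_exclusion_map(chart_points, radius, cell)
    kept = [
        p for p in global_points
        if (math.floor(p[0] / cell), math.floor(p[1] / cell)) not in excluded
    ]
    return list(chart_points) + kept


if __name__ == "__main__":
    chart = [(5.0, 5.0, -10.0)]                       # hypothetical chart sounding
    global_grid = [(5.2, 5.1, -9.0), (50.0, 50.0, -30.0)]  # hypothetical global nodes
    print(fuse_with_exclusion(chart, global_grid, radius=2.0, cell=1.0))
```

The radius R trades off how much of the global dataset is overridden near chart soundings; the cell size only controls how finely that radius is rasterized.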
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 52174152 and 52074271), in part by the Xuzhou Basic Research Program Project (No. KC23051), in part by the Shandong Province Technology Innovation Guidance Plan (Central Guidance for Local Scientific and Technological Development Fund) (No. YDZX2024119), in part by the Graduate Innovation Program of China University of Mining and Technology (No. 2025WLKXJ088), and in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX252830).
Abstract: Rapid and accurate recognition of coal and rock is an important prerequisite for safe and efficient coal mining. In this paper, a novel coal-rock recognition method that fuses laser point clouds and images, named Multi-Modal Frustum PointNet (MMFP), is proposed. First, MobileNetV3 is used as the backbone of Mask R-CNN to reduce the number of network parameters and compress the model size. A dilated convolutional block attention mechanism (Dilated CBAM) and an Inception structure are combined with MobileNetV3 to further improve detection accuracy. Next, the 2D target candidate box is computed by the improved Mask R-CNN, and the frustum point cloud within the 2D candidate box is extracted to reduce the computational scale and the spatial search range. Then, a self-attention PointNet is constructed to segment the fused point cloud within the frustum, and a bounding-box regression network predicts the bounding-box parameters. Finally, an experimental platform for shearer coal-wall cutting is built, and multiple comparative experiments are conducted. Experimental results indicate that the proposed coal-rock recognition method outperforms other advanced models.
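The frustum-extraction step can be sketched with a pinhole camera model. The sketch assumes the laser points are already expressed in the camera frame (z pointing forward) and uses hypothetical intrinsics `fx, fy, cx, cy`. A point belongs to the frustum when its projection falls inside the 2D candidate box `(u_min, v_min, u_max, v_max)`. Calibration, the lidar-to-camera extrinsic transform, and the PointNet stages are not shown.

```python
def frustum_points(points, fx, fy, cx, cy, box):
    """Return camera-frame 3D points whose pinhole projection lies inside a 2D box.

    points: iterable of (x, y, z) with z > 0 in front of the camera.
    box: (u_min, v_min, u_max, v_max) in pixels, bounds inclusive.
    """
    u_min, v_min, u_max, v_max = box
    selected = []
    for x, y, z in points:
        if z <= 0:  # behind (or on) the image plane, cannot project
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            selected.append((x, y, z))
    return selected


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0), (0.05, -0.05, 1.0)]
    print(frustum_points(cloud, 100.0, 100.0, 50.0, 50.0, (40.0, 40.0, 60.0, 60.0)))
```

Only the selected subset is passed on to segmentation, which is how the frustum shrinks the search space.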
Funding: Supported by the Feicheng Artificial Intelligence Robot and Smart Agriculture Service Platform (381387).
Abstract: Asparagus stem blight, also known as “asparagus cancer”, is a serious plant disease with a regional distribution. Its widespread occurrence has reduced the yield and quality of asparagus, and it has become one of the main threats to asparagus production. To improve the accurate identification and localization of phenotypic stem blight lesions and to increase detection accuracy, a YOLOv8-based detection algorithm for asparagus stem blight, YOLOv8-CBAM, is proposed. The algorithm aims to rapidly detect phenotypic images of asparagus stem blight and to provide effective support for its control. To strengthen the model’s ability to capture subtle lesion features, the Convolutional Block Attention Module (CBAM) is added after the C2f module in the head. In addition, the original CIoU loss function in YOLOv8 is replaced with the Focal-EIoU loss function, so that the updated loss emphasizes higher-quality bounding boxes. The YOLOv8-CBAM algorithm effectively detects phenotypic images of asparagus stem blight with a mean average precision (mAP) of 95.51%, which is 0.22%, 14.99%, 1.77%, and 5.71% higher than that of the YOLOv5, YOLOv7, YOLOv8, and Mask R-CNN models, respectively. This greatly improves the efficiency with which asparagus growers identify stem blight, helps improve its prevention and control, and is of great importance for the application of computer vision in agriculture.
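For reference, the Focal-EIoU loss takes 1 minus IoU and adds penalties on the center distance and on the width and height differences, each normalized by the smallest enclosing box, then weights the result by IoU raised to a power γ. The plain-Python sketch below follows that published formulation for a single pair of `(x1, y1, x2, y2)` boxes. The value γ = 0.5 is a commonly used default and an assumption here, not a value stated in the abstract, and YOLOv8's batched, differentiable implementation is not reproduced.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for one predicted box and one target box, each (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared center distance over squared enclosing-box diagonal
    dx = (px1 + px2 - tx1 - tx2) / 2
    dy = (py1 + py2 - ty1 - ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    # Separate width and height penalties (what EIoU adds over DIoU)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    eiou = 1 - iou + center_term + width_term + height_term
    # Focal weighting: higher-IoU (better) boxes contribute relatively more
    return (iou ** gamma) * eiou


if __name__ == "__main__":
    print(focal_eiou_loss((0, 0, 10, 10), (5, 0, 15, 10)))
```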
Funding: This study was supported by a grant from the National Natural Science Foundation of China (No. 12075315).
Abstract: Complex plasmas are widely found in industrial plasma processes such as thin-film deposition, material surface modification, and waste-gas treatment. During complex plasma discharge, the configuration, distribution, and size of the particles, as well as the discharge glow, depend strongly on the discharge parameters. However, traditional manual methods for recognizing discharge parameters from discharge images are complicated to operate, have low accuracy, are time-consuming, and place high demands on instruments. To solve these problems, the squeeze-and-excitation convolution with shortcut (SECS) network is proposed for complex plasma image recognition. It combines two mechanisms: an attention mechanism, which strengthens the extraction of channel features, and shortcut connections, which pass the input information directly to deeper layers and help avoid vanishing or exploding gradients. Together they effectively improve model performance. The results show that the accuracy, precision, recall, and F1-score of our model are superior to those of other models in complex plasma image recognition, with a recognition accuracy of 97.38%. Moreover, the recognition accuracy on the publicly available Flowers and Chest X-ray datasets reaches 97.85% and 98.65%, respectively, showing that the model is robust. This study shows that the proposed model provides a new method for diagnosing complex plasma images and technical support for the application of plasma in industrial production.
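The channel-attention part of SECS follows the squeeze-and-excitation pattern. The sketch below shows one SE recalibration with an identity shortcut on a tiny feature map stored as nested lists. The abstract does not give the convolution layers, biases, or reduction ratio of the actual SECS network, so the weights `w1` and `w2` are hypothetical inputs and the convolutional branch is omitted.

```python
import math


def se_with_shortcut(x, w1, w2):
    """Squeeze-and-excitation recalibration followed by an identity shortcut.

    x:  list of C channels, each an H x W nested list of floats.
    w1: (C // r) x C weights of the first fully connected layer (biases omitted).
    w2: C x (C // r) weights of the second fully connected layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale every channel by its weight, then add the input back through the shortcut
    return [[[v * s[c] + v for v in row] for row in ch] for c, ch in enumerate(x)]


if __name__ == "__main__":
    feature_map = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    print(se_with_shortcut(feature_map, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]]))
```

Because the shortcut adds the unscaled input, a channel whose attention weight is near zero still passes its original values forward.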
Funding: Supported by the State Grid Science & Technology Project of China (5400-202224153A-1-1-ZN).
Abstract: Expanding photovoltaic (PV) resources in rural-grid areas is an essential means of increasing the share of solar energy in the energy mix, in line with the “carbon peaking and carbon neutrality” objectives. However, rural power grids often lack digitalization, so the load distribution within these areas is not fully known. This hinders the calculation of the available PV capacity and the deduction of node voltages. In pursuit of a scientific framework for developing distributed PV resources in rural grid areas, this study proposes a load-distribution modeling approach based on remote-sensing image recognition. First, houses in remote-sensing images are accurately recognized with deep learning based on the YOLOv5 model. The distribution of the houses is then used to estimate the load distribution in the grid area. Next, equally spaced and clustered distribution models are used to adaptively determine the node locations and load power along the distribution lines. Finally, by computing the connectivity matrix of the nodes, a minimum spanning tree is extracted, the network topology is constructed, and the node parameters of the load-distribution model are calculated. The proposed scheme is implemented in a software package, and its efficacy is demonstrated by analyzing typical remote-sensing images of rural grid areas. The results show that the proposed approach can effectively identify the distribution-line structure and compute the node parameters, providing vital support for determining PV access capability.
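The topology step can be illustrated with Prim's algorithm. The sketch assumes the connectivity matrix is given as a symmetric matrix of edge weights, with `math.inf` marking node pairs that cannot be linked; in the usage example the weights are hypothetical straight-line distances between node coordinates. How the paper weights its edges is not stated in the abstract. The function returns the tree edges as `(parent, child)` index pairs.

```python
import math


def prim_mst(weights):
    """Prim's algorithm on a symmetric weight matrix; returns (parent, child) edges.

    weights[i][j] is the cost of linking nodes i and j (math.inf if not allowed).
    If the graph is disconnected, a spanning forest is returned.
    """
    n = len(weights)
    edges = []
    if n == 0:
        return edges
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 5.0)]  # hypothetical node positions
    dist = [[math.dist(a, b) for b in nodes] for a in nodes]
    print(prim_mst(dist))  # prints [(0, 1), (1, 2), (1, 3)]
```

The O(n²) array-based form suits a dense connectivity matrix; a heap-based variant would be preferable for large sparse graphs.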
Funding: National Natural Science Foundation of China (82274411), Science and Technology Innovation Program of Hunan Province (2022RC1021), and Leading Research Project of Hunan University of Chinese Medicine (2022XJJB002).
Abstract: Objective To build a dataset containing a large number of stained tongue coating images and to process it with deep learning to automatically recognize stained tongue coating images. Methods A total of 1001 images of stained tongue coating from healthy students at Hunan University of Chinese Medicine and 1007 images of pathological (non-stained) tongue coating from patients hospitalized at The First Hospital of Hunan University of Chinese Medicine with lung cancer, diabetes, and hypertension were collected. The tongue images were randomly divided into training, validation, and testing datasets in a 7:2:1 ratio. A deep learning model based on ResNet50 was constructed to recognize stained tongue coating using the training and validation datasets. The model was trained for 90 epochs. Its performance in predicting stained tongue coating images in the testing dataset was evaluated by accuracy, the loss curve, recall, F1 score, the confusion matrix, the receiver operating characteristic (ROC) curve, and the precision-recall (PR) curve. The accuracy of the deep learning model was compared with that of attending physicians of traditional Chinese medicine (TCM). Results After 90 epochs, the model showed excellent classification performance. The loss curve and accuracy were stable, with no signs of overfitting. The model achieved an accuracy, recall, and F1 score of 92%, 91%, and 92%, respectively. The confusion matrix showed an accuracy of 92% for the model versus 69% for TCM practitioners. The areas under the ROC and PR curves were 0.97 and 0.95, respectively. Conclusion The deep learning model constructed with ResNet50 can effectively recognize stained coating images, with greater accuracy than visual inspection by TCM practitioners. This model has the potential to help doctors identify false tongue coating and prevent misdiagnosis.
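The 7:2:1 random split can be reproduced with a seeded shuffle. The sketch below is a generic illustration, not the authors' code. It uses integer image indices as stand-ins for file paths and an arbitrary seed, and it does not stratify by class, so the stained/non-stained balance in each subset is only approximately preserved.

```python
import random


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle `items` reproducibly and cut them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    # Whatever remains after rounding down goes to the test set
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


if __name__ == "__main__":
    images = range(1001 + 1007)  # stand-ins for the 2008 collected images
    train, val, test = split_dataset(images)
    print(len(train), len(val), len(test))
```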
Abstract: Automated tea picking is an important part of the development of smart agriculture and affects the development of the tea industry to a certain extent. Tea leaf recognition and the robotic tea-picking end-effector are the key technologies for automated tea picking. This paper proposes a set of algorithms for differentiating and recognizing tea leaves based on the principle of colour difference, and designs a tea-picking end-effector on the basis of these algorithms. Experiments show that the designed end-effector has good recognition ability and a high picking speed.
Funding: Funded by the Yunnan Province Local Undergraduate University Basic Research Joint Special Fund Project (No. 202101BA070001-016).
Abstract: Segmentation-based offline handwritten character recognition algorithms suffer from the difficulty of segmenting interleaved and touching characters in handwritten manuscripts. To tackle this problem, this paper proposes a segmentation-free recognition algorithm based on a deep learning network. The network consists of four layers: an input layer for image preprocessing, a convolutional neural network (CNN) layer for feature extraction, a bidirectional long short-term memory (BDLSTM) layer for sequence prediction, and a connectionist temporal classification (CTC) layer for text sequence alignment and classification. In addition, a novel data processing method is used to equalize data lengths. On this basis, groups of experiments are carried out on six typical databases, using character correct rate, training time, storage space, and testing time as evaluation indicators. The experimental results show that the proposed algorithm outperforms other classical algorithms in both accuracy and efficiency.
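At inference time, the per-frame outputs of a CTC layer are commonly turned into text by best-path (greedy) decoding: take the most likely class in each frame, merge consecutive repeats, then drop the blank symbol. The abstract does not say which decoder the authors use, so the sketch below is only an illustration; the alphabet and probabilities are hypothetical, with index 0 reserved for the blank.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_probs: T rows of per-class scores (one row per time step).
    alphabet: symbol for each class index; alphabet[blank] is never emitted.
    """
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_probs]
    decoded = []
    previous = None
    for k in best_path:
        # Emit only when the class changes and is not blank, so a blank between
        # two identical symbols keeps them as separate characters.
        if k != previous and k != blank:
            decoded.append(alphabet[k])
        previous = k
    return "".join(decoded)


if __name__ == "__main__":
    alphabet = ["<blank>", "a", "b"]
    # Best path per frame: a, a, blank, a, b, b
    probs = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.6, 0.3],
        [0.1, 0.2, 0.7],
        [0.2, 0.1, 0.7],
    ]
    print(ctc_greedy_decode(probs, alphabet))  # prints "aab"
```

This alignment-free decoding is what lets the network read whole lines without first segmenting individual characters.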
Abstract: This paper introduces an intelligent, deep learning-based image recognition system integrated into a wheelchair for use in cold environments, aiming to improve the convenience and safety of people with disabilities. The system uses image recognition to monitor road conditions in real time through a camera, detecting foreign objects on the road and measuring the distance to them. The detection results are displayed on the wheelchair screen to help the user avoid obstacles and travel more safely in daily life. The wheelchair also includes crawler tracks, seat heating, snow and rain protection, and other functions. It has broad application prospects and development potential and is expected to be widely used in the future, providing strong support for the safe travel of people with disabilities in China.
Abstract: Traditional synthetic aperture radar (SAR) image recognition techniques focus on electromagnetic (EM) scattering centers and ignore the important role of shadow information in SAR image recognition. It is difficult to classify targets by shadow information alone, because the shadow shape depends on the radar aspect angle, the depression angle, and the resolution; moreover, the shadow shapes of different targets are similar. When multiple SAR images of one target from different aspects are available, target recognition performance can be improved. To address this problem, a multi-aspect SAR image recognition technique based on shadow information is developed. It extracts shadow profiles from SAR images and uses chain codes as the feature vectors of targets. The feature vectors from multiple aspects of the same target are then combined into feature sequences, and a hidden Markov model (HMM) is applied to these sequences for target recognition. Simulation results show the effectiveness of the method.
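Chain codes describe a contour as a sequence of step directions between neighboring boundary pixels. The sketch below computes an 8-direction Freeman chain code for an ordered, 8-connected shadow contour. The direction numbering (0 = east, counting counter-clockwise with y pointing up) is one common convention and an assumption here; shadow-profile extraction, sequence assembly across aspects, and the HMM are not shown.

```python
# Freeman 8-direction codes: 0 = east, then counter-clockwise in 45-degree steps (y up).
FREEMAN_CODES = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def freeman_chain_code(contour):
    """Chain code of an ordered list of (x, y) pixels where consecutive pixels are 8-neighbors.

    For a closed contour, repeat the first pixel at the end of the list.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_CODES:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not 8-neighbors")
        codes.append(FREEMAN_CODES[step])
    return codes


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    print(freeman_chain_code(square))  # prints [0, 2, 4, 6]
```

With image coordinates (y pointing down), the same table numbers directions clockwise instead, so the convention must be fixed consistently across training and testing.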
Abstract: In computer vision and artificial intelligence, automatic emotion identification from human facial expressions has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance videos, depression therapy, patient monitoring, anxiety, and others, have drawn attention to its significant academic and commercial importance. This study emphasizes research that uses only facial images for facial expression recognition (FER), because facial expressions are a basic way in which people communicate meaning to one another. The immense success of deep learning has led to growing use of its many architectures to improve efficiency. This review covers the use of preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive data frames in machine learning, deep learning, and hybrid methods. It then briefly summarizes the publicly accessible assessment criteria and compares them with benchmark results, the most reliable way to assess FER-related research statistically. This brief synopsis may benefit both newcomers to FER and experienced scholars seeking fruitful avenues for further investigation. It conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Abstract: Clinical gastrointestinal endoscopy has advanced significantly owing to machine learning techniques, which have produced novel instruments and approaches for early-stage disease diagnosis, classification, and therapy. This minireview examines machine learning applications in gastrointestinal endoscopy, such as image identification, lesion detection, pathological classification, and surgical assistance. By evaluating previous research, we examine the potential of machine learning to improve treatment regimens, lower misdiagnosis rates, and increase diagnostic accuracy. In addition, this study discusses current issues such as clinical applicability, model generalization, and data privacy, and suggests future research directions to help clinicians and researchers in the field of gastrointestinal endoscopy.
Funding: Supported by the Research Resurgence under the Glocal University 30 Project at Gyeongsang National University in 2024.
Abstract: Recent research on adversarial attacks has focused primarily on white-box attack techniques, with limited exploration of black-box methods. Furthermore, many black-box studies assume that both the output label and the probability distribution can be observed, and they impose no limit on the number of attack attempts. This disregard for the real-world practicality of attacks, particularly their detectability by humans, has left a gap in the research landscape. Considering these limitations, our study uses a similar-color attack method, assumes access only to the output label, limits the number of attack attempts to 100, and subjects the attacks to human perceptibility testing. With this approach, we demonstrated the effectiveness of black-box attack techniques in deceiving models and achieved a success rate of 82.68% in deceiving humans. This study emphasizes the significance of research that addresses the challenge of deceiving both humans and models, highlighting the importance of real-world applicability.
Funding: Supported by King Saud University, Riyadh, Saudi Arabia, under the Ongoing Research Funding Program (ORF-2025-951).
Abstract: Human activity recognition is a significant area of artificial intelligence research, with applications in surveillance, healthcare, sports, and human-computer interaction. This article benchmarks the performance of a You Only Look Once version 11-based (YOLOv11-based) architecture for multi-class human activity recognition. The dataset consists of 14,186 images across 19 activity classes, ranging from dynamic activities such as running and swimming to static activities such as sitting and sleeping. Preprocessing included resizing all images to 512×512 pixels, annotating them in YOLO’s bounding-box format, and applying data augmentation such as flipping, rotation, and cropping to improve model generalization. The proposed model was trained for 100 epochs with adaptive learning-rate methods and hyperparameter optimization, reaching an mAP@0.5 of 74.93% and an mAP@0.5:0.95 of 64.11% and outperforming previous YOLO versions (v10, v9, and v8) as well as general-purpose architectures such as ResNet50 and EfficientNet. It showed improved precision and recall for all activity classes, with precision values of 0.76 for running, 0.79 for swimming, 0.80 for sitting, and 0.81 for sleeping, and it was tested for real-time deployment with an inference time of 8.9 ms per image, making it computationally light. The improvements of the proposed YOLOv11 are attributed to architectural advances such as a more complex feature extraction process, better attention modules, and an anchor-free detection mechanism. YOLOv10 was very stable in static activity recognition, YOLOv9 performed well in dynamic environments but suffered from overfitting, and YOLOv8, while a decent baseline, failed to differentiate between overlapping static activities. The experimental results show the proposed YOLOv11 to be the most suitable model, providing an ideal balance between accuracy, computational efficiency, and robustness for real-world deployment. Nevertheless, certain issues remain to be addressed, particularly discriminating between visually similar activities and the reliance on publicly available datasets. Future research will incorporate 3D data and multimodal sensor inputs, such as depth and motion information, to improve recognition accuracy and generalizability in challenging real-world environments.
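YOLO label files store one object per line as a class index followed by the box center, width, and height, each normalized by the image size. The helper below converts a pixel-coordinate box into that format and shows how a horizontal flip, one of the augmentations mentioned, transforms a normalized box. The function names and the example box are hypothetical.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def hflip_yolo_box(x_center, y_center, width, height):
    """Mirror a normalized YOLO box left-to-right; only the x center changes."""
    return 1.0 - x_center, y_center, width, height


if __name__ == "__main__":
    # A 256 x 256 box in a 512 x 512 image
    print(to_yolo_line(3, (128, 64, 384, 320), 512, 512))  # prints "3 0.500000 0.375000 0.500000 0.500000"
    print(hflip_yolo_box(0.25, 0.5, 0.2, 0.3))  # prints (0.75, 0.5, 0.2, 0.3)
```

Because the coordinates are normalized, resizing every image to a fixed input size leaves existing labels valid.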
Funding: Supported by the National Natural Science Foundation of China (Nos. U22A2034 and 62177047), the High Caliber Foreign Experts Introduction Plan funded by MOST, and the Central South University Research Programme of Advanced Interdisciplinary Studies (No. 2023QYJC020).
Abstract: Enhancing website security is crucial to combat malicious activities, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has become a key method for distinguishing humans from bots. Although text-based CAPTCHAs are designed to challenge machines while remaining human-readable, recent advances in deep learning have enabled models to recognize them with remarkable efficiency. In this regard, we propose a novel two-layer visual attention framework for CAPTCHA recognition that builds on traditional attention mechanisms by incorporating Guided Visual Attention (GVA), which sharpens the focus on relevant visual features. We specifically adapt the well-established image captioning task to this problem. Our approach uses the first-level attention module to guide the second-level attention component and incorporates two LSTM (Long Short-Term Memory) layers to improve CAPTCHA recognition. An extensive evaluation on four diverse datasets, namely Weibo, BoC (Bank of China), Gregwar, and Captcha 0.3, shows the adaptability and efficacy of our method, which achieves an accuracy of 96.70% on BoC and 95.92% on Weibo. These results underscore the effectiveness of our method in accurately recognizing and processing CAPTCHA datasets and demonstrate its robustness, reliability, and ability to handle varied challenges in CAPTCHA recognition.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52222504 and 52241502) and the Natural Science Talents Foundation of Shaanxi Province (No. 2021JC-04).
Abstract: To automatically recognize and classify cracks of different depths, this study employed several deep convolutional neural networks, including AlexNet, ResNet, and DenseNet, to identify and classify cracks at different depths in various materials. An analysis process for the automatic classification of crack damage is presented. The image dataset used for model training was obtained from scanning experiments on aluminum and titanium alloy plates with an ultrasonic phased-array flaw detector. All models were trained and validated on the dataset and compared using classification precision and loss values. The results show that crack depth can be automatically recognized and classified by using deep learning to analyze ultrasonic phased-array images, and that DenseNet achieves the highest classification precision. This solves the problem that ultrasonic damage identification relies on manual experience.