Journal literature
13,003 articles found
A Comprehensive Review of Pill Image Recognition
1
Authors: Linh Nguyen Thi My, Viet-Tuan Le, Tham Vo, Vinh Truong Hoang. Computers, Materials & Continua, 2025, Issue 3, pp. 3693-3740 (48 pages)
Pill image recognition is an important field in computer vision. It has become a vital technology in healthcare and pharmaceuticals because precise medication identification is needed to prevent errors and ensure patient safety. This survey examines the current state of pill image recognition, focusing on advancements, methodologies, and the challenges that remain unresolved. It provides a comprehensive overview of traditional image processing-based, machine learning-based, deep learning-based, and hybrid methods, and explores the ongoing difficulties in the field. We summarize and classify the methods used in each article, compare the strengths and weaknesses of these four families of methods, and review benchmark datasets for pill image recognition. Additionally, we compare the performance of proposed methods on popular benchmark datasets. The survey draws on recent advancements, such as Transformer models and technologies like Augmented Reality (AR), to discuss potential research directions and conclude the review. By offering a holistic perspective, this paper aims to serve as a valuable resource for researchers and practitioners striving to advance the field of pill image recognition.
Keywords: pill image recognition; pill image identification; pill recognition; pill identification; pill image retrieval; pill retrieval; computer vision
Research on the balance optimization algorithm of image recognition accuracy and speed based on autocollimator measurement
2
Authors: LI Renpu, MA Long, CUI Jiwen, GUO Junqi, Andrei KULIKOV, WEN Dandan. Optoelectronics Letters, 2025, Issue 2, pp. 121-128 (8 pages)
The autocollimator is an important device for achieving precise, small-angle, non-contact measurements. It primarily obtains angular parameters of a plane target mirror indirectly by detecting the position of the imaging spot. There are few reports on the core algorithmic techniques in current commercial products and recent scientific research. This paper addresses the performance requirements of coordinate reading accuracy and operational speed in autocollimator image positioning. It proposes a cross-image center recognition scheme based on the Hough transform and another based on Zernike moments and the least squares method. Through experimental evaluation of the accuracy and speed of both schemes, the optimal image recognition scheme balancing measurement accuracy and speed for the autocollimator is determined. Of the two, the center recognition method based on Zernike moments and the least squares method offers higher measurement accuracy and stability, while the Hough transform-based method provides faster measurement speed.
Keywords: image optimization recognition
A teacher-student based attention network for fine-grained image recognition
3
Authors: Ang Li, Xueyi Zhang, Peilin Li, Bin Kang. Digital Communications and Networks, 2025, Issue 1, pp. 52-59 (8 pages)
The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. To highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that subtle visual differences in challenging scenarios can be highlighted by exploiting the spatial relationship among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR, and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly by a convolutional neural network, referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline, and propose a type of boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph convolution based Global Attention Network (GA-Net), which combines the local attention maps extracted by MA-Net with non-local techniques to explore the spatial relationship among sub-regions. Finally, we develop a collaborative teacher-student strategy to adaptively determine the attended roles and optimization modes, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars, and FGVC Aircraft datasets illustrate the promising performance of our framework.
Keywords: Fine-grained image recognition; Collaborative teacher-student strategy; Multi-attention; Global attention
Transformer-Based Fusion of Infrared and Visible Imagery for Smoke Recognition in Commercial Areas
4
Authors: Chongyang Wang, Qiongyan Li, Shu Liu, Pengle Cheng, Ying Huang. Computers, Materials & Continua, 2025, Issue 9, pp. 5157-5176 (20 pages)
With rapid urbanization, fires pose significant challenges in urban governance. Traditional fire detection methods often struggle to detect smoke in complex urban scenes due to environmental interference and variations in viewing angles. This study proposes a novel multimodal smoke detection method that fuses infrared and visible imagery using a transformer-based deep learning model. By capturing both thermal and visual cues, our approach significantly enhances the accuracy and robustness of smoke detection in business park scenes. We first established a dual-view dataset comprising infrared and visible light videos, implemented an innovative image feature fusion strategy, and designed a deep learning model based on the transformer architecture and attention mechanism for smoke classification. Experimental results demonstrate that our method outperforms existing methods: with multi-view input, it achieves an accuracy of 90.88%, a precision of 98.38%, a recall of 92.41%, and false positive and false negative rates both below 5%, underlining the effectiveness of the proposed multimodal and multi-view fusion approach. The attention mechanism plays a crucial role in improving detection performance, particularly in identifying subtle smoke features.
Keywords: Multimodal image processing; smoke recognition; urban safety; environmental monitoring
Fusion method for water depth data from multiple sources based on image recognition
5
Authors: Huiyu HAN, Feng ZHOU. Journal of Oceanology and Limnology, 2025, Issue 4, pp. 1093-1105 (13 pages)
Considering the difficulty of integrating the depth points of nautical charts of the East China Sea into a global high-precision Grid Digital Elevation Model (Grid-DEM), we proposed a “Fusion based on Image Recognition (FIR)” method for multi-sourced depth data fusion, and used it to merge the electronic nautical chart dataset (referred to as Chart2014 in this paper) with the global digital elevation dataset (referred to as Globalbath2002 in this paper). Compared to the traditional fusion of two datasets by direct combination and interpolation, the new Grid-DEM formed by FIR can better represent the data characteristics of Chart2014, reduce the calculation difficulty, and be more intuitive. The choice of different interpolation methods in FIR and the influence of the “exclusion radius R” parameter were also discussed. FIR avoids complex calculations of spatial distances among points from different sources, and instead uses a spatial exclusion map to perform one-step screening based on the exclusion radius R, which greatly improved the fusion status of a reliable dataset. The fusion results of different experiments were analyzed statistically with root mean square error and mean relative error, showing that the interpolation methods based on Delaunay triangulation are more suitable for the fusion of nautical chart depth of China, and that factors such as the point density distribution of multiple source data, accuracy, interpolation method, and various terrain conditions should be fully considered when selecting the exclusion radius R.
Keywords: water depth; fusion method; Grid Digital Elevation Model (Grid-DEM); image recognition; Delaunay triangulation
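The one-step screening with an exclusion radius R described in this abstract can be illustrated with a rough sketch. The abstract does not give the actual procedure, so everything here is an assumption: the integer grid, the function names `build_exclusion_map` and `screen_points`, and the choice to stamp a disc of radius R around each trusted chart point and then drop any point from the other source that lands on a stamped cell.

```python
def build_exclusion_map(reliable_points, width, height, radius):
    """Mark every grid cell within `radius` cells of a trusted point (hypothetical helper)."""
    excluded = [[False] * width for _ in range(height)]
    for px, py in reliable_points:
        # Only visit the bounding square of the disc, clipped to the grid.
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2:
                    excluded[y][x] = True
    return excluded


def screen_points(points, excluded):
    """Keep only the points that fall on cells not covered by the exclusion map."""
    return [(x, y) for x, y in points if not excluded[y][x]]


# Toy example: one trusted chart point and five candidate points from the other source.
exclusion_map = build_exclusion_map([(5, 5)], width=10, height=10, radius=2)
kept = screen_points([(5, 6), (6, 6), (7, 7), (9, 9), (5, 8)], exclusion_map)
print(kept)  # [(7, 7), (9, 9), (5, 8)]
```

In this sketch the mask is built once, so screening each candidate point is a single lookup rather than a distance computation against every trusted point.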
A novel coal-rock recognition method in coal mining face based on fusing laser point cloud and images
6
Authors: Yang Liu, Lei Si, Zhongbin Wang, Miao Chen, Xin Li, Dong Wei, Jinheng Gu. International Journal of Mining Science and Technology, 2025, Issue 7, pp. 1057-1071 (15 pages)
Rapid and accurate recognition of coal and rock is an important prerequisite for safe and efficient coal mining. In this paper, a novel coal-rock recognition method based on fusing laser point clouds and images is proposed, named Multi-Modal Frustum PointNet (MMFP). First, MobileNetV3 is used as the backbone network of Mask R-CNN to reduce the network parameters and compress the model volume. The dilated convolutional block attention mechanism (Dilated CBAM) and inception structure are combined with MobileNetV3 to further enhance the detection accuracy. Subsequently, the 2D target candidate box is calculated through the improved Mask R-CNN, and the frustum point cloud in the 2D target candidate box is extracted to reduce the calculation scale and spatial search range. Then, the self-attention PointNet is constructed to segment the fused point cloud within the frustum range, and the bounding box regression network is used to predict the bounding box parameters. Finally, an experimental platform of shearer coal wall cutting is established, and multiple comparative experiments are conducted. Experimental results indicate that the proposed coal-rock recognition method is superior to other advanced models.
Keywords: Coal mining face; Coal-rock recognition; Deep learning; Laser point cloud and images fusion; Multi-Modal Frustum PointNet (MMFP)
Phenotypic Image Recognition of Asparagus Stem Blight Based on Improved YOLOv8
7
Authors: Shunshun Ji, Jiajun Sun, Chao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9, pp. 4017-4029 (13 pages)
Asparagus stem blight, also known as “asparagus cancer”, is a serious plant disease with a regional distribution. The widespread occurrence of the disease has had a negative impact on the yield and quality of asparagus and has become one of the main problems threatening asparagus production. To improve the ability to accurately identify and localize phenotypic lesions of stem blight in asparagus and to enhance detection accuracy, a YOLOv8-CBAM detection algorithm for asparagus stem blight based on YOLOv8 was proposed. The algorithm aims to achieve rapid detection of phenotypic images of asparagus stem blight and to provide effective assistance in the control of the disease. To enhance the model’s capacity to capture subtle lesion features, the Convolutional Block Attention Module (CBAM) is added after C2f in the head. In addition, the original CIoU loss function in YOLOv8 was replaced with the Focal-EIoU loss function, ensuring that the updated loss function emphasizes higher-quality bounding boxes. The YOLOv8-CBAM algorithm can effectively detect asparagus stem blight phenotypic images with a mean average precision (mAP) of 95.51%, which is 0.22%, 14.99%, 1.77%, and 5.71% higher than the YOLOv5, YOLOv7, YOLOv8, and Mask R-CNN models, respectively. This greatly enhances the efficiency of asparagus growers in identifying asparagus stem blight, aids in improving its prevention and control, and is crucial for the application of computer vision in agriculture.
Keywords: YOLOv8; asparagus stem blight; image recognition; PEST
Squeeze and Excitation Convolution with Shortcut for Complex Plasma Image Recognition
8
Authors: Baoxia Li, Wenzhuo Chen, Xiaojiang Tang, Shaohuang Bian, Yang Liu, Junwei Guo, Dan Zhang, Feng Huang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 8, pp. 2221-2236 (16 pages)
Complex plasma widely exists in thin film deposition, material surface modification, and waste gas treatment in industrial plasma processes. During complex plasma discharge, the configuration, distribution, and size of particles, as well as the discharge glow, strongly depend on discharge parameters. However, traditional manual diagnosis methods for recognizing discharge parameters from discharge images are complicated to operate, inaccurate, time-consuming, and demanding of instruments. To solve these problems, a network of squeeze and excitation convolution with shortcut (SECS) for complex plasma image recognition is proposed. It combines two mechanisms: an attention mechanism, which strengthens the extraction of channel features, and a shortcut connection, which enables the input information to be transmitted directly to deep layers and avoids vanishing or exploding gradients. The results show that the accuracy, precision, recall, and F1-score of our model are superior to those of other models in complex plasma image recognition, and the recognition accuracy reaches 97.38%. Moreover, the recognition accuracy on the publicly available Flowers and Chest X-ray datasets reaches 97.85% and 98.65%, respectively, indicating that the model is robust. This study shows that the proposed model provides a new method for the diagnosis of complex plasma images and also provides technical support for the application of plasma in industrial production.
Keywords: image recognition; complex plasmas; deep learning
Modeling load distribution for rural photovoltaic grid areas using image recognition
9
Authors: Ning Zhou, Bowen Shang, Jinshuai Zhang, Mingming Xu. Global Energy Interconnection (EI, CSCD), 2024, Issue 3, pp. 270-283 (14 pages)
Expanding photovoltaic (PV) resources in rural-grid areas is an essential means to augment the share of solar energy in the energy landscape, aligning with the “carbon peaking and carbon neutrality” objectives. However, rural power grids often lack digitalization; thus, the load distribution within these areas is not fully known. This hinders the calculation of the available PV capacity and deduction of node voltages. This study proposes a load-distribution modeling approach based on remote-sensing image recognition in pursuit of a scientific framework for developing distributed PV resources in rural grid areas. First, houses in remote-sensing images are accurately recognized using deep-learning techniques based on the YOLOv5 model. The distribution of the houses is then used to estimate the load distribution in the grid area. Next, equally spaced and clustered distribution models are used to adaptively determine the location of the nodes and load power in the distribution lines. Finally, by calculating the connectivity matrix of the nodes, a minimum spanning tree is extracted, the topology of the network is constructed, and the node parameters of the load-distribution model are calculated. The proposed scheme is implemented in a software package and its efficacy is demonstrated by analyzing typical remote-sensing images of rural grid areas. The results underscore the ability of the proposed approach to effectively discern the distribution-line structure and compute the node parameters, thereby offering vital support for determining PV access capability.
Keywords: Deep learning; Remote sensing image recognition; Photovoltaic development; Load distribution modeling; Power flow calculation
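The step of extracting a minimum spanning tree from a node connectivity matrix, mentioned in this abstract, can be sketched as follows. The abstract does not say which algorithm the authors use. This sketch assumes Prim's algorithm, a symmetric weight matrix in which `None` means "no possible link", and a hypothetical function name `minimum_spanning_tree`. The toy weights are made up for illustration.

```python
import heapq


def minimum_spanning_tree(weights):
    """Prim's algorithm on a symmetric weight matrix; None marks a missing link."""
    n = len(weights)
    visited = [False] * n
    edges = []
    heap = [(0.0, 0, 0)]  # (weight, from_node, to_node); start the tree at node 0
    while heap and len(edges) < n - 1:
        weight, u, v = heapq.heappop(heap)
        if visited[v]:
            continue  # a cheaper edge already reached this node
        visited[v] = True
        if u != v:
            edges.append((u, v, weight))
        for nxt, w_next in enumerate(weights[v]):
            if w_next is not None and not visited[nxt]:
                heapq.heappush(heap, (w_next, v, nxt))
    return edges


# Toy connectivity matrix for four nodes (for example, distances between line nodes).
W = [
    [None, 1.0, 4.0, None],
    [1.0, None, 2.0, 6.0],
    [4.0, 2.0, None, 3.0],
    [None, 6.0, 3.0, None],
]
tree = minimum_spanning_tree(W)
print(tree)  # [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0)]
```

The returned edge list is one way to represent a radial network topology: each node other than the root is connected through exactly one incoming edge.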
Deep learning-based recognition of stained tongue coating images
10
Authors: ZHONG Liqin, XIN Guojiang, PENG Qinghua, CUI Ji, ZHU Lei, LIANG Hao. Digital Chinese Medicine (CAS, CSCD), 2024, Issue 2, pp. 129-136 (8 pages)
Objective: To build a dataset encompassing a large number of stained tongue coating images and process it using deep learning to automatically recognize stained tongue coating images. Methods: A total of 1001 images of stained tongue coating from healthy students at Hunan University of Chinese Medicine and 1007 images of pathological (non-stained) tongue coating from hospitalized patients at The First Hospital of Hunan University of Chinese Medicine with lung cancer, diabetes, and hypertension were collected. The tongue images were randomized into the training, validation, and testing datasets in a 7:2:1 ratio. A deep learning model was constructed using ResNet50 for recognizing stained tongue coating in the training and validation datasets. The training period was 90 epochs. The model’s performance was evaluated by its accuracy, loss curve, recall, F1 score, confusion matrix, receiver operating characteristic (ROC) curve, and precision-recall (PR) curve in the task of predicting stained tongue coating images in the testing dataset. The accuracy of the deep learning model was compared with that of attending physicians of traditional Chinese medicine (TCM). Results: The training results showed that after 90 epochs, the model presented an excellent classification performance. The loss curve and accuracy were stable, showing no signs of overfitting. The model achieved an accuracy, recall, and F1 score of 92%, 91%, and 92%, respectively. The confusion matrix revealed an accuracy of 92% for the model and 69% for TCM practitioners. The areas under the ROC and PR curves were 0.97 and 0.95, respectively. Conclusion: The deep learning model constructed using ResNet50 can effectively recognize stained coating images with greater accuracy than visual inspection by TCM practitioners. This model has the potential to assist doctors in identifying false tongue coating and preventing misdiagnosis.
Keywords: Deep learning; Tongue coating; Stained coating; Image recognition; Traditional Chinese medicine (TCM); Intelligent diagnosis
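As a small illustration of the 7:2:1 randomization mentioned in this abstract, the sketch below splits 2008 items (1001 + 1007 images) into training, validation, and testing subsets. The function name `split_7_2_1`, the fixed seed, and the rounding rule (floor for the first two parts, remainder to the test set) are assumptions. The abstract does not state how rounding was handled or whether the split was stratified by class.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle a copy of `items` and cut it into 70% / 20% / remainder."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left, roughly 10%
    return train, val, test


# 1001 stained + 1007 non-stained images, represented here by integer IDs.
train, val, test = split_7_2_1(range(1001 + 1007))
print(len(train), len(val), len(test))  # 1405 401 202
```

Integer arithmetic is used for the cut points so that the subset sizes do not depend on floating-point rounding.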
Tea Image Recognition and Research on Structure of Tea Picking End-Effector
11
Authors: Biao Huang, Shiping Zou. Journal of Electronics Cooling and Thermal Control, 2024, Issue 3, pp. 51-60 (10 pages)
Automated tea picking technology is an important part of the development of smart agriculture and affects the development of the tea industry to a certain extent. Tea leaf recognition and the robotic tea picking end-effector are the key technologies for automated tea picking. This paper proposes a set of algorithms for tea leaf differentiation and recognition based on the principle of colour difference. On the basis of this algorithm, a tea picking end-effector is designed. The experiments show that the designed tea picking end-effector has good recognition ability and high tea picking speed.
Keywords: Image recognition of tea leaves; Tea picking end-effector; Tea picking; Structure design
Segmentation-Free Recognition Algorithm Based on Deep Learning for Handwritten Text Image
12
Authors: Ge Peng. Journal of Artificial Intelligence and Technology, 2024, Issue 2, pp. 169-178 (10 pages)
Segmentation-based offline handwritten character recognition algorithms suffer from the difficulty of segmenting interleaved and touching characters in handwritten manuscripts. To tackle the problem, a segmentation-free recognition algorithm based on a deep learning network is proposed in this paper. The network consists of four neural layers: an input layer for image preprocessing, a convolutional neural network (CNN) layer for feature extraction, a bidirectional long short-term memory network (BDLSTM) layer for sequence prediction, and a connectionist temporal classification (CTC) layer for text sequence alignment and classification. In addition, a novel data processing method is applied for data length equalization. On this basis, groups of experiments on six typical databases are carried out, evaluating character correct rate, training time cost, storage space cost, and testing time cost. The experimental results show that the proposed algorithm performs better in accuracy and efficiency than other classical algorithms.
Keywords: deep learning image processing segmentation-free handwritten image recognition sequence labeling
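To illustrate how a CTC output layer turns per-frame predictions into a text sequence without segmenting characters, here is a minimal sketch of greedy (best-path) CTC decoding. The abstract does not state which decoding strategy the paper uses, so greedy decoding is an assumption. The function name `ctc_greedy_decode`, the toy alphabet, and the frame scores are likewise made up for illustration.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Pick the best class per frame, collapse repeats, then drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    decoded = []
    previous = blank
    for index in best_path:
        # Emit a character only when the class changes and is not the blank.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Index 0 is the CTC blank; "-" is only a placeholder so indices line up.
alphabet = "-ab"
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.2, 0.7, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, collapsed)
]
print(ctc_greedy_decode(frames, alphabet))  # aab
```

The blank between the second and fourth frames is what lets the decoder output a doubled character; without it, the repeated class would collapse to a single "a".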
Intelligent Assisted Travel Wheelchair Based on Image Recognition Technology
13
Authors: Shuai Li. Journal of Electronic Research and Application, 2024, Issue 5, pp. 154-160 (7 pages)
This paper introduces a deep learning-based intelligent image recognition system integrated into a wheelchair for use in cold environments, aiming to improve the convenience and safety of disabled individuals. The system adopts image recognition technology to monitor road conditions in real time through the camera and to detect and measure the distance to foreign objects on the road. The system visualizes the detection results on the wheelchair screen to help the user avoid obstacles and travel more safely. In addition, the system includes crawler tracks, seat heating, snow and rain protection, and other functions. The wheelchair has a wide range of application prospects and development potential. It is expected to be widely used in the future, providing a strong guarantee for the safe travel of disabled individuals in China.
Keywords: image recognition; Traffic safety; Travel security
SAR Image Recognition Based on Multi-Aspect of Shadow Information (cited by: 2)
14
Authors: 杨露菁, 郝威, 王德石. Transactions of Nanjing University of Aeronautics and Astronautics (EI), 2009, Issue 4, pp. 320-326 (7 pages)
Traditional synthetic aperture radar (SAR) image recognition techniques focus on the electromagnetic (EM) scattering centers, ignoring the important role of shadow information in SAR image recognition. It is difficult to classify targets by the shadow information alone, because the shadow shape depends on the radar aspect angle, the depression angle, and the resolution. Moreover, the shadow shapes of different targets are similar. When multiple SAR images of one target from different aspects are available, the performance of target recognition can be improved. To address this problem, a multi-aspect SAR image recognition technique based on the shadow information is developed. It extracts shadow profiles from SAR images and takes chain codes as the feature vectors of targets. Then, feature vectors from multiple aspects of the same target are combined into feature sequences, and the hidden Markov model (HMM) is applied to the feature sequences for target recognition. The simulation result shows the effectiveness of the method.
Keywords: image recognition; synthetic aperture radar (SAR); shadow information; chain code
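The idea of describing a shadow profile by a chain code can be sketched as below. The abstract does not specify the chain-code variant, so this assumes the common 8-direction Freeman code, with 0 pointing along +x and codes increasing counter-clockwise (y pointing up). The name `chain_code` and the toy contour are hypothetical.

```python
# Map a unit step (dx, dy) between neighbouring boundary pixels to a Freeman code.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(contour):
    """Encode a closed, ordered, 8-connected contour as a list of direction codes."""
    points = list(contour)
    codes = []
    # Pair each point with its successor; the last point wraps back to the first.
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return codes


# A unit square traversed counter-clockwise.
print(chain_code([(0, 0), (1, 0), (1, 1), (0, 1)]))  # [0, 2, 4, 6]
```

A sequence like this is a compact shape descriptor: it depends only on the direction of each boundary step, not on where the shadow sits in the image.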
Comprehensive Review and Analysis on Facial Emotion Recognition:Performance Insights into Deep and Traditional Learning with Current Updates and Challenges
15
Authors: Amjad Rehman, Muhammad Mujahid, Alex Elyassih, Bayan AlGhofaily, Saeed Ali Omer Bahaj. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 41-72 (32 pages)
In computer vision and artificial intelligence, automatic facial expression-based emotion identification of humans has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance films, depression therapy, patient monitoring, anxiety, and others, have brought attention to its significant academic and commercial importance. This study emphasizes research that has employed only facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense achievement of deep learning has resulted in a growing use of its many architectures to enhance efficiency. This review covers how machine learning, deep learning, and hybrid methods use preprocessing, augmentation techniques, and feature extraction for temporal properties of successive frames of data. It then gives a brief summary of publicly accessible assessment criteria and compares them with benchmark results, the most trustworthy way to assess FER-related research topics statistically. This brief synopsis of the subject matter may be beneficial for novices in the field of FER as well as seasoned scholars seeking fruitful avenues for further investigation. The review conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Keywords: Face emotion recognition; deep learning; hybrid learning; CK+; facial images; machine learning; technological development
Endoscopic image analysis assisted by machine learning:Algorithmic advancements and clinical uses
16
Authors: Jiang-Cheng Ding, Jun Zhang. Artificial Intelligence in Gastrointestinal Endoscopy, 2025, Issue 3, pp. 1-9 (9 pages)
Clinical gastrointestinal endoscopy has significantly advanced owing to machine learning techniques, which have produced novel instruments and approaches for early-stage disease diagnosis, categorization, and therapy. Machine learning applications in gastrointestinal endoscopy, such as image identification, lesion detection, pathological categorization, and surgical aid, are examined in this minireview. We examine the potential of machine learning to improve treatment regimens, lower misdiagnosis rates, and increase diagnostic accuracy by evaluating previous research. In addition, this study discusses current issues such as clinical applicability, model generalization, and data privacy. It also suggests future research directions to help clinicians and researchers in the field of gastrointestinal endoscopy.
Keywords: Machine learning; Artificial intelligence; ENDOSCOPY; image recognition; GASTROENTEROLOGY
Practical Adversarial Attacks Imperceptible to Humans in Visual Recognition
17
Authors: Donghyeok Park, Sumin Yeon, Hyeon Seo, Seok-Jun Buu, Suwon Lee. Computer Modeling in Engineering & Sciences, 2025, Issue 3, pp. 2725-2737 (13 pages)
Recent research on adversarial attacks has primarily focused on white-box attack techniques, with limited exploration of black-box attack methods. Furthermore, in many black-box research scenarios, it is assumed that the output label and probability distribution can be observed without imposing any constraints on the number of attack attempts. Unfortunately, this disregard for the real-world practicality of attacks, particularly their potential for human detectability, has left a gap in the research landscape. Considering these limitations, our study focuses on using a similar color attack method, assuming access only to the output label, limiting the number of attack attempts to 100, and subjecting the attacks to human perceptibility testing. Through this approach, we demonstrated the effectiveness of black-box attack techniques in deceiving models and achieved a success rate of 82.68% in deceiving humans. This study emphasizes the significance of research that addresses the challenge of deceiving both humans and models, highlighting the importance of real-world applicability.
Keywords: Adversarial attacks; image recognition; information security
A YOLOv11-Based Deep Learning Framework for Multi-Class Human Action Recognition
18
作者 Nayeemul Islam Nayeem Shirin Mahbuba +4 位作者 Sanjida Islam Disha Md Rifat Hossain Buiyan Shakila Rahman M.Abdullah-Al-Wadud Jia Uddin 《Computers, Materials & Continua》 2025年第10期1541-1557,共17页
Human activity recognition is a significant area of research in artificial intelligence for surveillance, healthcare, sports, and human-computer interaction applications. The article benchmarks the performance of a You Only Look Once version 11-based (YOLOv11-based) architecture for multi-class human activity recognition. The dataset consists of 14,186 images across 19 activity classes, from dynamic activities such as running and swimming to static activities such as sitting and sleeping. Preprocessing included resizing all images to 512×512 pixels, annotating them in YOLO’s bounding box format, and applying data augmentation methods such as flipping, rotation, and cropping to enhance model generalization. The proposed model was trained for 100 epochs with adaptive learning rate methods and hyperparameter optimization, achieving a mAP@0.5 of 74.93% and a mAP@0.5-0.95 of 64.11% and outperforming previous versions of YOLO (v10, v9, and v8) and general-purpose architectures like ResNet50 and EfficientNet. It exhibited improved precision and recall for all activity classes, with precision values of 0.76 for running, 0.79 for swimming, 0.80 for sitting, and 0.81 for sleeping, and was tested for real-time deployment with an inference time of 8.9 ms per image, making it computationally light. The proposed YOLOv11’s improvements are attributed to architectural advancements such as a more complex feature extraction process, better attention modules, and an anchor-free detection mechanism. While YOLOv10 was extremely stable in static activity recognition, YOLOv9 performed well in dynamic environments but suffered from overfitting, and YOLOv8, while a decent baseline, failed to differentiate between overlapping static activities. The experimental results identify the proposed YOLOv11 as the most appropriate model, providing a balance between accuracy, computational efficiency, and robustness for real-world deployment. Nevertheless, certain issues remain to be addressed, particularly in discriminating between visually similar activities and the use of publicly available datasets. Future research will entail the inclusion of 3D data and multimodal sensor inputs, such as depth and motion information, to enhance recognition accuracy and generalizability to challenging real-world environments.
Keywords: Human activity recognition; YOLOv11; deep learning; real-time detection; anchor-free detection; attention mechanisms; object detection; image classification; multi-class recognition; surveillance applications
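The abstract above mentions annotating images in YOLO's bounding box format and augmenting them by flipping. As a minimal sketch of how those two steps interact (not the authors' code), the snippet below assumes the common YOLO label layout of `(class_id, x_center, y_center, width, height)` with coordinates normalized to [0, 1]; the function name `hflip_yolo_boxes` and the sample label are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# One YOLO-style label: (class_id, x_center, y_center, width, height).
# Assumption: box values are normalized to [0, 1] relative to image size.
YoloBox = Tuple[int, float, float, float, float]


def hflip_yolo_boxes(boxes: List[YoloBox]) -> List[YoloBox]:
    """Mirror labels left-to-right to match a horizontally flipped image."""
    # Only the x-center changes; y-center, width and height are unaffected.
    return [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in boxes]


# Hypothetical label: class 0, box centered a quarter of the way across.
labels = [(0, 0.25, 0.5, 0.2, 0.4)]
flipped = hflip_yolo_boxes(labels)
```

Because the coordinates are normalized, the same transform applies whatever the image resolution, including the 512×512 size stated in the abstract. Rotation and cropping need more care, since they can change box extents or push boxes out of frame.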
A Dual-Layer Attention Based CAPTCHA Recognition Approach with Guided Visual Attention
19
Authors: Zaid Derea, Beiji Zou, Xiaoyan Kui, Alaa Thobhani, Amr Abdussalam. Computer Modeling in Engineering & Sciences, 2025, Issue 3, pp. 2841-2867 (27 pages)
Enhancing website security is crucial to combat malicious activities, and CAPTCHA (Completely Automated Public Turing tests to tell Computers and Humans Apart) has become a key method to distinguish humans from bots. While text-based CAPTCHAs are designed to challenge machines while remaining human-readable, recent advances in deep learning have enabled models to recognize them with remarkable efficiency. In this regard, we propose a novel two-layer visual attention framework for CAPTCHA recognition that builds on traditional attention mechanisms by incorporating Guided Visual Attention (GVA), which sharpens focus on relevant visual features. We have specifically adapted the well-established image captioning task to address this need. Our approach utilizes the first-level attention module as guidance to the second-level attention component, incorporating two LSTM (Long Short-Term Memory) layers to enhance CAPTCHA recognition. Our extensive evaluation across four diverse datasets, namely Weibo, BoC (Bank of China), Gregwar, and Captcha 0.3, shows the adaptability and efficacy of our method. Our approach achieved an accuracy of 96.70% for BoC and 95.92% for Weibo. These results underscore the effectiveness of our method in accurately recognizing and processing CAPTCHA datasets, showcasing its robustness, reliability, and ability to handle varied challenges in CAPTCHA recognition.
Keywords: Text-based CAPTCHA; image recognition; guided visual attention; web security; computer vision
Deep Learning-Based Identification of Cracks Using Ultrasonic Phased-Array Images
20
Authors: Lijuan Yang, Huan Liu, Desheng Wu, Zhibo Yang, Xuefeng Chen, Shaohua Tian. Acta Mechanica Solida Sinica, 2025, Issue 5, pp. 803-814 (12 pages)
To realize the automatic recognition and classification of cracks with different depths, this study employed several deep convolutional neural networks, including AlexNet, ResNet, and DenseNet, to identify and classify cracks at different depths and in various materials. An analysis process for the automatic classification of crack damage was presented. The image dataset used for model training was obtained from scanning experiments on aluminum and titanium alloy plates using an ultrasonic phased-array flaw detector. All models were trained and validated with the dataset; the proposed models were compared using classification precision and loss values. The results show that the automatic recognition and classification of crack depth can be realized by using the deep learning algorithm to analyze the ultrasonic phased-array images, and that DenseNet achieves the highest classification precision. This addresses the problem that ultrasonic damage identification relies on manual experience.
Keywords: Crack damage; Deep convolutional neural network; Ultrasonic phased-array image; Automatic crack recognition