Journal Articles
3,624 articles found
1. Detection and Recognition of Spray Code Numbers on Can Surfaces Based on OCR
Authors: Hailong Wang, Junchao Shi. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 1109-1128 (20 pages)
A two-stage deep-learning algorithm is proposed for detecting and recognizing the spray-coded numbers on can bottoms. It addresses two problems in can-bottom spray-code recognition: small character areas and fast production-line speeds. In the code-number detection stage, the Differentiable Binarization Network is used as the backbone network. It is combined with the Attention and Dilation Convolutions Path Aggregation Network feature-fusion structure to improve detection. For text recognition, the Scene Visual Text Recognition network is trained end-to-end to recognize the code numbers. This alleviates recognition errors caused by image color distortion from lighting variations and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters, meeting deployment requirements in resource-constrained environments. A comparative experiment was conducted on a dataset of can-bottom spray-code numbers collected on-site, and a transfer experiment was conducted on a dataset of packaging-box production dates. The experimental results show that the proposed algorithm can effectively locate the codes of cans at different positions on the roller conveyor and can accurately recognize the code numbers at high production-line speeds. The Hmean of code-number detection is 97.32%, and the code-number recognition accuracy is 98.21%. These results confirm that the proposed algorithm achieves high accuracy in both code-number detection and recognition.
Keywords: Can coding recognition; differentiable binarization network; scene visual text recognition; model pruning and quantization; transport model
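The detection stage builds on Differentiable Binarization (DB). DB replaces a hard threshold with a steep sigmoid of the probability map P minus a learned threshold map T: 1 / (1 + exp(-k(P - T))), where k is an amplifying factor (50 in the original DB work). The minimal sketch below illustrates that step on plain Python lists. It also includes the Hmean detection metric, the harmonic mean of precision and recall. The function names are hypothetical, and this is not the authors' code.

```python
import math

def db_binarize(prob_map, thresh_map, k=50.0):
    """Approximate (differentiable) binarization: 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are same-shaped 2D lists of floats in [0, 1].
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]

def hmean(precision, recall):
    """Harmonic mean of precision and recall, the 'Hmean' detection metric."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Pixels well above the local threshold go to ~1, those below go to ~0.
    print(db_binarize([[0.9, 0.2]], [[0.3, 0.3]]))
    print(hmean(0.5, 1.0))
```

Unlike a hard step, the sigmoid has a usable gradient everywhere. This is what lets the threshold map be learned jointly with the rest of the detector.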
2. A YOLOv11-Based Deep Learning Framework for Multi-Class Human Action Recognition
Authors: Nayeemul Islam Nayeem, Shirin Mahbuba, Sanjida Islam Disha, Md Rifat Hossain Buiyan, Shakila Rahman, M. Abdullah-Al-Wadud, Jia Uddin. Computers, Materials & Continua, 2025, Issue 10, pp. 1541-1557 (17 pages)
Human activity recognition is a significant area of artificial intelligence research, with applications in surveillance, healthcare, sports and human-computer interaction. The article benchmarks the performance of a You Only Look Once version 11 (YOLOv11)-based architecture for multi-class human activity recognition. The dataset consists of 14,186 images across 19 activity classes. These range from dynamic activities such as running and swimming to static activities such as sitting and sleeping. Preprocessing included resizing all images to 512×512 pixels, annotating them in YOLO's bounding-box format, and applying data augmentation such as flipping, rotation and cropping to improve model generalization. The proposed model was trained for 100 epochs with adaptive learning-rate methods and hyperparameter optimization. It achieved an mAP@0.5 of 74.93% and an mAP@0.5-0.95 of 64.11%, outperforming previous YOLO versions (v10, v9 and v8) and general-purpose architectures such as ResNet50 and EfficientNet. It exhibited improved precision and recall for all activity classes, with precision values of 0.76 for running, 0.79 for swimming, 0.80 for sitting and 0.81 for sleeping. It was also tested for real-time deployment, with an inference time of 8.9 ms per image, indicating that it is computationally lightweight. The improvements of the proposed YOLOv11 model are attributed to architectural advancements such as a more complex feature-extraction process, better attention modules and an anchor-free detection mechanism. By comparison, YOLOv10 was highly stable in static activity recognition, and YOLOv9 performed well in dynamic environments but suffered from overfitting. YOLOv8, while a decent baseline, failed to differentiate between overlapping static activities. The experimental results identify the proposed YOLOv11 model as the most appropriate, providing an ideal balance between accuracy, computational efficiency and robustness for real-world deployment. Nevertheless, certain issues remain, particularly discriminating between visually similar activities and the reliance on publicly available datasets. Future research will incorporate 3D data and multimodal sensor inputs, such as depth and motion information, to improve recognition accuracy and generalizability in challenging real-world environments.
Keywords: Human activity recognition; YOLOv11; deep learning; real-time detection; anchor-free detection; attention mechanisms; object detection; image classification; multi-class recognition; surveillance applications
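The two reported scores differ because of how strictly a detection must overlap its ground-truth box. mAP@0.5 counts a detection as a true positive at IoU ≥ 0.5. mAP@0.5-0.95 is read here as the usual COCO-style average over ten IoU thresholds, 0.50, 0.55, …, 0.95. The sketch below is illustrative only, and its helper names are hypothetical. It shows that a box with IoU 0.77 passes only 6 of those 10 thresholds, which is why the stricter score (64.11%) is lower than mAP@0.5 (74.93%).

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def tp_fraction(pred, gt):
    """Fraction of the ten thresholds at which `pred` would match `gt`."""
    overlap = iou(pred, gt)
    return sum(overlap >= t for t in THRESHOLDS) / len(THRESHOLDS)

if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (0, 0, 10, 7.7)  # IoU = 0.77
    print(iou(pred, gt), tp_fraction(pred, gt))
```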
3. From ChatGPT to DeepSeek: Potential uses of artificial intelligence in early symptom recognition for stroke care
Authors: Wai Yan Lam, Sunny Chi Lik Au. Journal of Acute Disease, 2025, Issue 3, pp. 13-16 (4 pages)
In the era of artificial intelligence (AI), healthcare and the medical sciences are inseparable from various AI technologies[1]. ChatGPT once shocked the medical field, but the latest AI model, DeepSeek, has recently taken the lead[2]. PubMed-indexed publications on DeepSeek are growing[3] but are limited to editorials and news articles. In this Letter, we explore the use of DeepSeek in early symptom recognition for stroke care. To the best of our knowledge, this is the first DeepSeek-related writing on stroke.
Keywords: stroke care; indexed publications; medical sciences; DeepSeek; artificial intelligence; AI healthcare; early symptom recognition; artificial intelligence; early symptoms recognition
4. A Comprehensive Review of Pill Image Recognition
Authors: Linh Nguyen Thi My, Viet-Tuan Le, Tham Vo, Vinh Truong Hoang. Computers, Materials & Continua, 2025, Issue 3, pp. 3693-3740 (48 pages)
Pill image recognition is an important field in computer vision. It has become a vital technology in healthcare and pharmaceuticals because precise medication identification is needed to prevent errors and ensure patient safety. This survey examines the current state of pill image recognition, focusing on advancements, methodologies and the challenges that remain unresolved. It provides a comprehensive overview of traditional image processing-based, machine learning-based, deep learning-based and hybrid methods, and explores the ongoing difficulties in the field. We summarize and classify the methods used in each article and compare the strengths and weaknesses of these four categories of methods. We also review benchmark datasets for pill image recognition. Additionally, we compare the performance of proposed methods on popular benchmark datasets. Drawing on recent advancements, such as Transformer models, and emerging technologies like Augmented Reality (AR), the survey discusses potential research directions and concludes the review. By offering a holistic perspective, this paper aims to serve as a valuable resource for researchers and practitioners striving to advance the field of pill image recognition.
Keywords: Pill image recognition; pill image identification; pill recognition; pill identification; pill image retrieval; pill retrieval; computer vision
5. Functional macrocyclic arenes with active binding sites inside cavity for biomimetic molecular recognition
Authors: Xixian Sun, Shengke Li, Ruibing Wang, Leyong Wang. Chinese Chemical Letters, 2025, Issue 4, pp. 1-2 (2 pages)
Molecular recognition by bioreceptors and enzymes relies on orthogonal interactions with small molecules within their cavities. To date, Chinese scientists have developed three types of strategies for introducing active sites inside the cavities of macrocyclic arenes, to better mimic the molecular recognition of bioreceptors and enzymes. This editorial aims to inform scientists in this field as they develop novel macrocycles for molecular recognition, supramolecular assembly and applications.
Keywords: supramolecular assembly; orthogonal interactions; introducing active sites; active binding sites; macrocyclic arenes; molecular recognition; orthogonal interactions; small molecules; biomimetic molecular recognition
6. A deep learning lightweight model for real-time captive macaque facial recognition based on an improved YOLOX model
Authors: Jia-Jin Zhang, Yu Gao, Bao-Lin Zhang, Dong-Dong Wu. Zoological Research, 2025, Issue 2, pp. 339-354 (16 pages)
Automated behavior monitoring of macaques offers transformative potential for advancing biomedical research and animal welfare. However, reliably identifying individual macaques in group environments remains a significant challenge. This study introduces ACE-YOLOX, a lightweight facial recognition model tailored for captive macaques. ACE-YOLOX incorporates Efficient Channel Attention (ECA), the Complete Intersection over Union (CIoU) loss and Adaptive Spatial Feature Fusion (ASFF) into the YOLOX framework. These additions improve prediction accuracy while reducing computational complexity, and together they enable effective multiscale feature extraction. On a dataset of 179,400 labeled facial images from 1,196 macaques, ACE-YOLOX surpassed classical object detection models, demonstrating superior accuracy and real-time processing capabilities. An Android application was also developed to deploy ACE-YOLOX on smartphones, enabling on-device, real-time macaque recognition. The experimental results highlight the potential of ACE-YOLOX as a non-invasive identification tool. They also provide an important foundation for future studies of macaque facial expression recognition, cognitive psychology and social behavior.
Keywords: YOLOX; Macaque; Facial recognition; Identity recognition; Animal welfare
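Of the three components ACE-YOLOX adds, the CIoU loss is the easiest to show in isolation. The minimal sketch below implements the standard CIoU formulation:

L = 1 − IoU + ρ²/c² + αv

Here ρ is the distance between box centres and c is the diagonal of the smallest enclosing box. The term v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))² measures aspect-ratio mismatch, and α = v / ((1 − IoU) + v) weights it. This is a generic, single-box sketch with a hypothetical function name, not the ACE-YOLOX implementation. Boxes are assumed to have positive width and height.

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2) with positive size."""
    # Plain IoU
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    union = wp * hp + wg * hg - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between the two box centres
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # v: aspect-ratio consistency; alpha: its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

if __name__ == "__main__":
    print(ciou_loss((0, 0, 4, 2), (0, 0, 4, 2)))  # ~0 for a perfect match
    print(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # ~1.4 for disjoint boxes
```

The centre-distance term still gives a useful signal when boxes do not overlap at all (plain IoU is 0 there). The αv term additionally penalizes a correctly centred box with the wrong shape.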
7. Dynamic behavior recognition in aerial deployment of multi-segmented foldable-wing drones using variational autoencoders
Authors: Yilin DOU, Zhou ZHOU, Rui WANG. Chinese Journal of Aeronautics, 2025, Issue 6, pp. 143-165 (23 pages)
Aerial deployment enables Unmanned Aerial Vehicles (UAVs) to be positioned directly at the altitude required for their mission. The method typically employs folding technology to improve loading efficiency. Applications include the gravity-only aerial deployment of high-aspect-ratio solar-powered UAVs and the aerial takeoff of fixed-wing drones in Mars research. However, the significant morphological changes during deployment are accompanied by strongly nonlinear dynamic aerodynamic forces, which produce unstable motion with multiple degrees of freedom. This hinders the description and analysis of unknown dynamic behaviors, which in turn complicates the design of deployment strategies and flight control. To address this issue, this paper proposes a method for analyzing dynamic behaviors during aerial deployment based on the Variational Autoencoder (VAE). The method focuses on the gravity-only deployment of high-aspect-ratio foldable-wing UAVs. It uses a data-driven approach to encode the multi-degree-of-freedom unstable motion signals into a low-dimensional feature space. By clustering in this feature space, the paper identifies and studies several dynamic behaviors during aerial deployment. The research offers a new method and perspective for feature extraction and analysis of complex, hard-to-describe extreme flight dynamics, and it can guide the design and control strategies of aerial-deployment drones.
Keywords: Dynamic behavior recognition; Aerial deployment technology; Variational autoencoder; Pattern recognition; Multi-rigid-body dynamics
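The encoding step relies on two standard VAE ingredients, sketched minimally below in plain Python:

- The reparameterization trick, z = μ + σ·ε with ε ~ N(0, 1), which keeps sampling differentiable.
- The closed-form KL divergence between a diagonal Gaussian and N(0, I).

The squared-error reconstruction term and the `beta` weight are common choices assumed here for illustration; the paper's exact loss and network are not given in the abstract. In the paper's setting, the latent means μ of each motion sequence would then be clustered to separate dynamic behaviors.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, 1), with sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a single latent vector."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Negative ELBO up to constants, with an assumed squared-error reconstruction term."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_to_standard_normal(mu, logvar)

if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.5, -1.0], [0.0, -2.0]
    print(reparameterize(mu, logvar, rng))
    print(kl_to_standard_normal(mu, logvar))
```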
8. A Compact Manifold Mixup Feature-Based Open-Set Recognition Approach for Unknown Signals
Authors: Yang Ying, Zhu Lidong, Li Chengjie, Sun Hong. China Communications, 2025, Issue 4, pp. 322-338 (17 pages)
Real electromagnetic environments contain all kinds of known and unknown signals, which hinders the development of practical cognitive radio applications. However, most existing signal recognition models struggle to discover unknown signals while recognizing known ones. This paper proposes a compact manifold mixup feature-based open-set recognition approach (OR-CMMF) to address this problem. First, the approach uses the center loss to constrain decision boundaries. This yields compact latent signal feature representations and extends the low-confidence feature space. Second, the latent signal feature representations are used to construct synthetic representations as substitutes for unknown signal categories, and these constructed representations occupy the extended low-confidence space. Finally, the approach applies a distillation loss to adjust the decision boundaries between the known signal categories and the constructed unknown-category substitutes, so that it accurately discovers unknown signals. Simulation experiments were run on two public datasets, RML2016.10a and ORACLE. In these experiments, OR-CMMF outperformed other state-of-the-art open-set recognition methods in both comprehensive recognition performance and running time.
Keywords: manifold mixup; open-set recognition; synthetic representation; unknown signal recognition
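The first two OR-CMMF steps (compact features via center loss, then synthetic "unknown" substitutes built from those features) can be illustrated with a small sketch. It uses the standard center-loss form (1/2)·‖f − c_y‖², averaged over the batch. It builds substitutes by manifold-mixup-style interpolation between features from two *different* known classes, which therefore tends to land between class clusters. The fixed mixing coefficient `lam`, the pairing rule and the `UNKNOWN` label are assumptions for illustration (mixup usually samples λ from a Beta distribution). The distillation step is not shown.

```python
import random

UNKNOWN = -1  # hypothetical label assigned to synthetic unknown-class substitutes

def center_loss(features, labels, centers):
    """Mean over the batch of 0.5 * ||f_i - c_{y_i}||^2."""
    total = 0.0
    for f, y in zip(features, labels):
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(f, centers[y]))
    return total / len(features)

def mix_unknown(features, labels, lam, rng):
    """Interpolate each feature with one from a different class; label the result UNKNOWN."""
    synthetic = []
    for f, y in zip(features, labels):
        others = [j for j, yj in enumerate(labels) if yj != y]
        if not others:
            continue  # no partner from another class in this batch
        g = features[rng.choice(others)]
        synthetic.append([lam * a + (1 - lam) * b for a, b in zip(f, g)])
    return synthetic, [UNKNOWN] * len(synthetic)

if __name__ == "__main__":
    feats = [[0.0, 0.0], [2.0, 2.0]]
    labs = [0, 1]
    print(center_loss(feats, labs, {0: [1.0, 0.0], 1: [2.0, 2.0]}))
    print(mix_unknown(feats, labs, 0.5, random.Random(0)))
```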
9. Correction to DeepCNN: Spectro-temporal feature representation for speech emotion recognition
CAAI Transactions on Intelligence Technology, 2025, Issue 2, p. 633 (1 page)
Saleem, N., et al.: DeepCNN: Spectro-temporal feature representation for speech emotion recognition. CAAI Trans. Intell. Technol. 8(2), 401-417 (2023). https://doi.org/10.1049/cit2.12233. The affiliation of Hafiz Tayyab Rauf should be [Independent Researcher, UK].
Keywords: independent researcher; speech emotion recognition; deep CNN; UK; speech emotion recognition; CAAI; spectro-temporal feature representation; Hafiz Tayyab Rauf
10. A Comprehensive Review of Face Detection/Recognition Algorithms and Competitive Datasets to Optimize Machine Vision
Authors: Mahmood Ul Haq, Muhammad Athar Javed Sethi, Sadique Ahmad, Naveed Ahmad, Muhammad Shahid Anwar, Alpamis Kutlimuratov. Computers, Materials & Continua, 2025, Issue 7, pp. 1-24 (24 pages)
Face recognition has emerged as one of the most prominent applications of image analysis and understanding, gaining considerable attention in recent years. This growing interest is driven by two key factors: its extensive applications in law enforcement and the commercial domain, and the rapid advancement of practical technologies. Despite significant advancements, modern recognition algorithms still struggle in real-world conditions such as varying lighting, occlusion and diverse facial poses. In such scenarios, human perception still far exceeds the capabilities of present technology. Using a systematic mapping study, this paper presents an in-depth review of face detection and face recognition algorithms, surveying advancements made between 2015 and 2024. We analyze key methodologies, highlighting their strengths and limitations in their application contexts. Additionally, we examine the datasets used for face detection and recognition, focusing on their task-specific applications, size, diversity and complexity. By analyzing these algorithms and datasets, the survey serves as a resource for researchers. It identifies research gaps in face detection and recognition and outlines potential directions for future research.
Keywords: Face recognition algorithms; face detection techniques; face recognition/detection datasets
11. Evolutionary neural architecture search for traffic sign recognition
Authors: SONG Changwei, MA Yongjie, PING Haoyu, SUN Lisheng. Optoelectronics Letters, 2025, Issue 7, pp. 434-440 (7 pages)
Convolutional neural networks (CNNs) perform strongly at image feature extraction and are therefore widely used in traffic sign recognition. However, existing traffic sign recognition algorithms often rely on expert knowledge to improve their image feature extraction networks, and they require image preprocessing and model parameter tuning. This increases the complexity of model design. This study introduces an evolutionary neural architecture search (ENAS) algorithm for automatically designing neural network models tailored to traffic sign recognition. The construction parameters of the residual network (ResNet) are integrated into evolutionary algorithms (EAs), and lightweight networks for traffic sign recognition are generated automatically, with blocks as the fundamental building units. Experimental evaluations were run on the German traffic sign recognition benchmark (GTSRB) dataset. There, the algorithm attains a recognition accuracy of 99.32% with only 2.8×10^6 parameters. Comparisons with other traffic sign recognition algorithms show that the method discovers neural network architectures more efficiently, significantly reducing the number of network parameters while maintaining recognition accuracy.
Keywords: traffic sign recognition; expert knowledge; image feature extraction; model parameter tuning; evolutionary neural architecture search (ENAS) algorithm; traffic sign recognition; model design; image preprocessing
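The abstract describes encoding ResNet construction parameters into an evolutionary algorithm, with blocks as the building units. The toy below sketches that general loop only.

Everything concrete in it is an assumption and not the paper's design:

- The genome layout: residual blocks per stage, plus a base width that doubles at each stage.
- The rough parameter count: two 3×3 convolutions per basic block, ignoring BatchNorm, biases and downsampling.
- The search loop itself: tournament selection, single-gene mutation and elitist truncation.
- `toy_fitness`, a stand-in for actually training and validating each candidate on GTSRB.

```python
import random

STAGES = 3
MAX_BLOCKS = 4
WIDTHS = [16, 32, 64]  # hypothetical choices for the base channel width

def param_count(genome):
    """Rough count: each basic block holds two 3x3 convs at its stage's width."""
    blocks, base = genome
    total = 0
    for s, n in enumerate(blocks):
        c = base * (2 ** s)  # width doubles at each stage
        total += n * 2 * 9 * c * c
    return total

def random_genome(rng):
    return ([rng.randint(1, MAX_BLOCKS) for _ in range(STAGES)], rng.choice(WIDTHS))

def mutate(genome, rng):
    """Change one stage's block count by +/-1, or resample the base width."""
    blocks, base = list(genome[0]), genome[1]
    if rng.random() < 0.5:
        s = rng.randrange(STAGES)
        blocks[s] = min(MAX_BLOCKS, max(1, blocks[s] + rng.choice([-1, 1])))
    else:
        base = rng.choice(WIDTHS)
    return (blocks, base)

def evolve(fitness, pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)          # binary tournament
            parent = a if fitness(a) >= fitness(b) else b
            children.append(mutate(parent, rng))
        # Elitism: keep the best pop_size of parents and children together
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return population[0]

def toy_fitness(genome):
    """Hypothetical stand-in for validation accuracy, minus a parameter penalty."""
    blocks, base = genome
    capacity = sum(blocks) * base
    return capacity / (capacity + 64) - 1e-7 * param_count(genome)

if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(best, param_count(best), round(toy_fitness(best), 4))
```

Because parents compete with their children for survival, the best fitness never decreases across generations. The parameter penalty is what pushes the search toward lightweight networks.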
12. IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare (cited 1 time)
Authors: Subrata Kumer Paul, Abu Saleh Musa Miah, Rakhi Rani Paul, Md. Ekramul Hamid, Jungpil Shin, Md Abdur Rahim. Computers, Materials & Continua, 2025, Issue 8, pp. 2513-2530 (18 pages)
The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing Medical-Related Human Activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions critical to patient well-being. However, challenges such as high computational demands, low accuracy and limited adaptability persist in Human Motion Recognition (HMR). Some studies have integrated HMR with IoT for real-time healthcare applications, but little research has focused on recognizing MRHA, which is essential for effective patient monitoring. This study proposes a novel HMR method tailored for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet, with seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, to extract optimized spatial features from skeleton frame sequences. Convolutional Long Short-Term Memory (ConvLSTM) then captures spatio-temporal patterns. A classification module with global average pooling, a fully connected layer and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking and sitting. On NTU RGB+D 120, it achieves 94.85% accuracy for cross-subject evaluation and 96.45% for cross-view evaluation. It also achieves 89.22% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and a GSM module, delivering real-time alerts to caregivers and patients via Twilio's SMS service. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes and reducing costs.
Keywords: Real-time human motion recognition (HMR); ENConvLSTM; EfficientNet; ConvLSTM; skeleton data; NTU RGB+D 120 dataset; MRHA
13. Comprehensive Review and Analysis on Facial Emotion Recognition: Performance Insights into Deep and Traditional Learning with Current Updates and Challenges
Authors: Amjad Rehman, Muhammad Mujahid, Alex Elyassih, Bayan AlGhofaily, Saeed Ali Omer Bahaj. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 41-72 (32 pages)
In computer vision and artificial intelligence, automatic facial expression-based emotion identification has become a popular research and industry problem. Recent demonstrations and applications in several fields have drawn attention to its significant academic and commercial importance. These fields include computer games, smart homes, expression analysis, gesture recognition, surveillance video, depression therapy, patient monitoring and anxiety. This study emphasizes research that has used only facial images for facial expression recognition (FER), because facial expressions are a basic way in which people communicate meaning to each other. The immense success of deep learning has led to growing use of its many architectures to improve efficiency. The review covers how machine learning, deep learning and hybrid methods use preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive frames. It then briefly summarizes publicly available evaluation criteria and compares them with benchmark results, which are the most reliable way to assess FER research statistically. This brief synopsis of the subject may benefit both newcomers to FER and seasoned scholars seeking fruitful avenues for further investigation. It conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Keywords: Face emotion recognition; deep learning; hybrid learning; CK+; facial images; machine learning; technological development
14. Multi-Stage-Based Siamese Neural Network for Seal Image Recognition
Authors: Jianfeng Lu, Xiangye Huang, Caijin Li, Renlin Xin, Shanqing Zhang, Mahmoud Emam. Computer Modeling in Engineering & Sciences (SCIE, EI), 2025, Issue 1, pp. 405-423 (19 pages)
Seal authentication is an important task: it verifies the authenticity of stamped seals, which are used in various domains to protect legal documents from tampering and counterfeiting. Stamped seals are commonly inspected manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive, and it is complicated by human error, inconsistent seal placement and incomplete seal impressions. Traditional image recognition systems cannot identify seal types accurately enough, so a neural network-based method for seal image recognition is needed. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, because the training data categories are fixed, handling new categories is challenging. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. First, the seal image is pre-processed by an image rotation correction module based on the Histogram of Oriented Gradients (HOG). Second, a Siamese network-based similarity comparison module measures the similarity between pairs of input seal images. Finally, the results are compared with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the proposed method, we also create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The work has practical significance in industries where automatic seal authentication is essential, such as the legal, financial and governmental sectors. There, automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method outperforms state-of-the-art methods on the two established datasets.
Keywords: Seal recognition; seal authentication; document tampering; Siamese network; spatial transformer network; similarity comparison network
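The final stage compares an input seal against pre-stored standard templates. The minimal sketch below shows why this sidesteps the fixed-category problem. It assumes that a shared-weight (Siamese) encoder has already turned each image into an embedding vector, and that similarity is cosine similarity with an acceptance threshold. Both are assumptions for illustration: the paper's comparison module may use a learned similarity head instead. The function names, embeddings and threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_seal(query_emb, templates, threshold=0.8):
    """Return (seal_type, score) for the closest stored template.

    seal_type is None when even the best score falls below the threshold.
    """
    best_type, best_score = None, -1.0
    for seal_type, emb in templates.items():
        score = cosine(query_emb, emb)
        if score > best_score:
            best_type, best_score = seal_type, score
    return (best_type if best_score >= threshold else None), best_score

if __name__ == "__main__":
    # Hypothetical template embeddings produced by the shared encoder
    templates = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
    print(match_seal([0.9, 0.1, 0.0], templates))  # close to template A
    print(match_seal([1.0, 1.0, 0.0], templates))  # ambiguous, rejected
```

Supporting a new seal type then means storing one more template embedding, with no retraining of a fixed-size classifier.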
15. Occluded Gait Emotion Recognition Based on Multi-Scale Suppression Graph Convolutional Network
Authors: Yuxiang Zou, Ning He, Jiwu Sun, Xunrui Huang, Wenhua Wang. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 1255-1276 (22 pages)
In recent years, gait-based emotion recognition has been widely applied in computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data are occluded. To improve the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: a Joint Interpolation Module (JI Module), a Multi-scale Temporal Convolution Network (MS-TCN) and a Suppression Graph Convolutional Network (SGCN). The JI Module completes spatially occluded skeletal joints using K-Nearest Neighbors (KNN) interpolation. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for temporal occlusion of gait information. The SGCN suppresses feature extraction from key body parts so that it extracts more of the non-prominent gait features, which reduces the negative impact of occlusion on emotion recognition. The proposed method is evaluated on two comprehensive datasets. Emotion-Gait contains 4,227 real gaits from sources such as BML, ICT-Pollick and ELMD, plus 1,000 synthetic gaits generated using STEP-Gen. ELMB consists of 3,924 gaits, 1,835 of which are labeled with emotions such as "Happy," "Sad," "Angry," and "Neutral." On the standard Emotion-Gait and ELMB datasets, the proposed method achieved accuracies of 0.900 and 0.896, respectively, comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
Keywords: KNN interpolation; multi-scale temporal convolution; suppression graph convolutional network; gait emotion recognition; human skeleton
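The abstract does not say how the JI Module defines "neighbors" for KNN interpolation. The sketch below uses one plausible reading, stated here as an assumption. For a joint missing in frame t, it looks at the other frames of the same sequence where that joint is visible. It ranks them by RMS distance over the joints visible in both frames and fills the gap with the mean position of that joint in the K nearest frames. Skeletons are simplified to 2D (x, y) joints, and `None` marks an occluded joint.

```python
import math

def knn_fill(frames, k=2):
    """Fill occluded joints (None) from the k most similar frames of the sequence.

    frames: list of frames; each frame is a list of joints, each (x, y) or None.
    Returns a new list of frames; the input is not modified.
    """
    filled = [list(f) for f in frames]
    for t, frame in enumerate(frames):
        for j, joint in enumerate(frame):
            if joint is not None:
                continue
            candidates = []
            for u, other in enumerate(frames):
                if u == t or other[j] is None:
                    continue
                # Compare frames only on joints visible in both
                shared = [(a, b) for a, b in zip(frame, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                                  for a, b in shared) / len(shared))
                candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[t][j] = (sum(p[0] for _, p in nearest) / len(nearest),
                                sum(p[1] for _, p in nearest) / len(nearest))
    return filled

if __name__ == "__main__":
    seq = [[(0, 0), (1, 0)], [(0, 0), None], [(0.1, 0), (1.2, 0)], [(5, 5), (9, 9)]]
    print(knn_fill(seq, k=2)[1])  # joint 1 of frame 1 estimated from frames 0 and 2
```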
16. IDSSCNN-XgBoost: Improved Dual-Stream Shallow Convolutional Neural Network Based on Extreme Gradient Boosting Algorithm for Micro Expression Recognition
Authors: Adnan Ahmad, Zhao Li, Irfan Tariq, Zhengran He. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 729-749 (21 pages)
Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, shallow convolutional neural networks often mitigate overfitting better than deeper models, particularly on small datasets. Still, many of these methods rely on a single feature for recognition, which limits their ability to extract highly effective features. To address this limitation, this paper introduces an Improved Dual-stream Shallow Convolutional Neural Network based on the Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) for ME recognition. The proposed method uses a dual-stream architecture. One stream extracts motion vectors (temporal features) with TV-L1 optical flow, and the other amplifies subtle changes (spatial features) via Eulerian Video Magnification (EVM). These features are processed by the IDSSCNN, with an attention mechanism applied to refine the extracted features. The outputs are then fused, concatenated and classified with the XgBoost algorithm. This approach significantly improves recognition accuracy by leveraging both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASME II), the Spontaneous Micro-Expression Database (SMIC-HS) and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model achieves outstanding results compared with recent models. Accuracies are 79.01%, 69.22% and 68.99% on CASME II, SMIC-HS and SAMM, respectively, and F1-scores are 75.47%, 68.91% and 63.84%. The proposed method also offers operational efficiency and lower computational time.
Keywords: ME recognition; dual-stream shallow convolutional neural network; Euler video magnification; TV-L1; XgBoost
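The spatial stream depends on Eulerian Video Magnification (EVM). EVM band-pass filters each pixel's intensity over time and adds the filtered signal back, scaled by an amplification factor α. The minimal sketch below uses one common choice of temporal filter: the difference of two first-order IIR low-pass filters, with rates r1 > r2. It operates on a single pixel's time series. The spatial pyramid decomposition used in full EVM is omitted, and the parameter values are illustrative assumptions rather than the paper's settings.

```python
def evm_iir(signal, alpha=10.0, r1=0.4, r2=0.05):
    """Magnify temporal changes in one pixel's intensity series.

    band = lowpass_fast - lowpass_slow acts as a temporal band-pass;
    output = input + alpha * band.
    """
    low1 = low2 = signal[0]  # start both filters at the first frame
    out = []
    for x in signal:
        low1 = (1 - r1) * low1 + r1 * x  # faster low-pass (higher cutoff)
        low2 = (1 - r2) * low2 + r2 * x  # slower low-pass (lower cutoff)
        band = low1 - low2
        out.append(x + alpha * band)
    return out

if __name__ == "__main__":
    print(evm_iir([0.5] * 5))                       # static pixel: unchanged
    print(evm_iir([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]))  # subtle step: exaggerated
```

A static pixel produces no band-pass response and passes through unchanged. A small, recent change is exaggerated, which is what makes subtle facial micro-movements easier for the spatial stream to pick up.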
17. ARNet: Integrating Spatial and Temporal Deep Learning for Robust Action Recognition in Videos
Authors: Hussain Dawood, Marriam Nawaz, Tahira Nazir, Ali Javed, Abdul Khader Jilani Saudagar, Hatoon S. AlSagri. Computer Modeling in Engineering & Sciences, 2025, Issue 7, pp. 429-459 (31 pages)
Reliable human action recognition (HAR) in video sequences is critical for a wide range of applications, such as security surveillance, healthcare monitoring, and human-computer interaction. Several automated systems have been designed for this purpose; however, existing methods such as two-stream networks and 3D convolutional neural networks (CNNs) often struggle to integrate spatial and temporal information from the input samples effectively, which limits their accuracy in discriminating among numerous human actions. Therefore, this study introduces a novel deep-learning framework, called ARNet, designed for robust HAR. ARNet consists of two main modules: a refined InceptionResNet-V2-based CNN and a Bi-LSTM (Bidirectional Long Short-Term Memory) network. The refined InceptionResNet-V2 employs a parametric rectified linear unit (PReLU) activation within its convolutional layers to enhance spatial feature extraction from individual video frames. PReLU improves the approach's ability to capture spatial information because it uses learnable parameters to adaptively control the slope of the negative part of the activation function, allowing richer gradient flow during backpropagation and resulting in robust information capture and stable model training. These spatial features, which hold essential pixel characteristics, are then processed by the Bi-LSTM module for temporal analysis, which helps ARNet understand how actions evolve over time. ARNet adds three dense layers after the Bi-LSTM module to combine spatial and temporal patterns and further strengthen the feature representation. The model is validated experimentally on three benchmark datasets, HMDB51, KTH, and UCF Sports, with reported accuracies of 93.82%, 99%, and 99.16%, respectively. The precision values on HMDB51, KTH, and UCF Sports are 97.41%, 99.54%, and 99.01%; the recall values are 98.87%, 98.60%, and 99.08%; and the F1-scores are 98.13%, 99.07%, and 99.04%, respectively. These results highlight the robustness of ARNet and its potential as a versatile tool for accurate HAR across various real-world applications.
Keywords: Action recognition; Bi-LSTM; computer vision; deep learning; InceptionResNet-V2; PReLU
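The abstract above credits PReLU's learnable negative slope with richer gradient flow. As an illustrative sketch in plain Python with scalar inputs (the function names `prelu` and `prelu_grads` are hypothetical), the forward pass and the gradients with respect to both the input `x` and the slope `a` are shown below. Frameworks provide this as a layer, for example PyTorch's `torch.nn.PReLU`, where `a` is learned and can be shared or per channel.

```python
def prelu(x: float, a: float) -> float:
    """PReLU: identity for positive inputs, learnable slope a for the rest."""
    return x if x > 0 else a * x


def prelu_grads(x: float, a: float, upstream: float = 1.0) -> tuple:
    """Return (dL/dx, dL/da) given the upstream gradient dL/dy."""
    if x > 0:
        return upstream, 0.0       # positive side: y = x, so a gets no gradient
    return upstream * a, upstream * x  # negative side: y = a * x
```

Because dL/da equals the upstream gradient times `x` for `x <= 0`, the slope is updated whenever negative activations occur, and the input gradient on the negative side is scaled by `a` instead of being zeroed as in ReLU.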
A Novel Attention-Based Parallel Blocks Deep Architecture for Human Action Recognition
18
Authors: Yasir Khan Jadoon, Yasir Noman Khalid, Muhammad Attique Khan, Jungpil Shin, Fatimah Alhayan, Hee-Chan Cho, Byoungchol Chang. Computer Modeling in Engineering & Sciences, 2025, No. 7, pp. 1143-1164 (22 pages)
Real-time surveillance depends on recognizing the variety of actions performed by humans. Human Action Recognition (HAR) is a technique that recognizes human actions from a video stream. The wide variation in human actions makes them difficult to recognize with high accuracy. This paper presents a novel deep neural network architecture, called Attention RB-Net, for HAR from video frames. The input is provided to the model as video frames. The proposed deep architecture is based on a distinctive arrangement of residual blocks with several filter sizes. Features are extracted from each frame through several operations with specific parameters defined in the proposed Attention-based Residual Bottleneck (Attention-RB) DCNN architecture. A fully connected layer receives the attention-based feature matrix, and final classification is performed. Several hyperparameters of the proposed model are initialized using Bayesian Optimization (BO), and the trained model is then used for testing. During testing, features are extracted from the self-attention layer and passed to neural network classifiers for the final action classification. Two highly cited datasets, HMDB51 and UCF101, were used to validate the proposed architecture, which obtained average accuracies of 87.70% and 97.30%, respectively. The deep convolutional neural network (DCNN) architecture is compared with state-of-the-art (SOTA) methods, including pre-trained models, inside blocks, and recently published techniques, and performs better than them.
Keywords: Human action recognition; self-attention; video streams; residual bottleneck; classification; neural networks
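The abstract above does not specify how the self-attention layer in Attention-RB is built. As a generic sketch only, scaled dot-product self-attention over a sequence of per-frame feature vectors can be written as follows. It assumes Q = K = V = the input features (real layers usually apply learned projections first), and the function names are hypothetical.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no projections).

    x is a list of n feature vectors (e.g. one per video frame), each of
    length d. Returns (outputs, weights), where each output row is an
    attention-weighted average of the input rows.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in x:
        # similarity of this query with every key, scaled by 1/sqrt(d)
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
        for w_row in weights
    ]
    return outputs, weights
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the input frame features; frames that resemble each other attend to each other more strongly.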
Dual-Task Contrastive Meta-Learning for Few-Shot Cross-Domain Emotion Recognition
19
Authors: Yujiao Tang, Yadong Wu, Yuanmei He, Jilin Liu, Weihan Zhang. Computers, Materials & Continua, 2025, No. 2, pp. 2331-2352 (22 pages)
Emotion recognition plays a crucial role in various fields and is a key task in natural language processing (NLP). The objective is to identify and interpret emotional expressions in text. However, traditional emotion recognition approaches often struggle in few-shot cross-domain scenarios because of their limited capacity to generalize semantic features across domains. Additionally, these methods have difficulty accurately capturing complex emotional states, particularly subtle or implicit ones. To overcome these limitations, we introduce a novel approach called Dual-Task Contrastive Meta-Learning (DTCML), which combines meta-learning and contrastive learning to improve emotion recognition. Meta-learning enhances the model's ability to generalize to new emotional tasks, while instance contrastive learning further refines the model by distinguishing unique features within each category, enabling it to better differentiate complex emotional expressions. Prototype contrastive learning, in turn, helps the model handle the semantic complexity of emotions across domains, enabling it to learn fine-grained emotional expressions. By leveraging dual tasks, DTCML learns from two domains simultaneously, which encourages the model to learn more diverse and generalizable emotion features and thereby improves its cross-domain adaptability, robustness, and generalization ability. We evaluated DTCML in four cross-domain settings, and the results show that our method outperforms the best baseline by 5.88%, 12.04%, 8.49%, and 8.40% in accuracy, respectively.
Keywords: Contrastive learning; emotion recognition; cross-domain learning; dual-task; meta-learning
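The abstract above does not give DTCML's loss functions. As an assumption-laden sketch of what a prototype contrastive objective can look like, the code below averages support embeddings into class prototypes and applies an InfoNCE-style cross-entropy over cosine similarities divided by a temperature `tau`. The function names, the temperature value, and the example labels are hypothetical.

```python
import math


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def prototype_contrastive_loss(support, query, label, tau=0.1):
    """InfoNCE-style loss pulling a query embedding toward its class prototype.

    support: dict mapping class label -> list of support embeddings.
    query: one embedding; label: its true class; tau: temperature.
    """
    prototypes = {c: mean(vs) for c, vs in support.items()}
    logits = {c: cosine(query, p) / tau for c, p in prototypes.items()}
    m = max(logits.values())  # log-sum-exp with max subtraction for stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[label]  # negative log-softmax of the true class
```

For example, with `support = {"joy": [[1.0, 0.1], [0.9, 0.0]], "anger": [[0.0, 1.0], [0.1, 0.9]]}`, a query of `[1.0, 0.05]` yields a much smaller loss for `"joy"` than for `"anger"`. The loss falls as the query moves toward its own prototype and away from the others.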
EFI-SATL: An Efficient Net and Self-Attention Based Biometric Recognition for Finger-Vein Using Deep Transfer Learning
20
Authors: Manjit Singh, Sunil Kumar Singla. Computer Modeling in Engineering & Sciences, 2025, No. 3, pp. 3003-3029 (27 pages)
Deep learning-based systems for finger-vein recognition have attracted increasing attention in recent years because of their improved efficiency and enhanced security. The performance of existing CNN-based methods is limited by the poor generalization of the learned features and the scarcity of finger-vein training images. To address these shortcomings, this work develops a simplified deep transfer learning framework for finger-vein recognition that uses an EfficientNet model with a self-attention mechanism. Data augmentation using various geometric transformations is employed to address the shortage of training data required by a deep learning model. The proposed model is tested using K-fold cross-validation on three publicly available datasets: HKPU, FVUSM, and SDUMLA. The developed network is also compared with other modern deep networks to check its effectiveness, and the proposed method is compared with other existing finger-vein recognition (FVR) methods. The experimental results show that the proposed method achieves higher recognition accuracy than the existing methods. In addition, the developed method proves to be more effective and less complex at extracting robust features. The proposed EffAttenNet achieves an accuracy of 98.14% on HKPU, 99.03% on FVUSM, and 99.50% on SDUMLA.
Keywords: Biometrics; finger-vein recognition (FVR); deep net; self-attention; Efficient Nets; transfer learning
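The abstract above evaluates the model with K-fold cross-validation but does not state K or how the folds are formed. A minimal sketch of index-level K-fold splitting follows, assuming a single shuffle with a fixed seed. The function name and the default `k=5` are illustrative choices, not the authors'.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, then split into k nearly equal folds; each fold
    serves as the test set exactly once while the rest form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # the first (n_samples % k) folds receive one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds
```

The reported accuracy would then be averaged over the k test folds. scikit-learn's `sklearn.model_selection.KFold` (with `n_splits`, `shuffle`, and `random_state`) provides the same splitting in library form.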