Journal Articles
85,550 articles found
1. A Fine-Grained Recognition Model Based on Discriminative Region Localization and Efficient Second-Order Feature Encoding
Authors: Xiaorui Zhang, Yingying Wang, Wei Sun, Shiyu Zhou, Haoming Zhang, Pengpai Wang. Computers, Materials & Continua, 2026, No. 4, pp. 946-965 (20 pages)
Discriminative region localization and efficient feature encoding are crucial for fine-grained object recognition. However, existing data augmentation methods struggle to accurately locate discriminative regions in complex backgrounds, small target objects, and limited training data, leading to poor recognition. Fine-grained images exhibit “small inter-class differences,” and while second-order feature encoding enhances discrimination, it often requires dual Convolutional Neural Networks (CNNs), increasing training time and complexity. This study proposes a model integrating discriminative region localization and efficient second-order feature encoding. By ranking feature map channels via a fully connected layer, it selects high-importance channels to generate an enhanced map, accurately locating discriminative regions. Cropping and erasing augmentations further refine recognition. To improve efficiency, a novel second-order feature encoding module generates an attention map from the fourth convolutional group of Residual Network 50 layers (ResNet-50) and multiplies it with features from the fifth group, producing second-order features while reducing dimensionality and training time. Experiments on the Caltech-University of California, San Diego Birds-200-2011 (CUB-200-2011), Stanford Cars, and Fine-Grained Visual Classification of Aircraft (FGVC Aircraft) datasets show state-of-the-art accuracy of 88.9%, 94.7%, and 93.3%, respectively.
Keywords: fine-grained recognition; feature encoding; data augmentation; second-order feature; discriminative regions
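The efficiency idea in this abstract (derive an attention map from ResNet-50's fourth convolutional group and multiply it into the fifth group's features, instead of running a second CNN) can be illustrated with a minimal NumPy sketch. The shapes, the channel-mean attention, and the softmax normalization are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def second_order_encode(conv4_feats, conv5_feats):
    """Illustrative sketch: build a spatial attention map from stage-4
    features and weight stage-5 features with it, yielding a compact
    attention-weighted descriptor without a second backbone."""
    # conv4_feats: (C4, H, W); conv5_feats: (C5, H, W); matching H, W assumed
    attn = conv4_feats.mean(axis=0)              # channel mean -> (H, W)
    attn = np.exp(attn - attn.max())
    attn /= attn.sum()                           # softmax over spatial positions
    weighted = conv5_feats * attn[None, :, :]    # broadcast over channels
    return weighted.sum(axis=(1, 2))             # pooled descriptor, shape (C5,)

rng = np.random.default_rng(0)
f4 = rng.standard_normal((64, 7, 7))
f5 = rng.standard_normal((128, 7, 7))
desc = second_order_encode(f4, f5)
print(desc.shape)  # (128,)
```

Only one backbone forward pass is needed; the attention and the weighted pooling are cheap elementwise operations, which is the source of the claimed training-time reduction.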
2. Research on Fine-Grained Recognition Method for Sensitive Information in Social Networks Based on CLIP
Authors: Menghan Zhang, Fangfang Shan, Mengyao Liu, Zhenyu Wang. Computers, Materials & Continua (SCIE, EI), 2024, No. 10, pp. 1565-1580 (16 pages)
With the emergence and development of social networks, people can stay in touch with friends, family, and colleagues more quickly and conveniently, regardless of their location. This ubiquitous digital internet environment has also led to large-scale disclosure of personal privacy. Due to the complexity and subtlety of sensitive information, traditional sensitive information identification technologies cannot thoroughly address the characteristics of each piece of data, thus weakening the deep connections between text and images. In this context, this paper adopts the CLIP model as a modality discriminator. By using contrastive learning between sensitive image descriptions and images, the similarity between the images and the sensitive descriptions is obtained to determine whether the images contain sensitive information. This provides the basis for identifying sensitive information using different modalities. Specifically, if the original data does not contain sensitive information, only single-modality text-sensitive information identification is performed; if the original data contains sensitive information, multimodal sensitive information identification is conducted. This approach allows for differentiated processing of each piece of data, thereby achieving more accurate sensitive information identification. The aforementioned modality discriminator can address the limitations of existing sensitive information identification technologies, making the identification of sensitive information from the original data more appropriate and precise.
Keywords: deep learning; social networks; sensitive information recognition; multi-modal fusion
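The modality-discriminator logic described above (compare an image embedding against embeddings of sensitive-content descriptions, then branch to single- or multi-modal identification) reduces to a cosine-similarity threshold test. The sketch below assumes precomputed CLIP-like embedding vectors and a hypothetical threshold; it is not the paper's implementation.

```python
import numpy as np

def route_by_sensitivity(image_emb, sensitive_desc_embs, threshold=0.3):
    """Decide which identification branch to run: if the image is close
    (in cosine similarity) to any sensitive-content description, use the
    multimodal pipeline; otherwise text-only. `threshold` is hypothetical."""
    img = image_emb / np.linalg.norm(image_emb)
    descs = sensitive_desc_embs / np.linalg.norm(
        sensitive_desc_embs, axis=1, keepdims=True)
    sims = descs @ img                       # cosine similarity per description
    return "multimodal" if sims.max() >= threshold else "text_only"

img = np.array([1.0, 0.0, 0.0])              # stand-in for a CLIP image embedding
descs = np.array([[0.9, 0.1, 0.0],           # description close to the image
                  [0.0, 1.0, 0.0]])
print(route_by_sensitivity(img, descs))      # multimodal
```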
3. YOLO-Drive: Robust Driver Distraction Recognition under Fine-Grained and Overlapping Behaviors
Authors: Zhichao Yu, Jiahui Yu, Simon James Fong, Yaoyang Wu. Computers, Materials & Continua, 2026, No. 5, pp. 621-638 (18 pages)
Accurately recognizing driver distraction is critical for preventing traffic accidents, yet current detection models face two persistent challenges. First, distractions are often fine-grained, involving subtle cues such as brief eye closures or partial yawns, which are easily missed by conventional detectors. Second, in real-world scenarios, drivers frequently exhibit overlapping behaviors, such as simultaneously holding a cup, closing their eyes, and yawning, leading to multiple detection boxes and degraded model performance. Existing approaches fail to robustly address these complexities, resulting in limited reliability in safety-critical applications. To overcome these pain points, we propose YOLO-Drive, a novel framework that enhances YOLO-based driver monitoring with EfficientViM and Polarized Spectral–Spatial Attention (PSSA) modules. EfficientViM provides lightweight yet powerful global–local feature extraction, enabling accurate recognition of subtle driver states. PSSA further amplifies discriminative features across spatial and spectral domains, ensuring robust separation of concurrent distraction cues. By explicitly modeling fine-grained and overlapping behaviors, our approach delivers significant improvements in both precision and robustness. Extensive experiments on benchmark driver distraction datasets demonstrate that YOLO-Drive consistently outperforms state-of-the-art models, achieving higher detection accuracy while maintaining real-time efficiency. These results validate YOLO-Drive as a practical and reliable solution for advanced driver monitoring systems, addressing long-standing challenges of subtle cue recognition and multi-cue distraction detection.
Keywords: driver distraction recognition; attention mechanism; fine-grained feature modeling; object detection; overlapping behavior detection; state space model; YOLO extensions
4. A teacher-student based attention network for fine-grained image recognition
Authors: Ang Li, Xueyi Zhang, Peilin Li, Bin Kang. Digital Communications and Networks, 2025, No. 1, pp. 52-59 (8 pages)
The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. In order to highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that subtle visual differences in challenging scenarios can be highlighted by exploiting the spatial relationship among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR, and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly by a convolutional neural network, referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline, and propose a type of boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph convolution based Global Attention Network (GA-Net), which can combine extracted local attention maps from MA-Net with non-local techniques to explore the spatial relationship among sub-regions. Finally, we develop a collaborative teacher-student strategy to adaptively determine the attended roles and optimization modes, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars and FGVC Aircraft datasets illustrate the promising performance of our framework.
Keywords: fine-grained image recognition; collaborative teacher-student strategy; multi-attention; global attention
5. Fine-Grained Ship Recognition Based on Visible and Near-Infrared Multimodal Remote Sensing Images: Dataset, Methodology and Evaluation (cited 1 time)
Authors: Shiwen Song, Rui Zhang, Min Hu, Feiyao Huang. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 5243-5271 (29 pages)
Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a significant improvement in the model's performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named the Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), fusing features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module has been designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves fusion enhancement of the two modal features by making channel attention and spatial attention work cross-functionally on the feature maps. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP0.5 and mAP0.5:0.95 of 87%, 77.1%, 83.8% and 63.9%, respectively. Extensive experiments demonstrate that MICE-Net delivers superior performance on DMFGRS. Built on the lightweight YOLO network, the model generalizes well and thus has good potential for application in real-life scenarios.
Keywords: multi-modality dataset; ship recognition; fine-grained recognition; attention mechanism
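One minimal reading of the FCEM idea above (channel attention computed on one modality reweights the other, and spatial attention likewise crosses over) can be sketched as follows. The pooling and sigmoid choices, and the additive fusion, are assumptions for illustration, not the published module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_enhance(vis, nir):
    """Sketch of cross-modal enhancement: attention derived from each
    modality reweights the other. vis, nir: feature maps of shape (C, H, W)."""
    ch_attn_vis = sigmoid(vis.mean(axis=(1, 2)))   # channel attention from visible
    ch_attn_nir = sigmoid(nir.mean(axis=(1, 2)))   # channel attention from NIR
    sp_attn_vis = sigmoid(vis.mean(axis=0))        # spatial attention, (H, W)
    sp_attn_nir = sigmoid(nir.mean(axis=0))
    vis_out = vis * ch_attn_nir[:, None, None] * sp_attn_nir[None]   # cross-applied
    nir_out = nir * ch_attn_vis[:, None, None] * sp_attn_vis[None]
    return vis_out + nir_out                       # fused feature map

rng = np.random.default_rng(1)
fused = cross_enhance(rng.standard_normal((32, 8, 8)),
                      rng.standard_normal((32, 8, 8)))
print(fused.shape)  # (32, 8, 8)
```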
6. Method for Behavior Recognition of Hu Sheep in Intensive Farming Based on HLNC-YOLO
Authors: JI Ronghua, CHANG Hongrui, ZHANG Suoxiang, LIU Zhongying, WU Zhonghong. Transactions of the Chinese Society for Agricultural Machinery (农业机械学报, PKU Core), 2026, No. 2, pp. 265-275 (11 pages)
Behavior recognition of Hu sheep contributes to their intensive and intelligent farming. Due to the generally high density of Hu sheep farming, severe occlusion occurs among different behaviors and even among sheep performing the same behavior, leading to missed and false detections in existing behavior recognition methods. A high-low frequency aggregated attention, negative-sample comprehensive score loss and comprehensive score soft non-maximum suppression YOLO (HLNC-YOLO) was proposed for identifying the behavior of Hu sheep, addressing the missed and erroneous detections caused by occlusion between Hu sheep in intensive farming. Firstly, images of four typical behaviors (standing, lying, eating, and drinking) were collected from the sheep farm to construct the Hu sheep behavior dataset (HSBD). Next, to address the occlusion issues, during the training phase the C2F-HLAtt module, which combines high-low frequency aggregation attention, was integrated into the YOLO v8 backbone to perceive occluded objects, and an auxiliary reversible branch was introduced to retain more effective features. A comprehensive score regression loss (CSLoss) was used to reduce the scores of suboptimal boxes and enhance the comprehensive scores of occluded object boxes. Finally, the soft comprehensive score non-maximum suppression (Soft-CS-NMS) algorithm filtered prediction boxes during inference. In testing on the HSBD, HLNC-YOLO achieved a mean average precision (mAP@50) of 87.8%, with a memory footprint of 17.4 MB. This represented an improvement of 7.1, 2.2, 4.6, and 11 percentage points over YOLO v8, YOLO v9, YOLO v10, and Faster R-CNN, respectively. The research indicated that HLNC-YOLO accurately identified the behavior of Hu sheep in intensive farming and possessed generalization capabilities, providing technical support for smart farming.
Keywords: behavior recognition; YOLO; loss function; attention mechanism
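The Soft-CS-NMS step above decays the scores of overlapping boxes rather than discarding them outright, which helps retain occluded sheep. A generic Gaussian Soft-NMS sketch, with an ordinary confidence score standing in for the paper's comprehensive score, looks like this:

```python
import numpy as np

def iou(box, boxes):
    # boxes as [x1, y1, x2, y2]
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay, rather than delete, boxes that overlap the
    current best box. A stand-in for the paper's Soft-CS-NMS."""
    boxes, scores = boxes.copy().astype(float), scores.copy().astype(float)
    keep, idxs = [], list(range(len(boxes)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        if scores[best] < score_thresh:
            break
        keep.append(best)
        idxs.remove(best)
        if idxs:
            rest = np.array(idxs)
            overlaps = iou(boxes[best], boxes[rest])
            scores[rest] *= np.exp(-(overlaps ** 2) / sigma)   # Gaussian decay
    return keep

# Two heavily overlapping boxes (an occlusion pair) plus one distant box:
# the occluded box survives with a reduced score instead of being suppressed.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]])
scores = np.array([0.9, 0.8, 0.7])
print(soft_nms(boxes, scores))  # [0, 2, 1]
```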
7. Review of the classification and related terminology of fine-grained sedimentary rocks
Authors: ZHU Rukai, SUN Longde, ZOU Caineng, CHEN Yang, MIAO Xue. Petroleum Exploration and Development, 2026, No. 1, pp. 61-78 (18 pages)
Through tracing the background and customary usage of the classification of fine-grained sedimentary rocks and related terminology, and comparing current “sedimentary petrology” textbooks and monographs, this paper proposes a classification scheme for fine-grained sedimentary rocks and clarifies related terminology. The comprehensive analysis indicates that the classification of clastic rocks, volcanic clastic rocks, chemical rocks, and biogenic (carbonate) rocks is unified, and the definitions of terms such as lamination, bedding and beds are consistent. However, there is disagreement on the definition of “mud”. European and American scholars commonly use the term “mud” to include silt and clay (particle size less than 0.0625 mm), whereas Chinese scholars equate “mud” with “clay” (particle size less than 0.0039 mm or less than 0.01 mm). Combined with the discussion of terms such as sedimentary structures (bedding, lamination and lamellation), shale, mudstone, mudrocks/argillaceous rocks and mud shale, it is recommended to use “fine-grained sedimentary rocks” as the general term for all sedimentary rocks composed of fine-grained materials with particle size less than 0.0625 mm, including claystone/mudrocks and siltstone. Claystone/mudrocks are further classified into argillaceous (or clayey) mudstone/shale, calcareous mudstone/shale, siliceous mudstone/shale, silty mudstone/shale and silt-containing mudstone/shale. Argillaceous (or clayey) mudstone/shale emphasizes a content of clay minerals or clay-sized particles exceeding 50%; the other mudstones/shales emphasize a content of particles (particle size less than 0.0625 mm) exceeding 50%. The commonly used term “shale” should not include siltstone. It is necessary to establish a reasonable, standardized, and applicable classification scheme for fine-grained sedimentary rocks in the future. Integrated shale microfacies research at the thin-section scale should be carried out and, combined with well logging data interpretation and seismic attribute analysis, a geological model of lithology/lithofacies should be iteratively upgraded to accurately determine sweet layers, locate target layers, and evaluate favorable areas.
Keywords: fine-grained sedimentary rock; shale; mudstone; clay; shale oil; shale gas; lamellation; shale microfacies; classification scheme; fine-grained sedimentology
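The size boundaries discussed in this review (fine-grained below 0.0625 mm, with the clay cutoff at 0.0039 mm in the European/American convention) can be encoded directly. The function below only illustrates the thresholds mentioned in the text; it is not a published standard implementation.

```python
def classify_particle(size_mm, clay_cutoff_mm=0.0039):
    """Classify a particle by diameter using the thresholds discussed in the
    review: < 0.0625 mm counts as fine-grained ('mud' in the European/American
    usage), and below the clay cutoff it is clay-sized. The default cutoff
    follows the 0.0039 mm convention mentioned in the text; some Chinese
    usage places it at 0.01 mm instead."""
    if size_mm >= 0.0625:
        return "coarser than fine-grained (e.g., sand)"
    if size_mm >= clay_cutoff_mm:
        return "silt"
    return "clay"

print(classify_particle(0.05))    # silt
print(classify_particle(0.001))   # clay
print(classify_particle(0.2))     # coarser than fine-grained (e.g., sand)
```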
8. Microstructural evolution and tensile deformation behaviors of fine-grained Fe_(40)Mn_(20)Co_(20)Cr_(15)Si_(5) high entropy alloy prepared by friction stir processing
Authors: Jia LIN, Yuan FANG, Wen WANG, Peng HAN, Ting ZHANG, Qiang LIU, Ya-ting XIANG, Feng-ming QIANG, Ke QIAO, Kuai-she WANG. Transactions of Nonferrous Metals Society of China, 2026, No. 3, pp. 842-854 (13 pages)
A fine-grained metastable dual-phase Fe_(40)Mn_(20)Co_(20)Cr_(15)Si_(5) high entropy alloy (CS-HEA) with excellent strength and ductility was successfully prepared by friction stir processing (FSP). The microstructural and mechanical properties of the fine-grained CS-HEA were characterized. The results showed that as-cast shrinkage cavities and elemental segregation were eliminated. The average grain size was refined from 121.1 μm to 5.4 μm. The face-centered cubic phase fraction increased from 23% to 82%. During tensile deformation, dislocation slip dominated at strains ranging from 5% to 17%, followed by transformation-induced plasticity (TRIP) from 17% to 26%, and twinning-induced plasticity (TWIP) from 26% to 37%. The yield strength, ultimate tensile strength, and elongation of the fine-grained CS-HEA were 503 MPa, 1120 MPa, and 37%, respectively. The strength-ductility synergy of the fine-grained CS-HEA was attributed to the combined effects of TRIP, TWIP, dislocation strengthening, and fine-grained strengthening.
Keywords: friction stir processing; metastable high entropy alloy; fine-grained microstructure; deformation behaviors; transformation-induced plasticity
9. RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition
Authors: Yewei Xiao, Xin Du, Wei Zeng. Computers, Materials & Continua, 2026, No. 3, pp. 1325-1348 (24 pages)
Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive due to the quadratic growth of the spatial and temporal complexity of their softmax-based attention mechanisms with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse Attention (GSA) module incorporating an efficient context restoration block to recover lost contextual cues. Concurrently, a Grouped-scale Convolution (GSC) module replaces the standard Conformer convolution module, providing adaptive local modeling across varying temporal resolutions. Furthermore, we integrate a Refined Intermediate Contextual CTC (RIC-CTC) supervision strategy. This approach applies progressively increasing loss weights combined with convolution-based context aggregation, thereby further relaxing the conditional-independence constraint inherent in standard CTC frameworks. Evaluations on the LRS2 and LRS3 benchmarks validate the efficacy of our approach, with word error rates (WERs) reduced to 1.8% and 1.5%, respectively. These results demonstrate its state-of-the-art performance in AVSR tasks.
Keywords: audio-visual speech recognition; Conformer; CTC; sparse attention
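The motivation behind "ReLU-based sparse attention" is that replacing the dense softmax with a ReLU zeroes out negatively scored positions entirely. A toy single-head version of that substitution might look like the sketch below; the row-normalization choice is an assumption, and the paper's GSA module additionally restores context that sparsification drops.

```python
import numpy as np

def relu_sparse_attention(q, k, v, eps=1e-6):
    """Single-head attention where softmax is replaced by ReLU plus row
    normalization, so negatively scored positions are pruned to exactly 0."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.maximum(scores, 0.0)                      # ReLU sparsification
    weights = weights / (weights.sum(axis=-1, keepdims=True) + eps)
    return weights @ v, weights

rng = np.random.default_rng(2)
T, d = 6, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out, w = relu_sparse_attention(q, k, v)
print(out.shape)  # (6, 8)
```

Unlike softmax, which assigns every position a strictly positive weight, many entries of `w` here are exactly zero, which is what makes sparse kernels and pruning-based speedups possible.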
10. An Intelligent Orchard Anti-Damage System Combining Real-Time AI Image Recognition and Laser-Based Deterrence for Multi-Target Monkeys
Authors: Shih-Ming Cho, Sung-Wen Wang, Min-Chie Chiu, Shao-Chun Chen. Computers, Materials & Continua, 2026, No. 5, pp. 881-918 (38 pages)
To address crop depredation by intelligent species (e.g., macaques) and the habituation induced by traditional methods, this study proposes an intelligent, closed-loop, adaptive laser deterrence system. A core contribution is an efficient multi-stage Semi-Supervised Learning (SSL) and incremental fine-tuning (IFT) framework, which reduced manual annotation by ~60% and training time by ~68%. This framework was benchmarked against YOLOv8n, v10n, and v11n. Our analysis revealed that YOLOv12n's high Signal-to-Noise Ratio (SNR) (47.1% retention) pseudo-labels made it the only model to gain performance (+0.010 mAP) from SSL, allowing it to overtake competitors. Subsequently, in the IFT stress test, YOLOv12n proved most robust (a minimal −0.019 mAP decline), whereas YOLOv10n suffered catastrophic failure (−0.233 mAP), highlighting its incompatibility with IFT. The final model achieved high performance (mAP@0.5 of 0.947 for macaques, 0.946 for laser spots). In Multi-Object Tracking (MOT), this study quantitatively confirms that Bottom-Up Tracking by Sorting (BoT-SORT) (1.88 s avg. tracklet lifetime) significantly outperforms ByteTrack (0.81 s) in identity preservation for visually similar macaques. System integration achieved 480 Frames Per Second (FPS) real-time inference on edge devices. A quadratic polynomial fitting model ensured high-precision aiming (RMSE < 2 pixels; best 1.2 pixels) by compensating for distortion. To fundamentally solve habituation, an adaptive strategy driven by a Deep Deterministic Policy Gradient (DDPG) framework was introduced. By using a habituation penalty term (R_habituation) to force unpredictable sequences, the DDPG strategy achieved a stable 88% average Intrusion Frequency Reduction Rate (IFRR) in field experiments, suppressing habituation in highly intelligent species. This study develops an efficient, precise, low-cost, and habituation-resistant automated wildlife defense system.
Keywords: multi-target tracking; artificial intelligence recognition; laser calibration
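The "quadratic polynomial fitting model" for high-precision aiming suggests a least-squares fit of a second-degree polynomial surface mapping detected pixel coordinates to corrected aim coordinates. A generic version of such a fit is sketched below on synthetic data; the model form and the normalized coordinates are assumptions, not the paper's calibration procedure.

```python
import numpy as np

def quad_design(px, py):
    # design matrix for t = a + b*x + c*y + d*x^2 + e*x*y + f*y^2
    return np.column_stack([np.ones_like(px), px, py, px**2, px * py, py**2])

def fit_quadratic_map(px, py, target):
    """Least-squares fit of a quadratic correction from pixel coordinates
    to one target-axis coordinate (one fit per axis in practice)."""
    coef, *_ = np.linalg.lstsq(quad_design(px, py), target, rcond=None)
    return coef

# Synthetic calibration points (coordinates normalized to [0, 1] for
# numerical conditioning) following a known quadratic distortion.
rng = np.random.default_rng(3)
px, py = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
true = 0.2 + 0.5 * px - 0.1 * py + 0.05 * px**2 - 0.02 * px * py + 0.03 * py**2
coef = fit_quadratic_map(px, py, true)
pred = quad_design(px, py) @ coef
rmse = float(np.sqrt(np.mean((pred - true) ** 2)))
print(rmse < 1e-8)  # data is exactly quadratic, so the fit recovers it
```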
11. Hybrid Quantum Gate Enabled CNN Framework with Optimized Features for Human-Object Detection and Recognition
Authors: Nouf Abdullah Almujally, Tanvir Fatima Naik Bukht, Shuaa S. Alharbi, Asaad Algarni, Ahmad Jalal, Jeongmin Park. Computers, Materials & Continua, 2026, No. 4, pp. 2254-2271 (18 pages)
Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI. The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity. HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
Keywords: pattern recognition; image segmentation; computer vision; object detection
12. Korean Sign Language Recognition and Sentence Generation through Data Augmentation
Authors: Soo-Yeon Jeong, Ho-Yeon Jeong, Sun-Young Ihm. Computers, Materials & Continua, 2026, No. 5, pp. 2005-2019 (15 pages)
Sign language is a primary mode of communication for individuals with hearing impairments, conveying meaning through hand shapes and hand movements. In contrast to spoken or written languages, sign language relies on the recognition and interpretation of hand gestures captured in video data. However, sign language datasets remain relatively limited compared to those of other languages, which hinders the training and performance of deep learning models. Additionally, the distinct word order of sign language, unlike that of spoken language, requires context-aware and natural sentence generation. To address these challenges, this study applies data augmentation techniques to build a Korean Sign Language dataset and train recognition models. Recognized words are then reconstructed into complete sentences. The sign recognition process uses OpenCV and MediaPipe to extract hand landmarks from sign language videos and analyzes hand position, orientation, and motion. The extracted features are converted into time-series data and fed into a Long Short-Term Memory (LSTM) model. The proposed recognition framework achieved an accuracy of up to 81.25%, while the sentence generation achieved an accuracy of up to 95%. The proposed approach is expected to be applicable not only to Korean Sign Language but also to other low-resource sign languages for recognition and translation tasks.
Keywords: Korean sign language recognition; LSTM; data augmentation; sentence completion
13. Enhanced Scene Recognition via Multi-Model Transfer Learning with Limited Labeled Data
Authors: Samia Allaoua Chelloug, Ahmed A. Abd El-Latif, Samah Al Shathri, Mohamed Hammad. Computers, Materials & Continua, 2026, No. 5, pp. 1191-1211 (21 pages)
Scene recognition is a critical component of computer vision, powering applications from autonomous vehicles to surveillance systems. However, its development is often constrained by a heavy reliance on large, expensively annotated datasets. This research presents a novel, efficient approach that leverages multi-model transfer learning from pre-trained deep neural networks, specifically DenseNet201 and Visual Geometry Group (VGG) networks, to overcome this limitation. Our method significantly reduces dependency on vast labeled data while achieving high accuracy. Evaluated on the Aerial Image Dataset (AID), the model attained a validation accuracy of 93.6% with a loss of 0.35, demonstrating robust performance with minimal training data. These results underscore the viability of our approach for real-time, data-efficient scene recognition, offering a practical and cost-effective advancement for the field.
Keywords: scene recognition; transfer learning; pre-trained deep models; DenseNet201; VGG
14. A machine learning-based depression recognition model integrating spirit-expression features from traditional Chinese medicine
Authors: Minghui Yao, Rongrong Zhu, Peng Qian, Huilin Liu, Xirong Sun, Limin Gao, Fufeng Li. Digital Chinese Medicine, 2026, No. 1, pp. 68-79 (12 pages)
Objective: To develop a depression recognition model by integrating the spirit-expression diagnostic framework of traditional Chinese medicine (TCM) with machine learning algorithms. The proposed model seeks to establish a TCM-informed tool for early depression screening, thereby bridging traditional diagnostic principles with modern computational approaches. Methods: The study included patients with depression who visited the Shanghai Pudong New Area Mental Health Center from October 1, 2022 to October 1, 2023, as well as students and teachers from Shanghai University of Traditional Chinese Medicine during the same period as the healthy control group. Videos of 3-10 s were captured using a Xiaomi Pad 5, and the TCM spirit and expressions were determined by TCM experts (at least 3 out of 5 experts had to agree to determine the category of TCM spirit and expression). Basic information, facial images, and interview information were collected through a portable TCM intelligent analysis and diagnosis device, and facial diagnosis features were extracted using the OpenCV computer vision library. Statistical analysis methods such as parametric and non-parametric tests were used to analyze the baseline data, TCM spirit and expression features, and facial diagnosis feature parameters of the two groups, to compare the differences in TCM spirit and expression and facial features. Five machine learning algorithms, including extreme gradient boosting (XGBoost), decision tree (DT), Bernoulli naive Bayes (BernoulliNB), support vector machine (SVM), and k-nearest neighbor (KNN) classification, were used to construct a depression recognition model based on the fusion of TCM spirit and expression features. The performance of the model was evaluated using metrics such as accuracy, precision, and the area under the receiver operating characteristic (ROC) curve (AUC). The model results were explained using Shapley Additive exPlanations (SHAP). Results: A total of 93 depression patients and 87 healthy individuals were ultimately included in this study. There was no statistically significant difference in the baseline characteristics between the two groups (P > 0.05). The differences in TCM spirit and expression characteristics and facial features between the two groups were as follows. (i) Quantispirit facial analysis revealed that depression patients exhibited significantly reduced facial spirit and luminance compared with healthy controls (P < 0.05), with characteristic features such as sad expressions, facial erythema, and lip color ranging from erythematous to cyanotic. (ii) Depressed patients exhibited significantly lower values in facial complexion L, lip L and a values, and gloss index, but higher values in facial complexion a and b, lip b, low gloss index, and matte index (all P < 0.05). (iii) The multi-model results show that the XGBoost-based depression recognition model, integrating the TCM spirit-expression diagnostic framework, achieved an accuracy of 98.61% and significantly outperformed the four benchmark algorithms DT, BernoulliNB, SVM, and KNN (P < 0.01). (iv) The SHAP visualization results show that in the recognition model constructed with the XGBoost algorithm, the complexion b value, category of facial spirit, high gloss index, low gloss index, category of facial expression, and texture features contribute significantly to the model. Conclusion: This study demonstrates that integrating TCM spirit-expression diagnostic features with machine learning enables the construction of a high-precision depression detection model, offering a novel paradigm for objective depression diagnosis.
Keywords: traditional Chinese medicine; spirit; expression; feature fusion; depression recognition model
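The abstract above reports that SHAP was used to explain the XGBoost model's predictions. Tree ensembles require the involved TreeSHAP algorithm, but for a plain linear model SHAP values reduce to a closed form, which makes the attribution idea easy to illustrate. All weights and feature values below are invented for illustration; they are not taken from the paper.

```python
# SHAP attribution illustrated on a linear model, where the exact
# per-feature value has the closed form phi_i = w_i * (x_i - E[x_i]).
# The "features" (complexion b value, gloss index) are hypothetical.

def linear_shap(weights, x, background_means):
    """Exact SHAP values for f(x) = sum_i w_i * x_i."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_means)]

def linear_model(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

weights = [2.0, -1.0]   # toy model coefficients
x = [3.0, 1.0]          # one subject's feature values
means = [1.0, 2.0]      # background (dataset-average) feature values

phi = linear_shap(weights, x, means)
# Efficiency property: SHAP values sum to f(x) - f(E[x]).
explained = linear_model(weights, x) - linear_model(weights, means)
```

The same additivity check (contributions summing to the gap between a prediction and the baseline) is what makes SHAP bar plots like the paper's interpretable.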
Efficient Video Emotion Recognition via Multi-Scale Region-Aware Convolution and Temporal Interaction Sampling
15
Authors: Xiaorui Zhang, Chunlin Yuan, Wei Sun, Ting Wang. Computers, Materials & Continua, 2026, Issue 2, pp. 2036-2054 (19 pages)
Video emotion recognition is widely used due to its alignment with the temporal characteristics of human emotional expression, but existing models have significant shortcomings. On the one hand, Transformer multi-head self-attention modeling of global temporal dependency suffers from high computational overhead and feature similarity. On the other hand, fixed-size convolution kernels are often used, which have weak perception ability for emotional regions of different scales. Therefore, this paper proposes a video emotion recognition model that combines multi-scale region-aware convolution with temporal interactive sampling. Spatially, multi-branch large-kernel stripe convolution is used to perceive emotional region features at different scales, and attention weights are generated for each scale's features. Temporally, multi-layer odd-even down-sampling is performed on the time series, and odd-even sub-sequence interaction is performed to mitigate feature similarity, while computational costs are reduced thanks to the linear relationship between sampling and convolution overhead. The model was evaluated on CMU-MOSI, CMU-MOSEI, and Hume Reaction, where Acc-2 reached 83.4%, 85.2%, and 81.2%, respectively. The experimental results show that the model significantly improves the accuracy of emotion recognition.
Keywords: multi-scale region-aware convolution; temporal interaction sampling; video emotion recognition
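The temporal branch described above splits a sequence into odd- and even-indexed sub-sequences and lets them exchange information before the next down-sampling level. A minimal sketch of that idea follows; the additive interaction rule and the `alpha` coefficient are assumptions for illustration, not the paper's exact update.

```python
def odd_even_split(seq):
    """Split a time series into even- and odd-indexed sub-sequences,
    halving its length at each down-sampling level."""
    return seq[::2], seq[1::2]

def interact(even, odd, alpha=0.5):
    """Toy cross-update: each sub-sequence absorbs information from the
    other, so the two halves do not converge to similar features."""
    n = min(len(even), len(odd))
    new_even = [even[i] + alpha * odd[i] for i in range(n)]
    new_odd = [odd[i] - alpha * even[i] for i in range(n)]
    return new_even, new_odd

seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
even, odd = odd_even_split(seq)
even2, odd2 = interact(even, odd)
```

Because each level works on sequences half the previous length, total cost stays linear in the original sequence length, which is the efficiency argument the abstract makes.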
GaitMAFF: Adaptive Multi-Modal Fusion of Skeleton Maps and Silhouettes for Robust Gait Recognition in Complex Scenarios
16
Authors: Zhongbin Luo, Zhaoyang Guan, Wenxing You, Yunteng Wang, Yanqiu Bi. Computers, Materials & Continua, 2026, Issue 5, pp. 540-558 (19 pages)
Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions in response to diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation to align with silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. These fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multi-modal information, significantly enhancing gait recognition robustness and accuracy in complex scenes and providing a practical solution for real-world applications.
Keywords: gait recognition; multi-modal fusion; adaptive feature fusion; skeleton map; silhouette
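GaitMAFF's attention module dynamically learns per-modality fusion weights. A toy version of that weighting step is sketched below, with softmax-normalised scalar scores standing in for the learned attention; the scores and feature vectors are illustrative, not the network's actual internals.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_fuse(sil_feat, skel_feat, sil_score, skel_score):
    """Weight silhouette and skeleton features by softmax-normalised
    modality scores, so weights adapt to input conditions."""
    w_sil, w_skel = softmax([sil_score, skel_score])
    return [w_sil * a + w_skel * b for a, b in zip(sil_feat, skel_feat)]

# Equal scores -> equal weights -> plain average of the two modalities.
fused = adaptive_fuse([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)
```

When clothing changes make silhouettes unreliable, a higher skeleton score would shift the fused feature toward the skeleton branch, which is the adaptivity the abstract describes.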
Improving Person Recognition for Single-Person-in-Photos: Intimacy in Photo Collections
17
Authors: Xiaoyi Duan, Tianqi Zou, Chenyang Wang, Yu Gu, Xiuying Li. Computers, Materials & Continua, 2026, Issue 2, pp. 2089-2112 (24 pages)
Person recognition in photo collections is a critical yet challenging task in computer vision. Previous studies have used social relationships within photo collections to address this issue. However, these methods often fail at single-person-in-photo recognition, as they cannot rely on social connections. In this work, we discard social relationships and instead measure the relationships between photos to solve this problem. We designed a new model that includes a multi-parameter attention network for adaptively fusing visual features and a unified formula for measuring photo intimacy. This model effectively recognizes individuals appearing alone in a single photo within the collection. Because of outdated annotations and missing photos in the existing PIPA (Person in Photo Album) dataset, we manually re-annotated it and added approximately ten thousand photos of Asian individuals to address the under-representation issue. Our results on the re-annotated PIPA dataset are superior to previous studies in most cases, and experiments on the supplemented dataset further demonstrate the effectiveness of our method. We have made the PIPA dataset publicly available on Zenodo, with the DOI: 10.5281/zenodo.12508096 (accessed on 15 October 2025).
Keywords: deep learning; computer vision; person recognition; photo intimacy; PIPA dataset
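The abstract does not give the paper's unified photo-intimacy formula. As a hedged stand-in, intimacy between two photos can be scored by the cosine similarity of their fused visual feature vectors; the vectors below are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors:
    1.0 for identical directions, lower for dissimilar photos."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical fused visual features of three photos in a collection.
photo_a = [0.2, 0.8, 0.1]
photo_b = [0.2, 0.8, 0.1]   # near-duplicate of photo_a
photo_c = [0.9, 0.1, 0.0]   # visually unrelated photo

sim_ab = cosine_similarity(photo_a, photo_b)
sim_ac = cosine_similarity(photo_a, photo_c)
```

A single-person photo can then borrow identity evidence from its most "intimate" neighbours in the collection, which is the role the paper's measure plays.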
Human Activity Recognition Using Weighted Average Ensemble by Selected Deep Learning Models
18
Authors: Waseem Akhtar, Mahwish Ilyas, Romana Aziz, Ghadah Aldehim, Tassawar Iqbal, Muhammad Ramzan. Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 971-989 (19 pages)
Human Activity Recognition (HAR) is an active area of computer vision. It has a great impact on healthcare, smart environments, and surveillance, as it can automatically detect human behavior. It plays a vital role in many applications, such as smart homes, healthcare, human-computer interaction, sports analysis, and especially intelligent surveillance. In this paper, we propose a robust and efficient HAR system by leveraging deep learning paradigms, including pre-trained models, CNN architectures, and their weighted-average fusion. However, due to the diversity of human actions, various environmental influences, and a lack of data and resources, achieving high recognition accuracy remains elusive. In this work, a weighted average ensemble technique is employed to fuse three deep learning models: EfficientNet, ResNet50, and a custom CNN. The results of this study indicate that a weighted average ensemble strategy is a promising approach for building more effective HAR models for the detection and classification of human activities. Experiments on a benchmark dataset show that the proposed weighted ensemble approach outperforms existing approaches in terms of accuracy and other key performance measures. The combined weighted-average ensemble of pre-trained and CNN models obtained an accuracy of 98%, compared to 97%, 96%, and 95% for the custom CNN, EfficientNet, and ResNet50 models, respectively.
Keywords: artificial intelligence; computer vision; deep learning; recognition; human activity classification; image processing
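The fusion step the abstract describes is a weighted average of each model's class probabilities. A minimal sketch follows; the three "models" are stand-in probability vectors and the weights (0.4/0.3/0.3) are illustrative, not the paper's tuned values.

```python
def weighted_ensemble(prob_lists, weights):
    """Fuse per-model class-probability vectors by a weighted average;
    weights are normalised so the fused vector still sums to 1."""
    assert len(prob_lists) == len(weights)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for i, p in enumerate(probs):
            fused[i] += (w / total) * p
    return fused

# Hypothetical softmax outputs of the three models on a 3-class clip.
cnn = [0.70, 0.20, 0.10]      # custom CNN
effnet = [0.60, 0.30, 0.10]   # EfficientNet
resnet = [0.50, 0.25, 0.25]   # ResNet50

fused = weighted_ensemble([cnn, effnet, resnet], weights=[0.4, 0.3, 0.3])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Giving the strongest individual model the largest weight is the usual heuristic; the weights can also be tuned on a validation split.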
A CNN-Transformer Hybrid Model for Real-Time Recognition of Affective Tactile Biosignals
19
Authors: Chang Xu, Xianbo Yin, Zhiyong Zhou, Bomin Liu. Computers, Materials & Continua, 2026, Issue 4, pp. 2343-2356 (14 pages)
This study presents a hybrid CNN-Transformer model for real-time recognition of affective tactile biosignals. The proposed framework combines convolutional neural networks (CNNs), which extract spatial and local temporal features, with a Transformer encoder that captures long-range dependencies in time-series data through multi-head attention. Model performance was evaluated on two widely used tactile biosignal datasets, HAART and CoST, which contain diverse affective touch gestures recorded from pressure sensor arrays. The CNN-Transformer model achieved recognition rates of 93.33% on HAART and 80.89% on CoST, outperforming existing methods on both benchmarks. By incorporating temporal windowing, the model enables instantaneous prediction and improves generalization across gestures of varying duration. These results highlight the effectiveness of deep learning for tactile biosignal processing and demonstrate the potential of the CNN-Transformer approach for future applications in wearable sensors, affective computing, and biomedical monitoring.
Keywords: tactile biosignals; affective touch recognition; wearable sensors; signal processing; human-machine interaction
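The abstract credits temporal windowing with enabling instantaneous prediction across gestures of varying duration. A minimal windowing helper is sketched below; the window length and stride are illustrative, not the paper's values.

```python
def sliding_windows(signal, win, stride):
    """Cut a biosignal sequence into fixed-length, possibly overlapping
    windows, each of which can be classified independently."""
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, stride)]

# A hypothetical 10-sample pressure trace cut into 4-sample windows
# with stride 2 (50% overlap) yields four equal-length windows.
trace = list(range(10))
windows = sliding_windows(trace, win=4, stride=2)
```

Because every window has the same shape regardless of the full gesture's length, the model can emit a prediction as soon as each window arrives, which is what makes the recognition real-time.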
Boruta-LSTMAE: Feature-Enhanced Depth Image Denoising for 3D Recognition
20
Authors: Fawad Salam Khan, Noman Hasany, Muzammil Ahmad Khan, Shayan Abbas, Sajjad Ahmed, Muhammad Zorain, Wai Yie Leong, Susama Bagchi, Sanjoy Kumar Debnath. Computers, Materials & Continua, 2026, Issue 4, pp. 2181-2206 (26 pages)
The noise present in depth images obtained with RGB-D sensors arises from hardware limitations combined with environmental factors; the limited capabilities of the sensors also degrade downstream computer vision results. Common image denoising techniques based on spatial- and frequency-domain filtering tend to remove significant image detail along with the noise. The framework presented in this paper is a novel denoising model that uses Boruta-driven feature selection with a Long Short-Term Memory Autoencoder (LSTMAE). The Boruta algorithm identifies the most useful depth features, maximizing spatial structural integrity and reducing redundancy. An LSTMAE then processes these selected features, modeling depth-pixel sequences to generate robust, noise-resistant representations. The encoder compresses the input data into a latent space, which the decoder then reconstructs into a clean image. Experiments on a benchmark dataset show that the proposed technique attains a PSNR of 45 dB and an SSIM of 0.90, which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of wavelet-based models. Moreover, the feature selection step decreases the input dimensionality by 40%, resulting in a 37.5% reduction in training time and a real-time inference rate of 200 FPS. The Boruta-LSTMAE framework therefore offers a highly efficient and scalable system for depth image denoising, with strong potential for close-range 3D applications such as robotic manipulation and gesture-based interfaces.
Keywords: Boruta; LSTM autoencoder; feature fusion; denoising; 3D object recognition; depth images
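The denoising results above are reported as PSNR and SSIM. PSNR has a simple closed form over the mean squared error between the clean and denoised images; a sketch follows, assuming flat pixel lists and an 8-bit intensity range.

```python
import math

def psnr(clean, denoised, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images given as
    flat pixel lists: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy 2-pixel example: one pixel is restored exactly, the other is
# maximally wrong, so MSE = 255^2 / 2 and PSNR = 10 * log10(2) dB.
score = psnr([0.0, 0.0], [0.0, 255.0])
```

Higher is better on a log scale, so the 10 dB gap the abstract reports over convolutional autoencoders corresponds to a tenfold reduction in mean squared error.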