Journal Articles: 3,881 results found
1. Method for Behavior Recognition of Hu Sheep in Intensive Farming Based on HLNC-YOLO
Authors: JI Ronghua, CHANG Hongrui, ZHANG Suoxiang, LIU Zhongying, WU Zhonghong. 《农业机械学报》 (Transactions of the Chinese Society for Agricultural Machinery; PKU Core), 2026, Issue 2, pp. 265-275 (11 pages)
Behavior recognition of Hu sheep contributes to their intensive and intelligent farming. Due to the generally high density of Hu sheep farming, severe occlusion occurs among different behaviors and even among sheep performing the same behavior, leading to missed and false detections in existing behavior recognition methods. A high-low frequency aggregated attention and negative sample comprehensive score loss and comprehensive score soft non-maximum suppression YOLO (HLNC-YOLO) was proposed for identifying the behavior of Hu sheep, addressing the missed and erroneous detections caused by occlusion between Hu sheep in intensive farming. First, images of four typical behaviors (standing, lying, eating, and drinking) were collected from the sheep farm to construct the Hu sheep behavior dataset (HSBD). Next, to address occlusion, the C2F-HLAtt module, which combines high-low frequency aggregation attention, was integrated into the YOLO v8 backbone during training to perceive occluded objects, and an auxiliary reversible branch was introduced to retain more effective features. A comprehensive score regression loss (CSLoss) was used to reduce the scores of suboptimal boxes and enhance the comprehensive scores of occluded object boxes. Finally, the soft comprehensive score non-maximum suppression (Soft-CS-NMS) algorithm filtered prediction boxes during inference. Tested on the HSBD, HLNC-YOLO achieved a mean average precision (mAP@50) of 87.8% with a memory footprint of 17.4 MB, an improvement of 7.1, 2.2, 4.6, and 11 percentage points over YOLO v8, YOLO v9, YOLO v10, and Faster R-CNN, respectively. The research indicates that HLNC-YOLO accurately identifies the behavior of Hu sheep in intensive farming, generalizes well, and provides technical support for smart farming.
Keywords: behavior recognition; YOLO; loss function; attention mechanism
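The Soft-CS-NMS step above replaces hard suppression with score decay for overlapping boxes. As a rough illustration (not the paper's method, which additionally folds a comprehensive score into the decision), a minimal pure-Python sketch of classic linear-decay Soft-NMS over [x1, y1, x2, y2] boxes might look like:

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.3, score_thresh=0.1):
    # Linear-decay Soft-NMS: overlapping boxes are down-weighted, not discarded,
    # which helps keep genuinely occluded objects alive.
    dets = sorted(zip(boxes, scores), key=lambda d: -d[1])
    kept = []
    while dets:
        best = dets.pop(0)
        kept.append(best)
        rescored = []
        for box, s in dets:
            ov = iou(best[0], box)
            if ov > iou_thresh:
                s *= (1.0 - ov)  # decay instead of hard suppression
            if s >= score_thresh:
                rescored.append((box, s))
        dets = sorted(rescored, key=lambda d: -d[1])
    return kept
```

With hard NMS the second of two heavily overlapping boxes would be dropped outright; here it survives with a reduced score.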
2. Human Activity Recognition Using Weighted Average Ensemble by Selected Deep Learning Models
Authors: Waseem Akhtar, Mahwish Ilyas, Romana Aziz, Ghadah Aldehim, Tassawar Iqbal, Muhammad Ramzan. Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 971-989 (19 pages)
Human Activity Recognition (HAR) is an active area of computer vision with great impact on healthcare, smart environments, and surveillance, as it can automatically detect human behavior. It plays a vital role in many applications, such as smart homes, healthcare, human-computer interaction, sports analysis, and especially intelligent surveillance. Due to the diversity of human actions, varied environmental influences, and a lack of data and resources, high recognition accuracy remains elusive. In this paper, we propose a robust and efficient HAR system by leveraging deep learning paradigms, including pre-trained models, CNN architectures, and their weighted-average fusion. A weighted average ensemble technique is employed to fuse three deep learning models: EfficientNet, ResNet50, and a custom CNN. The results indicate that a weighted average ensemble strategy is a promising approach for detecting and classifying human activities. Experiments on the benchmark dataset show that the proposed weighted ensemble outperforms existing approaches in accuracy and other key performance measures: the combined weighted-average ensemble of pre-trained and CNN models obtained an accuracy of 98%, compared to 97%, 96%, and 95% for the customized CNN, EfficientNet, and ResNet50 models, respectively.
Keywords: artificial intelligence; computer vision; deep learning; recognition; human activity classification; image processing
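The weighted average ensemble described above can be sketched in a few lines: each model contributes its class-probability vector, weighted (for example) by its validation accuracy. The weights and probability vectors below are illustrative, not from the paper:

```python
def weighted_ensemble(prob_lists, weights):
    # Fuse per-model class-probability vectors with a normalized weighted average.
    total = sum(weights)
    n_classes = len(prob_lists[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for i, p in enumerate(probs):
            fused[i] += (w / total) * p
    return fused

def predict(prob_lists, weights):
    # Final label is the argmax of the fused probability vector.
    fused = weighted_ensemble(prob_lists, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

Because the weights are normalized, the fused vector is still a valid probability distribution whenever the inputs are.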
3. Boruta-LSTMAE: Feature-Enhanced Depth Image Denoising for 3D Recognition
Authors: Fawad Salam Khan, Noman Hasany, Muzammil Ahmad Khan, Shayan Abbas, Sajjad Ahmed, Muhammad Zorain, Wai Yie Leong, Susama Bagchi, Sanjoy Kumar Debnath. Computers, Materials & Continua, 2026, Issue 4, pp. 2181-2206 (26 pages)
The noise present in depth images obtained with RGB-D sensors arises from hardware limitations combined with environmental factors, owing to the limited capabilities of the sensors, and it degrades downstream computer vision results. Common image denoising techniques based on spatial- and frequency-domain filtering tend to remove significant image detail along with the noise. This paper presents a novel denoising model that uses Boruta-driven feature selection with a Long Short-Term Memory Autoencoder (LSTMAE). The Boruta algorithm identifies the most useful depth features, which are used to maximize spatial structural integrity and reduce redundancy. An LSTMAE then processes these selected features and models depth-pixel sequences to generate robust, noise-resistant representations. The encoder compresses the input into a latent space, which the decoder reconstructs into a clean image. Experiments on a benchmark dataset show that the proposed technique attains a PSNR of 45 dB and an SSIM of 0.90, which is 10 dB higher than conventional convolutional autoencoders and 15 times higher than wavelet-based models. Moreover, the feature selection step decreases input dimensionality by 40%, yielding a 37.5% reduction in training time and a real-time inference rate of 200 FPS. The Boruta-LSTMAE framework therefore offers an efficient and scalable system for depth image denoising, with strong potential for close-range 3D systems such as robotic manipulation and gesture-based interfaces.
Keywords: Boruta; LSTM autoencoder; feature fusion; denoising; 3D object recognition; depth images
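Boruta's core idea is to compare each real feature's importance against "shadow" copies obtained by shuffling, keeping only features that reliably beat their shadows. A toy sketch follows, using absolute Pearson correlation as a stand-in importance measure (real Boruta uses random-forest importances, and the paper applies the idea to depth features); the data in the usage example is synthetic:

```python
import random

def importance(feature, target):
    # Stand-in importance: absolute Pearson correlation with the target
    # (Boruta proper would query a tree ensemble's feature importances).
    n = len(feature)
    mf, mt = sum(feature) / n, sum(target) / n
    cov = sum((f - mf) * (t - mt) for f, t in zip(feature, target))
    vf = sum((f - mf) ** 2 for f in feature) ** 0.5
    vt = sum((t - mt) ** 2 for t in target) ** 0.5
    return abs(cov / (vf * vt)) if vf and vt else 0.0

def boruta_select(features, target, n_trials=20, seed=0):
    # Keep a feature only if it beats the best shuffled (shadow) copy
    # in a clear majority of trials.
    rng = random.Random(seed)
    hits = [0] * len(features)
    for _ in range(n_trials):
        shadows = [rng.sample(f, len(f)) for f in features]
        threshold = max(importance(s, target) for s in shadows)
        for i, f in enumerate(features):
            if importance(f, target) > threshold:
                hits[i] += 1
    return [i for i, h in enumerate(hits) if h > n_trials // 2]
```

An informative feature (perfectly correlated with the target) survives, while an alternating noise feature does not.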
4. A machine learning-based depression recognition model integrating spirit-expression features from traditional Chinese medicine
Authors: Minghui Yao, Rongrong Zhu, Peng Qian, Huilin Liu, Xirong Sun, Limin Gao, Fufeng Li. Digital Chinese Medicine, 2026, Issue 1, pp. 68-79 (12 pages)
Objective: To develop a depression recognition model by integrating the spirit-expression diagnostic framework of traditional Chinese medicine (TCM) with machine learning algorithms. The proposed model seeks to establish a TCM-informed tool for early depression screening, thereby bridging traditional diagnostic principles with modern computational approaches. Methods: The study included patients with depression who visited the Shanghai Pudong New Area Mental Health Center from October 1, 2022 to October 1, 2023, with students and teachers from Shanghai University of Traditional Chinese Medicine over the same period as the healthy control group. Videos of 3-10 s were captured using a Xiaomi Pad 5, and the TCM spirit and expressions were determined by TCM experts (a category was assigned when at least 3 of 5 experts agreed). Basic information, facial images, and interview information were collected through a portable TCM intelligent analysis and diagnosis device, and facial diagnosis features were extracted using the OpenCV computer vision library. Parametric and non-parametric tests were used to analyze the baseline data, TCM spirit and expression features, and facial diagnosis feature parameters of the two groups, comparing differences in TCM spirit and expression and facial features. Five machine learning algorithms, extreme gradient boosting (XGBoost), decision tree (DT), Bernoulli naive Bayes (BernoulliNB), support vector machine (SVM), and k-nearest neighbor (KNN) classification, were used to construct a depression recognition model based on the fusion of TCM spirit and expression features. Model performance was evaluated using accuracy, precision, and the area under the receiver operating characteristic (ROC) curve (AUC), and model results were explained using Shapley Additive exPlanations (SHAP). Results: A total of 93 depression patients and 87 healthy individuals were ultimately included. There was no statistically significant difference in baseline characteristics between the two groups (P > 0.05). The differences in TCM spirit-expression characteristics and facial features between the two groups were as follows. (i) Quantispirit facial analysis revealed that depression patients exhibited significantly reduced facial spirit and luminance compared with healthy controls (P < 0.05), with characteristic features such as sad expressions, facial erythema, and lip color ranging from erythematous to cyanotic. (ii) Depressed patients exhibited significantly lower values in facial complexion L, lip L and a, and gloss index, but higher values in facial complexion a and b, lip b, low-gloss index, and matte index (all P < 0.05). (iii) Across multiple models, the XGBoost-based depression recognition model integrating the TCM spirit-expression diagnostic framework achieved an accuracy of 98.61% and significantly outperformed the four benchmark algorithms, DT, BernoulliNB, SVM, and KNN (P < 0.01). (iv) SHAP visualization showed that, in the XGBoost model, the complexion b value, facial spirit categories, high-gloss index, low-gloss index, facial expression categories, and texture features contributed most. Conclusion: This study demonstrates that integrating TCM spirit-expression diagnostic features with machine learning enables a high-precision depression detection model, offering a novel paradigm for objective depression diagnosis.
Keywords: traditional Chinese medicine; spirit; expression; feature fusion; depression; recognition model
5. Securing Restricted Zones with a Novel Face Recognition Approach Using Face Feature Descriptors and Evidence Theory
Authors: Rafika Harrabi, Slim Ben Chaabane, Hassene Seddik. Computers, Materials & Continua, 2026, Issue 5, pp. 1743-1772 (30 pages)
Securing restricted zones such as airports, research facilities, and military bases requires robust and reliable access control mechanisms to prevent unauthorized entry and safeguard critical assets. Face recognition has emerged as a key biometric approach for this purpose; however, existing systems are often sensitive to variations in illumination, occlusion, and pose, which degrade performance in real-world conditions. To address these challenges, this paper proposes a novel hybrid face recognition method that integrates complementary feature descriptors, Fuzzy-Gabor 2D Fisher Linear Discriminant (FG-2DFLD), Generalized 2D Linear Discriminant Analysis (G2DLDA), and Modular Local Binary Patterns (Modular-LBP), with Dempster-Shafer (DS) evidence theory for decision fusion. The proposed framework extracts global, structural, and local texture features, models them using Gaussian distributions to estimate belief factors, and fuses these belief factors through DS theory to explicitly handle uncertainty and conflict among descriptors. Experimental validation on two widely used benchmark datasets, ORL and Cropped Yale B, achieved recognition rates exceeding 98%, outperforming traditional methods as well as recent deep learning-based approaches. The method also demonstrated strong robustness under noisy conditions, maintaining accuracies above 96% with salt-and-pepper and Gaussian noise. These results highlight the effectiveness of the proposed integration strategy in enhancing accuracy, reliability, and resilience compared to single-descriptor and conventional fusion methods. Given its high performance and efficiency, the proposed method shows strong potential for deployment in real-world restricted-zone applications such as smart parking systems, secure facility access, and other high-security domains.
Keywords: face recognition; feature extraction; FG-2DFLD; G2DLDA; Modular-LBP; evidence theory; mass function; Gaussian distribution; classification
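The Dempster-Shafer decision fusion used above reduces, at its core, to Dempster's rule of combination. A minimal sketch for two mass functions whose focal elements are frozensets of hypotheses (the identity labels in the usage example are illustrative, not from the paper):

```python
def dempster_combine(m1, m2):
    # Dempster's rule for two mass functions over the same frame of
    # discernment. Each mass dict should sum to 1; keys are frozensets.
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}
```

Conflicting mass (products of disjoint focal elements) is discarded and the remainder renormalized, which is exactly how DS fusion arbitrates disagreement between descriptors.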
6. Enhanced Scene Recognition via Multi-Model Transfer Learning with Limited Labeled Data
Authors: Samia Allaoua Chelloug, Ahmed A. Abd El-Latif, Samah Al Shathri, Mohamed Hammad. Computers, Materials & Continua, 2026, Issue 5, pp. 1191-1211 (21 pages)
Scene recognition is a critical component of computer vision, powering applications from autonomous vehicles to surveillance systems. However, its development is often constrained by a heavy reliance on large, expensively annotated datasets. This research presents a novel, efficient approach that leverages multi-model transfer learning from pre-trained deep neural networks, specifically DenseNet201 and Visual Geometry Group (VGG), to overcome this limitation. Our method significantly reduces dependency on vast labeled data while achieving high accuracy. Evaluated on the Aerial Image Dataset (AID), the model attained a validation accuracy of 93.6% with a loss of 0.35, demonstrating robust performance with minimal training data. These results underscore the viability of our approach for real-time, data-efficient scene recognition, offering a practical and cost-effective advancement for the field.
Keywords: scene recognition; transfer learning; pre-trained deep models; DenseNet201; VGG
7. Action Recognition via Shallow CNNs on Intelligently Selected Motion Data
Authors: Jalees Ur Rahman, Muhammad Hanif, Usman Haider, Saeed Mian Qaisar, Sarra Ayouni. Computers, Materials & Continua, 2026, Issue 3, pp. 2223-2243 (21 pages)
Deep neural networks have achieved excellent classification results on several computer vision benchmarks. This has led to the popularity of machine learning as a service, where trained algorithms are hosted in the cloud and inference can be obtained on real-world data. In most applications, the vision data must be compressed due to enormous bandwidth and memory requirements. Video codecs exploit spatial and temporal correlations to achieve high compression ratios, but they are computationally expensive. This work computes the motion fields between consecutive frames to enable efficient classification of videos. Contrary to the normal practice of reconstructing full-resolution frames through motion compensation, it infers the class label directly from the block-based motion fields. Motion fields are a richer and more complex representation than raw motion vectors, where each motion vector carries magnitude and direction information. This approach has two advantages: the cost of motion compensation and video decoding is avoided, and the dimensionality of the input signal is greatly reduced, allowing a shallower classification network. The neural network can be trained on motion vectors in two ways: as complex representations or as magnitude-direction pairs. The proposed work trains a convolutional neural network on the direction and magnitude tensors of the motion fields. Our experimental results show 20x faster convergence during training, reduced overfitting, and accelerated inference on a hand gesture recognition dataset compared to full-resolution and downsampled frames. We validate the proposed methodology on the HGds dataset, achieving a testing accuracy of 99.21%, on the HMDB51 dataset, achieving 82.54%, and on the UCF101 dataset, achieving 97.13%, outperforming state-of-the-art methods in computational efficiency.
Keywords: action recognition; block matching algorithm; convolutional neural network; deep learning; data compression; motion fields; optimization; video classification
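The block-based motion fields above come from block matching: for each block of the current frame, a small window in the previous frame is searched for the best match, here scored by sum of absolute differences (SAD). A toy exhaustive-search sketch on small 2D integer frames (block size, search range, and frames are illustrative):

```python
def sad(block_a, block_b):
    # Sum of absolute differences between two equal-size blocks.
    return sum(abs(x - y)
               for ra, rb in zip(block_a, block_b)
               for x, y in zip(ra, rb))

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_vector(prev, curr, top, left, size=2, search=2):
    # Exhaustive search: where did the current block come from in prev?
    target = get_block(curr, top, left, size)
    best, best_mv = None, (0, 0)
    h, w = len(prev), len(prev[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ty, tx = top + dy, left + dx
            if 0 <= ty and ty + size <= h and 0 <= tx and tx + size <= w:
                cost = sad(get_block(prev, ty, tx, size), target)
                if best is None or cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv
```

Collecting one such (dy, dx) vector per block yields the motion field that the paper feeds to a shallow CNN instead of decoded frames.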
8. A Fine-Grained Recognition Model Based on Discriminative Region Localization and Efficient Second-Order Feature Encoding
Authors: Xiaorui Zhang, Yingying Wang, Wei Sun, Shiyu Zhou, Haoming Zhang, Pengpai Wang. Computers, Materials & Continua, 2026, Issue 4, pp. 946-965 (20 pages)
Discriminative region localization and efficient feature encoding are crucial for fine-grained object recognition. However, existing data augmentation methods struggle to accurately locate discriminative regions under complex backgrounds, small target objects, and limited training data, leading to poor recognition. Fine-grained images exhibit small inter-class differences, and while second-order feature encoding enhances discrimination, it often requires dual convolutional neural networks (CNNs), increasing training time and complexity. This study proposes a model integrating discriminative region localization and efficient second-order feature encoding. By ranking feature-map channels via a fully connected layer, it selects high-importance channels to generate an enhanced map that accurately locates discriminative regions; cropping and erasing augmentations further refine recognition. To improve efficiency, a novel second-order feature encoding module generates an attention map from the fourth convolutional group of ResNet-50 and multiplies it with features from the fifth group, producing second-order features while reducing dimensionality and training time. Experiments on the Caltech-UCSD Birds-200-2011 (CUB-200-2011), Stanford Cars, and Fine-Grained Visual Classification of Aircraft (FGVC Aircraft) datasets show state-of-the-art accuracies of 88.9%, 94.7%, and 93.3%, respectively.
Keywords: fine-grained recognition; feature encoding; data augmentation; second-order features; discriminative regions
9. GaitMAFF: Adaptive Multi-Modal Fusion of Skeleton Maps and Silhouettes for Robust Gait Recognition in Complex Scenarios
Authors: Zhongbin Luo, Zhaoyang Guan, Wenxing You, Yunteng Wang, Yanqiu Bi. Computers, Materials & Continua, 2026, Issue 5, pp. 540-558 (19 pages)
Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions under diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense skeleton-map representation to align with silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. The fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, significantly enhancing gait recognition robustness and accuracy in complex scenes and providing a practical solution for real-world applications.
Keywords: gait recognition; multi-modal fusion; adaptive feature fusion; skeleton map; silhouette
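The adaptive weighting idea, softmax-normalized gate scores deciding how much each modality contributes per sample, can be illustrated as follows. In GaitMAFF the gate scores would come from a learned attention module; here they are passed in directly as an assumption of the sketch:

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of gate scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(silhouette_feat, skeleton_feat, gate_scores):
    # Adaptive fusion: blend the two modality feature vectors with
    # per-sample weights derived from the gate scores.
    w_sil, w_ske = softmax(gate_scores)
    return [w_sil * a + w_ske * b
            for a, b in zip(silhouette_feat, skeleton_feat)]
```

Equal gate scores give a 50/50 blend; a strongly positive silhouette score pushes the fused vector toward the silhouette features, which is how the network can discount a degraded modality.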
10. Improving Person Recognition for Single-Person-in-Photos: Intimacy in Photo Collections
Authors: Xiaoyi Duan, Tianqi Zou, Chenyang Wang, Yu Gu, Xiuying Li. Computers, Materials & Continua, 2026, Issue 2, pp. 2089-2112 (24 pages)
Person recognition in photo collections is a critical yet challenging task in computer vision. Previous studies have used social relationships within photo collections to address this issue. However, these methods often fail for single-person-in-photo recognition, as they cannot rely on social connections. In this work, we discard social relationships and instead measure the relationships between photos. We designed a new model that includes a multi-parameter attention network for adaptively fusing visual features and a unified formula for measuring photo intimacy; it effectively recognizes individuals appearing alone in photos within a collection. Because of outdated annotations and missing photos in the existing PIPA (Person in Photo Album) dataset, we manually re-annotated it and added approximately ten thousand photos of Asian individuals to address the underrepresentation issue. Our results on the re-annotated PIPA dataset are superior to previous studies in most cases, and experiments on the supplemented dataset further demonstrate the effectiveness of our method. We have made the PIPA dataset publicly available on Zenodo, with the DOI: 10.5281/zenodo.12508096 (accessed on 15 October 2025).
Keywords: deep learning; computer vision; person recognition; photo intimacy; PIPA dataset
11. RNPC-net: Automatic Recognition and Mapping of Weathering Degree and Groundwater Condition of Tunnel Faces
Authors: Xiang Wu, Fengyan Wang, Jianping Chen, Mingchang Wang, Lina Cheng, Chengyao Zhang, Junke Xu. Journal of Rock Mechanics and Geotechnical Engineering, 2026, Issue 2, pp. 1138-1159 (22 pages)
Accurate and rapid recognition of weathering degree (WD) and groundwater condition (GC) is essential for evaluating rock mass quality and conducting stability analyses in underground engineering. Conventional WD and GC recognition methods often rely on subjective evaluation by field experts, supplemented by field sampling and laboratory testing; these methods are frequently complex and time-consuming, making it challenging to meet the rapidly evolving demands of underground engineering. This study therefore proposes a rock non-geometric parameter classification network (RNPC-net) for rapid recognition and mapping of WD and GC on tunnel faces. The hybrid feature extraction module (HFEM) in RNPC-net fully extracts, fuses, and exploits multi-scale image features, enhancing the network's classification performance, and the designed adaptive weighting auxiliary classifier (AC) helps the network learn features more efficiently. Experimental results show that RNPC-net achieved classification accuracies of 0.8756 and 0.8710 for WD and GC, respectively, an improvement of approximately 2%-10% over other methods. Both quantitative and qualitative experiments confirm the effectiveness and superiority of RNPC-net. Furthermore, for WD and GC mapping, RNPC-net outperformed other methods by achieving the highest mean intersection over union (mIoU) across most tunnel faces, and the mapping results closely align with measurements provided by field experts. Applying the WD and GC mapping results to the rock mass rating (RMR) system achieved a transition from conventional qualitative to quantitative evaluation, enabling more accurate and reliable rock mass quality evaluations, particularly under critical RMR conditions.
Keywords: tunnel face; weathering degree; groundwater condition; RNPC-net; hybrid feature extraction module; recognition and mapping
12. A CNN-Transformer Hybrid Model for Real-Time Recognition of Affective Tactile Biosignals
Authors: Chang Xu, Xianbo Yin, Zhiyong Zhou, Bomin Liu. Computers, Materials & Continua, 2026, Issue 4, pp. 2343-2356 (14 pages)
This study presents a hybrid CNN-Transformer model for real-time recognition of affective tactile biosignals. The proposed framework combines convolutional neural networks (CNNs), which extract spatial and local temporal features, with a Transformer encoder that captures long-range dependencies in time-series data through multi-head attention. Model performance was evaluated on two widely used tactile biosignal datasets, HAART and CoST, which contain diverse affective touch gestures recorded from pressure sensor arrays. The CNN-Transformer model achieved recognition rates of 93.33% on HAART and 80.89% on CoST, outperforming existing methods on both benchmarks. By incorporating temporal windowing, the model enables instantaneous prediction and improves generalization across gestures of varying duration. These results highlight the effectiveness of deep learning for tactile biosignal processing and demonstrate the potential of the CNN-Transformer approach for future applications in wearable sensors, affective computing, and biomedical monitoring.
Keywords: tactile biosignals; affective touch recognition; wearable sensors; signal processing; human-machine interaction
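The temporal windowing mentioned above amounts to slicing the biosignal stream into fixed-length, overlapping segments so the model can emit a prediction per window rather than waiting for a whole gesture (window length and hop below are illustrative):

```python
def sliding_windows(signal, win_len, hop):
    # Split a biosignal sequence into fixed-length overlapping windows so a
    # classifier can predict per window, enabling near-real-time output.
    return [signal[i:i + win_len]
            for i in range(0, len(signal) - win_len + 1, hop)]
```

Overlap (hop smaller than the window length) smooths predictions across window boundaries and gives short gestures multiple chances to be seen whole.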
13. MFCCT: A Robust Spectral-Temporal Fusion Method with DeepConvLSTM for Human Activity Recognition
Authors: Rashid Jahangir, Nazik Alturki, Muhammad Asif Nauman, Faiqa Hanif. Computers, Materials & Continua, 2026, Issue 2, pp. 852-871 (20 pages)
Human activity recognition (HAR) predicts human activities from sensor signals using machine learning (ML) techniques. HAR systems have applications in various domains, including medicine, surveillance, behavioral monitoring, and posture analysis. Extracting suitable information from sensor data is an important part of the HAR process for recognizing activities accurately. Several HAR studies have utilized Mel-frequency cepstral coefficients (MFCCs) because of their effectiveness in capturing the periodic patterns of sensor signals. However, existing MFCC-based approaches often fail to capture sufficient temporal variability, which limits their ability to robustly distinguish complex or imbalanced activity classes. To address this gap, this study proposes a feature fusion strategy that merges time-based and MFCC features (MFCCT) to enhance activity representation. The merged features were fed to a convolutional neural network (CNN) integrated with long short-term memory (LSTM), DeepConvLSTM, to construct the HAR model. The MFCCT features with DeepConvLSTM outperformed MFCCs and time-based features alone on PAMAP2, UCI-HAR, and WISDM, obtaining accuracies of 97%, 98%, and 97%, respectively. In addition, DeepConvLSTM outperformed the deep learning (DL) algorithms recently employed in HAR. These results confirm that the proposed hybrid features are both practical and generalizable, making them applicable across diverse HAR datasets for accurate activity classification.
Keywords: DeepConvLSTM; human activity recognition (HAR); MFCCT; feature fusion; wearable sensors
14. Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories
Authors: Meixi Chu, Xinyu Jiang, Yushu Tao. Computers, Materials & Continua, 2026, Issue 2, pp. 708-730 (23 pages)
Industrial operators need reliable communication in high-noise, safety-critical environments where speech or touch input is often impractical. Existing gesture systems either miss real-time deadlines on resource-constrained hardware or lose accuracy under occlusion, vibration, and lighting changes. We introduce Industrial EdgeSign, a dual-path framework that combines hardware-aware neural architecture search (NAS) with large multimodal model (LMM) guided semantics to deliver robust, low-latency gesture recognition on edge devices. The searched model uses a truncated ResNet50 front end, a dimensional-reduction network that preserves spatiotemporal structure for tubelet-based attention, and localized Transformer layers tuned for on-device inference. To reduce reliance on gloss annotations and mitigate domain shift, we distill semantics from factory-tuned vision-language models and pre-train with masked language modeling and video-text contrastive objectives, aligning visual features with a shared text space. On ML2HP and SHREC'17, the NAS-derived architecture attains 94.7% accuracy with 86 ms inference latency and about 5.9 W power on a Jetson Nano. Under occlusion, lighting shifts, and motion blur, accuracy remains above 82%. For safety-critical commands, the emergency-stop gesture achieves 72 ms 99th-percentile latency with 99.7% fail-safe triggering. Ablation studies confirm the contributions of the spatiotemporal tubelet extractor and text-side pre-training, and we observe gains in translation quality (BLEU-4 22.33). These results show that Industrial EdgeSign provides accurate, resource-aware, and safety-aligned gesture recognition suitable for deployment in smart factory settings.
Keywords: hand gesture recognition; spatio-temporal feature extraction; Transformer; industrial Internet; edge intelligence
15. A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions
Authors: A-Seong Moon, Haesung Kim, Ye-Chan Park, Jaesung Lee. Computers, Materials & Continua, 2026, Issue 5, pp. 1-42 (42 pages)
Multimodal emotion recognition (MER) has emerged as a key research area for enabling human-centered artificial intelligence, supported by rapid progress in vision, audio, language, and physiological modeling. Existing approaches integrate heterogeneous affective cues through diverse embedding strategies and fusion mechanisms, yet the field remains fragmented due to differences in feature alignment, temporal synchronization, modality reliability, and robustness to noise or missing inputs. This survey provides a comprehensive analysis of MER research from 2021 to 2025, consolidating advances in modality-specific representation learning, cross-modal feature construction, and early, late, and hybrid fusion paradigms. We systematically review visual, acoustic, textual, and sensor-based embeddings, highlighting how pre-trained encoders, self-supervised learning, and large language models have reshaped the representational foundations of MER. We further categorize fusion strategies by interaction depth and architectural design, examining how attention mechanisms, cross-modal transformers, adaptive gating, and multimodal large language models redefine the integration of affective signals. Finally, we summarize major benchmark datasets and evaluation metrics and discuss emerging challenges related to scalability, generalization, and interpretability. This survey aims to provide a unified perspective on multimodal fusion for emotion recognition and to guide future research toward more coherent and generalizable multimodal affective intelligence.
Keywords: Multimodal emotion recognition; multimodal learning; cross-modal learning; fusion strategies; representation learning
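The early/late fusion paradigms surveyed above can be illustrated with a minimal sketch (plain NumPy; the embedding sizes, class set, and linear heads are hypothetical, not taken from any surveyed system): early fusion concatenates modality features before a single shared classifier, while late fusion averages the predictions of per-modality classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embeddings for one sample
visual = rng.standard_normal(128)   # e.g., from a face encoder
audio = rng.standard_normal(64)     # e.g., from a speech encoder
text = rng.standard_normal(256)     # e.g., from a language model

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

n_classes = 4  # e.g., happy / sad / angry / neutral

# Early fusion: concatenate features, then one shared linear head
W_early = rng.standard_normal((n_classes, 128 + 64 + 256)) * 0.01
fused = np.concatenate([visual, audio, text])
p_early = softmax(W_early @ fused)

# Late fusion: one head per modality, then average the predictions
heads = [
    (rng.standard_normal((n_classes, 128)) * 0.01, visual),
    (rng.standard_normal((n_classes, 64)) * 0.01, audio),
    (rng.standard_normal((n_classes, 256)) * 0.01, text),
]
p_late = np.mean([softmax(W @ x) for W, x in heads], axis=0)

print(p_early.shape, p_late.shape)  # both (4,) probability vectors
```

Hybrid fusion schemes sit between these two extremes, exchanging information at intermediate layers; attention-based variants additionally learn the mixing weights instead of averaging.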
Enantioselective recognition of amino acids in water using emission-tunable chiral fluorescent probes
16
Authors: Yi-Xin Zhang, Fang-Qi Zhang, Ao-Pei Peng, Tao Jiang, Ya-Xi Meng, Yang Li, Shuang-Xi Gu, Yuan-Yuan Zhu. Chinese Chemical Letters, 2026, Issue 1, pp. 338-343.
The detection of amino acid enantiomers holds significant importance in biomedical, chemical, food, and other fields. Traditional chiral recognition methods using fluorescent probes primarily rely on fluorescence intensity changes, which can compromise accuracy and repeatability. In this study, we report a novel fluorescent probe (R)-Z1 that achieves effective enantioselective recognition of chiral amino acids in water by shifting its emission wavelength (>60 nm). This water-soluble probe exhibits cyan or yellow-green luminescence upon interaction with amino acid enantiomers, enabling reliable chiral detection of 14 natural amino acids. It also allows the determination of enantiomeric excess by monitoring changes in luminescent color. Additionally, a logic operation with two inputs and three outputs was constructed based on these optical properties. Notably, amino acid enantiomers were successfully detected via dual-channel analysis at both the food and cellular levels. This study provides a new dynamic luminescence-based tool for the accurate sensing and detection of amino acid enantiomers.
Keywords: Fluorescent probe; Amino acid enantiomers; Chiral recognition; Aqueous solution; Dynamic multicolor emissions
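The enantiomeric excess (ee) that the probe determines follows the standard definition ee = ([R] - [S]) / ([R] + [S]) x 100%; a minimal sketch of that arithmetic (the mapping from luminescent color to the two concentrations is the paper's calibration and is not reproduced here):

```python
def enantiomeric_excess(c_r: float, c_s: float) -> float:
    """Standard ee definition: ([R] - [S]) / ([R] + [S]) * 100, in percent."""
    total = c_r + c_s
    if total <= 0:
        raise ValueError("total concentration must be positive")
    return (c_r - c_s) / total * 100.0

print(enantiomeric_excess(1.0, 0.0))    # 100.0  (enantiopure R)
print(enantiomeric_excess(0.5, 0.5))    # 0.0    (racemic mixture)
print(enantiomeric_excess(0.75, 0.25))  # 50.0
```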
RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition
17
Authors: Yewei Xiao, Xin Du, Wei Zeng. Computers, Materials & Continua, 2026, Issue 3, pp. 1325-1348.
Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive because the spatial and temporal complexity of their softmax-based attention mechanisms grows quadratically with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse Attention (GSA) module incorporating an efficient context restoration block to recover lost contextual cues. Concurrently, a Grouped-scale Convolution (GSC) module replaces the standard Conformer convolution module, providing adaptive local modeling across varying temporal resolutions. Furthermore, we integrate a Refined Intermediate Contextual CTC (RIC-CTC) supervision strategy, which applies progressively increasing loss weights combined with convolution-based context aggregation, thereby further relaxing the conditional-independence constraint inherent in standard CTC frameworks. Evaluations on the LRS2 and LRS3 benchmarks validate the efficacy of our approach, with word error rates (WERs) reduced to 1.8% and 1.5%, respectively, demonstrating state-of-the-art performance on AVSR tasks.
Keywords: Audio-visual speech recognition; Conformer; CTC; sparse attention
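The abstract does not spell out the GSA module, but the core idea of replacing softmax attention with a ReLU-based sparse normalization can be sketched generically (this is a textbook-style illustration under our own assumptions, not the authors' exact formulation): negative scores are pruned exactly to zero, so each query attends to only a subset of keys.

```python
import numpy as np

def relu_attention(q, k, v, eps=1e-6):
    """Attention with ReLU sparsification in place of softmax.

    Scores below zero are clipped exactly to 0, yielding sparse weights;
    each row is then renormalized by its sum (plus eps for stability).
    """
    d = q.shape[-1]
    scores = np.maximum(q @ k.T / np.sqrt(d), 0.0)  # ReLU -> exact zeros
    weights = scores / (scores.sum(axis=-1, keepdims=True) + eps)
    return weights @ v, weights

rng = np.random.default_rng(1)
T, d = 6, 8  # hypothetical sequence length and head dimension
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))

out, w = relu_attention(q, k, v)
print(out.shape)       # (6, 8)
print((w >= 0).all())  # True: weights are non-negative and sparse
```

Note that a row can lose all context if every score is negative, which is exactly why the paper pairs sparse attention with a context restoration block.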
Automated recognition of rock discontinuity in underground engineering using geometric feature analysis
18
Authors: Adili Rusuli, Xiaojun Li, Yuyun Wang, Yi Rui. Journal of Rock Mechanics and Geotechnical Engineering, 2026, Issue 2, pp. 1016-1033.
Discontinuities in rock masses critically affect the stability and safety of underground engineering. Mainstream discontinuity identification methods, which rely on normal vector estimation and clustering algorithms, suffer from accuracy degradation, omit critical discontinuities when the orientation density is unevenly distributed, and require manual intervention. To overcome these limitations, this paper introduces a novel discontinuity identification method based on geometric feature analysis of the rock mass. By analyzing the spatial distribution variability of the point cloud and integrating an adaptive region growing algorithm, the method accurately detects independent discontinuities under complex geological conditions. Given that rock mass orientations typically follow a Fisher distribution, an adaptive hierarchical clustering algorithm based on statistical analysis is employed to automatically determine the optimal number of structural sets, eliminating the preset cluster counts or thresholds required by traditional methods. The proposed approach effectively handles diverse rock mass shapes and sizes, leveraging both local and global geometric features to minimize noise interference. Experimental validation on three real-world rock mass models, alongside comparisons with three conventional directional clustering algorithms, demonstrates superior accuracy and robustness in identifying optimal discontinuity sets. The proposed method offers a reliable and efficient tool for discontinuity detection and grouping in underground engineering, significantly enhancing design and construction outcomes.
Keywords: Underground engineering; Rock mass discontinuity; Orientation grouping; Fisher distribution; 3D point cloud; Automated recognition
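The Fisher-distribution assumption above can be made concrete: the concentration parameter kappa of a set of unit normal vectors is commonly estimated from their mean resultant length. The sketch below uses the well-known closed-form approximation for 3-D data (often attributed to Banerjee et al.); the paper's actual clustering pipeline is not reproduced.

```python
import numpy as np

def fisher_kappa(normals: np.ndarray) -> float:
    """Estimate the Fisher concentration kappa for unit vectors on the sphere.

    Uses the approximation kappa ~= R * (3 - R**2) / (1 - R**2), where R is
    the mean resultant length of the normalized orientation vectors.
    """
    u = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    r_bar = float(np.linalg.norm(u.mean(axis=0)))
    return r_bar * (3.0 - r_bar**2) / (1.0 - r_bar**2)

rng = np.random.default_rng(2)
# Tightly clustered normals around +z: a well-defined discontinuity set
tight = np.column_stack([0.05 * rng.standard_normal((200, 2)), np.ones(200)])
# Widely scattered normals: no dominant orientation
loose = rng.standard_normal((200, 3))

print(fisher_kappa(tight) > fisher_kappa(loose))  # True: tighter set, larger kappa
```

A large kappa marks a tightly clustered structural set; per-cluster kappa estimates of this kind can drive the adaptive stopping criterion that replaces a preset cluster count.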
Hybrid Quantum Gate Enabled CNN Framework with Optimized Features for Human-Object Detection and Recognition
19
Authors: Nouf Abdullah Almujally, Tanvir Fatima Naik Bukht, Shuaa S. Alharbi, Asaad Algarni, Ahmad Jalal, Jeongmin Park. Computers, Materials & Continua, 2026, Issue 4, pp. 2254-2271.
Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI, and the conventional architecture of CNNs restricts their ability to handle highly complex HOI scenarios. HOI recognition therefore requires improved feature extraction methods to overcome current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
Keywords: Pattern recognition; image segmentation; computer vision; object detection
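The relative joint angles and proximity measures used in the 23-joint keypoint features come down to simple vector geometry; a generic sketch (the joint names and coordinates here are illustrative, not the paper's skeleton layout):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by segments b->a and b->c."""
    a, b, c = map(np.asarray, (a, b, c))
    v1, v2 = a - b, c - b
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def joint_proximity(p, q):
    """Euclidean distance between two keypoints (a simple proximity measure)."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

# Illustrative elbow example: shoulder, elbow, wrist in image coordinates
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(round(joint_angle(shoulder, elbow, wrist), 3))  # 90.0
print(joint_proximity((0, 0), (3, 4)))                # 5.0
```

Angles are invariant to translation and scale of the skeleton, which is why such relative features transfer better across subjects than raw keypoint coordinates.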
Speech Emotion Recognition Based on the Adaptive Acoustic Enhancement and Refined Attention Mechanism
20
Authors: Jun Li, Chunyan Liang, Zhiguo Liu, Fengpei Ge. Computers, Materials & Continua, 2026, Issue 3, pp. 2015-2039.
To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy with dynamic interpolation coefficients, enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depthwise-separable convolution and an h-swish activation function, and captures long-range dependencies of multi-scale time-frequency features using attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves weighted accuracy by 5.42% and 4.54%, and unweighted accuracy by 3.37% and 3.85%, on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent-samples t-tests, supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
Keywords: Speech emotion recognition; adaptive acoustic mixup enhancement; improved coordinate attention; shuffle attention; attention mechanism; deep learning
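The abstract does not detail AAM's sample selection strategy or how its interpolation coefficients adapt, but the underlying acoustic-level mixup operation is standard and can be sketched as follows (a generic mixup sketch under our own assumptions, not the paper's AAM):

```python
import numpy as np

def acoustic_mixup(x1, x2, y1, y2, alpha=0.4, rng=None):
    """Mix two equal-length waveforms (or spectrograms) and their label vectors.

    lam is drawn from Beta(alpha, alpha) as in standard mixup; AAM replaces
    this with dynamically adapted coefficients, which is not reproduced here.
    """
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * np.asarray(y1) + (1.0 - lam) * np.asarray(y2)
    return x, y, lam

rng = np.random.default_rng(3)
x1 = rng.standard_normal(16000)  # 1 s of audio at 16 kHz (illustrative)
x2 = rng.standard_normal(16000)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot emotion labels

x, y, lam = acoustic_mixup(x1, x2, y1, y2, rng=rng)
print(x.shape, round(float(y.sum()), 6))  # (16000,) 1.0
```

Mixing the labels with the same coefficient keeps the target a valid soft distribution, so the augmented pair can be trained on with an ordinary cross-entropy loss.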