Journal Articles
3,798 articles found
Method for Behavior Recognition of Hu Sheep in Intensive Farming Based on HLNC-YOLO
1
Authors: JI Ronghua, CHANG Hongrui, ZHANG Suoxiang, LIU Zhongying, WU Zhonghong. 《农业机械学报》 (Transactions of the Chinese Society for Agricultural Machinery), Peking University Core, 2026, Issue 2, pp. 265-275 (11 pages)
Behavior recognition of Hu sheep contributes to their intensive and intelligent farming. Because Hu sheep are generally farmed at high density, severe occlusion occurs among sheep performing different behaviors and even among sheep performing the same behavior, leading to missed and false detections in existing behavior recognition methods. A high-low frequency aggregated attention and negative sample comprehensive score loss and comprehensive score soft non-maximum suppression YOLO (HLNC-YOLO) was proposed to identify the behavior of Hu sheep, addressing the missed and erroneous detections caused by occlusion between Hu sheep in intensive farming. Firstly, images of four typical behaviors (standing, lying, eating, and drinking) were collected from a sheep farm to construct the Hu sheep behavior dataset (HSBD). Next, to address the occlusion issues during the training phase, the C2F-HLAtt module, which combines high-low frequency aggregation attention, was integrated into the YOLO v8 backbone to perceive occluded objects, and an auxiliary reversible branch was introduced to retain more effective features. A comprehensive score regression loss (CSLoss) was used to reduce the scores of suboptimal boxes and enhance the comprehensive scores of occluded object boxes. Finally, the soft comprehensive score non-maximum suppression (Soft-CS-NMS) algorithm filtered prediction boxes during inference. Tested on the HSBD, HLNC-YOLO achieved a mean average precision (mAP@50) of 87.8% with a memory footprint of 17.4 MB, an improvement of 7.1, 2.2, 4.6, and 11 percentage points over YOLO v8, YOLO v9, YOLO v10, and Faster R-CNN, respectively. The results indicate that HLNC-YOLO accurately identifies the behavior of Hu sheep in intensive farming and generalizes well, providing technical support for smart farming.
Keywords: behavior recognition; YOLO; loss function; attention mechanism
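The Soft-CS-NMS step described above follows the general soft non-maximum suppression idea: instead of discarding boxes that overlap the current best detection, their scores are decayed by the degree of overlap, so partially occluded sheep are not suppressed outright. A minimal generic sketch, not the paper's exact comprehensive-score variant; the (x1, y1, x2, y2) box format and the Gaussian decay are assumptions:

```python
import math

def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of boxes overlapping the current best."""
    idxs = list(range(len(boxes)))
    scores = list(scores)
    keep = []
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            ov = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(ov * ov) / sigma)  # Gaussian decay by overlap
        idxs = [i for i in idxs if scores[i] >= score_thresh]
    return keep

# Two heavily overlapping boxes plus one distant box (illustrative).
keep = soft_nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)],
                [0.9, 0.8, 0.7])
```

With hard NMS the second box would be deleted; here its score is merely reduced, which is what lets occluded objects survive filtering.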
Human Activity Recognition Using Weighted Average Ensemble by Selected Deep Learning Models
2
Authors: Waseem Akhtar, Mahwish Ilyas, Romana Aziz, Ghadah Aldehim, Tassawar Iqbal, Muhammad Ramzan. 《Computer Modeling in Engineering & Sciences》, 2026, Issue 2, pp. 971-989 (19 pages)
Human Activity Recognition (HAR) is a novel area of computer vision. It has a great impact on healthcare, smart environments, and surveillance, as it can automatically detect human behavior. It plays a vital role in many applications, such as smart homes, healthcare, human-computer interaction, sports analysis, and especially intelligent surveillance. In this paper, we propose a robust and efficient HAR system by leveraging deep learning paradigms, including pre-trained models, CNN architectures, and their weighted-average fusion. Due to the diversity of human actions and various environmental influences, as well as a lack of data and resources, high recognition accuracy remains elusive. In this work, a weighted average ensemble technique is employed to fuse three deep learning models: EfficientNet, ResNet50, and a custom CNN. The results of this study indicate that a weighted average ensemble strategy is a promising way to build more effective HAR models for detecting and classifying human activities. Experiments on a benchmark dataset showed that the proposed weighted ensemble approach outperformed existing approaches in accuracy and other key performance measures. The combined weighted-average ensemble of pre-trained and CNN models obtained an accuracy of 98%, compared to 97%, 96%, and 95% for the customized CNN, EfficientNet, and ResNet50 models, respectively.
Keywords: artificial intelligence; computer vision; deep learning; recognition; human activity classification; image processing
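A weighted average ensemble of the kind described above combines each model's class-probability vector with a scalar weight before taking the argmax. A minimal sketch; the weights, class count, and probability values are illustrative, not from the paper:

```python
def weighted_ensemble(prob_vectors, weights):
    """Fuse per-model class-probability vectors with a normalized weighted average."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_vectors, weights):
        for c, p in enumerate(probs):
            fused[c] += w * p / total
    return fused

# Three models voting over three activity classes (illustrative numbers).
fused = weighted_ensemble(
    [[0.7, 0.2, 0.1],   # custom CNN
     [0.4, 0.5, 0.1],   # EfficientNet
     [0.3, 0.6, 0.1]],  # ResNet50
    weights=[0.5, 0.3, 0.2],
)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because the weights are normalized, the fused vector remains a valid probability distribution; the weights themselves are typically tuned on a validation split.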
Boruta-LSTMAE: Feature-Enhanced Depth Image Denoising for 3D Recognition
3
Authors: Fawad Salam Khan, Noman Hasany, Muzammil Ahmad Khan, Shayan Abbas, Sajjad Ahmed, Muhammad Zorain, Wai Yie Leong, Susama Bagchi, Sanjoy Kumar Debnath. 《Computers, Materials & Continua》, 2026, Issue 4, pp. 2181-2206 (26 pages)
The initial noise in depth images obtained from RGB-D sensors is a combination of hardware limitations and environmental factors; the limited capabilities of the sensors also degrade downstream computer vision results. Common image denoising techniques based on spatial and frequency filtering tend to remove significant image detail along with the noise. The framework presented in this paper is a novel denoising model that uses Boruta-driven feature selection with a Long Short-Term Memory Autoencoder (LSTMAE). The Boruta algorithm identifies the most useful depth features, maximizing spatial structural integrity and reducing redundancy. An LSTMAE then processes these selected features and models depth pixel sequences to generate robust, noise-resistant representations. The encoder compresses the input into a latent space, which is then decoded to recover the clean image. Experiments on a benchmark dataset show that the proposed technique attains a PSNR of 45 dB and an SSIM of 0.90, 10 dB higher than conventional convolutional autoencoders and 15 times that of the wavelet-based models. Moreover, the feature selection step decreases input dimensionality by 40%, yielding a 37.5% reduction in training time and a real-time inference rate of 200 FPS. The Boruta-LSTMAE framework therefore offers an efficient and scalable system for depth image denoising, with strong potential for close-range 3D systems such as robotic manipulation and gesture-based interfaces.
Keywords: Boruta; LSTM autoencoder; feature fusion; denoising; 3D object recognition; depth images
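The PSNR figure reported above is the standard peak signal-to-noise ratio: the log-scaled ratio of the squared peak value to the mean squared error between clean and denoised images. A minimal sketch for 8-bit data; the toy pixel rows are illustrative:

```python
import math

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * math.log10(peak * peak / mse)

# Illustrative 8-bit pixel rows (clean vs. lightly perturbed).
clean = [100, 120, 130, 140]
noisy = [102, 118, 131, 139]
score = psnr(clean, noisy)
```

Higher is better: halving the MSE raises PSNR by about 3 dB, which is why the 10 dB gap over convolutional autoencoders is a large margin.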
Shen Weirong: The Identification and Recognition of Reincarnated Living Buddhas Must Be Conducted in Strict Accordance with National Laws
4
Authors: Wang Xi. 《China's Tibet》, 2026, Issue 1, pp. 19-23 (5 pages)
What are the origins, historical development, and lineages of the reincarnation system of Living Buddhas in Tibetan Buddhism? What kind of academic framework is "Han-Tibetan Buddhist Studies"? In an interview with this journal, Professor Shen Weirong of Tsinghua University discusses these issues on the basis of his research.
Keywords: reincarnated Living Buddhas; identification; recognition; Living Buddhas; Tibetan Buddhism; lineages; reincarnation system; academic framework; historical development
A machine learning-based depression recognition model integrating spirit-expression features from traditional Chinese medicine
5
Authors: Minghui Yao, Rongrong Zhu, Peng Qian, Huilin Liu, Xirong Sun, Limin Gao, Fufeng Li. 《Digital Chinese Medicine》, 2026, Issue 1, pp. 68-79 (12 pages)
Objective: To develop a depression recognition model by integrating the spirit-expression diagnostic framework of traditional Chinese medicine (TCM) with machine learning algorithms. The proposed model seeks to establish a TCM-informed tool for early depression screening, thereby bridging traditional diagnostic principles with modern computational approaches. Methods: The study included patients with depression who visited the Shanghai Pudong New Area Mental Health Center from October 1, 2022 to October 1, 2023, as well as students and teachers from Shanghai University of Traditional Chinese Medicine during the same period as the healthy control group. Videos of 3-10 s were captured using a Xiaomi Pad 5, and the TCM spirit and expressions were determined by TCM experts (a category was assigned when at least 3 of 5 experts agreed). Basic information, facial images, and interview information were collected through a portable TCM intelligent analysis and diagnosis device, and facial diagnosis features were extracted using the OpenCV computer vision library. Parametric and non-parametric statistical tests were used to analyze the baseline data, TCM spirit and expression features, and facial diagnosis feature parameters of the two groups, and to compare differences in TCM spirit and expression and facial features. Five machine learning algorithms, extreme gradient boosting (XGBoost), decision tree (DT), Bernoulli naive Bayes (BernoulliNB), support vector machine (SVM), and k-nearest neighbor (KNN) classification, were used to construct a depression recognition model based on the fusion of TCM spirit and expression features. Model performance was evaluated using metrics such as accuracy, precision, and the area under the receiver operating characteristic (ROC) curve (AUC), and model results were explained using Shapley Additive exPlanations (SHAP). Results: A total of 93 depression patients and 87 healthy individuals were ultimately included. There was no statistically significant difference in baseline characteristics between the two groups (P > 0.05). Differences in TCM spirit and expression characteristics and facial features between the two groups were as follows. (i) Quantispirit facial analysis revealed that depression patients exhibited significantly reduced facial spirit and luminance compared with healthy controls (P < 0.05), with characteristic features such as sad expressions, facial erythema, and lip color ranging from erythematous to cyanotic. (ii) Depressed patients exhibited significantly lower facial complexion L, lip L and a values, and gloss index, but higher facial complexion a and b, lip b, low gloss index, and matte index (all P < 0.05). (iii) Across the models, the XGBoost-based depression recognition model integrating the TCM spirit-expression diagnostic framework achieved an accuracy of 98.61% and significantly outperformed the four benchmark algorithms, DT, BernoulliNB, SVM, and KNN (P < 0.01). (iv) SHAP visualization showed that in the XGBoost model, the complexion b value, categories of facial spirit, high gloss index, low gloss index, categories of facial expression, and texture features contributed most. Conclusion: This study demonstrates that integrating TCM spirit-expression diagnostic features with machine learning enables the construction of a high-precision depression detection model, offering a novel paradigm for objective depression diagnosis.
Keywords: traditional Chinese medicine; spirit; expression; feature fusion; depression; recognition model
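The accuracy and precision metrics used to compare the models above reduce to simple counts over the confusion matrix. A minimal binary-classification sketch; the toy labels are illustrative, not the study's data:

```python
def accuracy_precision(y_true, y_pred, positive=1):
    """Accuracy and precision for binary labels, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    acc = correct / len(y_true)
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return acc, prec

# Toy example: 1 = depressed, 0 = healthy control.
acc, prec = accuracy_precision([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```

Precision matters here because a screening tool's false positives carry a real cost; AUC, which the study also reports, additionally sweeps the decision threshold.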
Action Recognition via Shallow CNNs on Intelligently Selected Motion Data
6
Authors: Jalees Ur Rahman, Muhammad Hanif, Usman Haider, Saeed Mian Qaisar, Sarra Ayouni. 《Computers, Materials & Continua》, 2026, Issue 3, pp. 2223-2243 (21 pages)
Deep neural networks have achieved excellent classification results on several computer vision benchmarks. This has led to the popularity of machine learning as a service, where trained algorithms are hosted in the cloud and inference can be obtained on real-world data. In most applications it is important to compress the vision data because of the enormous bandwidth and memory requirements. Video codecs exploit spatial and temporal correlations to achieve high compression ratios, but they are computationally expensive. This work computes the motion fields between consecutive frames to enable efficient classification of videos. Contrary to the normal practice of reconstructing full-resolution frames through motion compensation, this work proposes to infer the class label directly from the block-based motion fields. Motion fields are a richer and more complex representation than raw motion vectors, as each motion vector carries magnitude and direction information. This approach has two advantages: the cost of motion compensation and video decoding is avoided, and the dimensionality of the input signal is greatly reduced, permitting a shallower classification network. The neural network can be trained on motion vectors in two ways: as complex-valued representations or as magnitude-direction pairs. The proposed work trains a convolutional neural network on the direction and magnitude tensors of the motion fields. Our experimental results show 20x faster convergence during training, reduced overfitting, and accelerated inference on a hand gesture recognition dataset compared to full-resolution and downsampled frames. We validate the proposed methodology on the HGds dataset with a testing accuracy of 99.21%, on the HMDB51 dataset with 82.54% accuracy, and on the UCF101 dataset with 97.13% accuracy, outperforming state-of-the-art methods in computational efficiency.
Keywords: action recognition; block matching algorithm; convolutional neural network; deep learning; data compression; motion fields; optimization; video classification
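Converting block motion vectors into the magnitude-direction pair described above is a direct Cartesian-to-polar transform over the block grid. A minimal sketch; the 2x2 block grid is illustrative:

```python
import math

def to_magnitude_direction(motion_field):
    """Split a grid of (dx, dy) block motion vectors into two tensors:
    per-block magnitude and per-block direction in radians."""
    mags = [[math.hypot(dx, dy) for dx, dy in row] for row in motion_field]
    dirs = [[math.atan2(dy, dx) for dx, dy in row] for row in motion_field]
    return mags, dirs

# Illustrative 2x2 grid of block motion vectors from a block-matching pass.
field = [[(3.0, 4.0), (0.0, 0.0)],
         [(-1.0, 0.0), (0.0, 2.0)]]
mags, dirs = to_magnitude_direction(field)
```

The two resulting tensors are far smaller than decoded frames, which is what allows the shallow CNN the abstract describes.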
A Fine-Grained Recognition Model Based on Discriminative Region Localization and Efficient Second-Order Feature Encoding
7
Authors: Xiaorui Zhang, Yingying Wang, Wei Sun, Shiyu Zhou, Haoming Zhang, Pengpai Wang. 《Computers, Materials & Continua》, 2026, Issue 4, pp. 946-965 (20 pages)
Discriminative region localization and efficient feature encoding are crucial for fine-grained object recognition. However, existing data augmentation methods struggle to locate discriminative regions accurately under complex backgrounds, small target objects, and limited training data, leading to poor recognition. Fine-grained images exhibit small inter-class differences, and while second-order feature encoding enhances discrimination, it often requires dual convolutional neural networks (CNNs), increasing training time and complexity. This study proposes a model integrating discriminative region localization with efficient second-order feature encoding. By ranking feature map channels via a fully connected layer, it selects high-importance channels to generate an enhanced map that accurately locates discriminative regions; cropping and erasing augmentations further refine recognition. To improve efficiency, a novel second-order feature encoding module generates an attention map from the fourth convolutional group of ResNet-50 and multiplies it with features from the fifth group, producing second-order features while reducing dimensionality and training time. Experiments on the Caltech-UCSD Birds-200-2011 (CUB-200-2011), Stanford Cars, and Fine-Grained Visual Classification of Aircraft (FGVC Aircraft) datasets show state-of-the-art accuracies of 88.9%, 94.7%, and 93.3%, respectively.
Keywords: fine-grained recognition; feature encoding; data augmentation; second-order features; discriminative regions
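The channel-ranking step above amounts to scoring each feature-map channel and keeping the top-k to build the enhanced map. A minimal sketch using mean activation as a stand-in importance score; the real model derives scores from a fully connected layer, and the toy channels are illustrative:

```python
def top_k_channels(feature_maps, k):
    """Rank channels by mean activation and return the indices of the k strongest.
    feature_maps: list of 2-D channel activations (lists of row lists)."""
    def mean_activation(channel):
        vals = [v for row in channel for v in row]
        return sum(vals) / len(vals)
    scores = [mean_activation(ch) for ch in feature_maps]
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return order[:k]

# Three toy 2x2 channels; channel 1 responds most strongly.
channels = [[[0.1, 0.2], [0.1, 0.0]],
            [[0.9, 0.8], [0.7, 0.6]],
            [[0.3, 0.3], [0.2, 0.2]]]
selected = top_k_channels(channels, k=2)
```

Summing only the selected channels yields the enhanced map used to crop or erase the discriminative region during augmentation.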
Improving Person Recognition for Single-Person-in-Photos: Intimacy in Photo Collections
8
Authors: Xiaoyi Duan, Tianqi Zou, Chenyang Wang, Yu Gu, Xiuying Li. 《Computers, Materials & Continua》, 2026, Issue 2, pp. 2089-2112 (24 pages)
Person recognition in photo collections is a critical yet challenging task in computer vision. Previous studies have used social relationships within photo collections to address this issue. However, these methods often fail on single-person photos in a collection, where no social connections are available for recognition. In this work, we discard social relationships and instead measure the relationships between photos. We designed a new model that includes a multi-parameter attention network for adaptively fusing visual features and a unified formula for measuring photo intimacy. This model effectively recognizes individuals in single-person photos within a collection. Because of outdated annotations and missing photos in the existing PIPA (Person in Photo Album) dataset, we manually re-annotated it and added approximately ten thousand photos of Asian individuals to address its underrepresentation issue. Our results on the re-annotated PIPA dataset are superior to previous studies in most cases, and experiments on the supplemented dataset further demonstrate the effectiveness of our method. We have made the PIPA dataset publicly available on Zenodo, DOI: 10.5281/zenodo.12508096 (accessed on 15 October 2025).
Keywords: deep learning; computer vision; person recognition; photo intimacy; PIPA dataset
RNPC-net: Automatic recognition and mapping of weathering degree and groundwater condition of tunnel faces
9
Authors: Xiang Wu, Fengyan Wang, Jianping Chen, Mingchang Wang, Lina Cheng, Chengyao Zhang, Junke Xu. 《Journal of Rock Mechanics and Geotechnical Engineering》, 2026, Issue 2, pp. 1138-1159 (22 pages)
Accurate and rapid recognition of weathering degree (WD) and groundwater condition (GC) is essential for evaluating rock mass quality and conducting stability analyses in underground engineering. Conventional WD and GC recognition methods often rely on subjective evaluation by field experts, supplemented by field sampling and laboratory testing. These methods are frequently complex and time-consuming, making it difficult to meet the rapidly evolving demands of underground engineering. This study therefore proposes a rock non-geometric parameter classification network (RNPC-net) for rapid recognition and mapping of the WD and GC of tunnel faces. The hybrid feature extraction module (HFEM) in RNPC-net fully extracts, fuses, and utilizes multi-scale image features, enhancing the network's classification performance. Moreover, the designed adaptive weighting auxiliary classifier (AC) helps the network learn features more efficiently. Experimental results show that RNPC-net achieved classification accuracies of 0.8756 and 0.8710 for WD and GC, respectively, an improvement of approximately 2%-10% over other methods. Both quantitative and qualitative experiments confirm the effectiveness and superiority of RNPC-net. Furthermore, for WD and GC mapping, RNPC-net outperformed other methods by achieving the highest mean intersection over union (mIoU) across most tunnel faces, and the mapping results closely align with field experts' measurements. Applying the WD and GC mapping results to the rock mass rating (RMR) system achieved a transition from conventional qualitative to quantitative evaluation, enabling more accurate and reliable rock mass quality evaluations, particularly under critical RMR conditions.
Keywords: tunnel face; weathering degree; groundwater condition; RNPC-net; hybrid feature extraction module; recognition and mapping
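The mIoU metric cited above averages per-class intersection-over-union between a predicted label map and the ground truth. A minimal sketch over flattened label lists; the toy pixel labels are illustrative:

```python
def mean_iou(y_true, y_pred, classes):
    """Mean intersection-over-union over the given class labels."""
    ious = []
    for c in classes:
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy 8-pixel map with two condition classes (0 and 1).
truth = [0, 0, 1, 1, 1, 0, 0, 1]
pred  = [0, 0, 1, 1, 0, 0, 0, 1]
miou = mean_iou(truth, pred, classes=[0, 1])
```

Averaging per class rather than per pixel keeps a rare class (e.g. a small wet patch on the face) from being swamped by the dominant one.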
A CNN-Transformer Hybrid Model for Real-Time Recognition of Affective Tactile Biosignals
10
Authors: Chang Xu, Xianbo Yin, Zhiyong Zhou, Bomin Liu. 《Computers, Materials & Continua》, 2026, Issue 4, pp. 2343-2356 (14 pages)
This study presents a hybrid CNN-Transformer model for real-time recognition of affective tactile biosignals. The proposed framework combines convolutional neural networks (CNNs), which extract spatial and local temporal features, with a Transformer encoder that captures long-range dependencies in time-series data through multi-head attention. Model performance was evaluated on two widely used tactile biosignal datasets, HAART and CoST, which contain diverse affective touch gestures recorded from pressure sensor arrays. The CNN-Transformer model achieved recognition rates of 93.33% on HAART and 80.89% on CoST, outperforming existing methods on both benchmarks. By incorporating temporal windowing, the model enables instantaneous prediction and improves generalization across gestures of varying duration. These results highlight the effectiveness of deep learning for tactile biosignal processing and demonstrate the potential of the CNN-Transformer approach for future applications in wearable sensors, affective computing, and biomedical monitoring.
Keywords: tactile biosignals; affective touch recognition; wearable sensors; signal processing; human-machine interaction
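Temporal windowing as used above slices a variable-length sensor stream into fixed-size, possibly overlapping windows, so each window can be classified as soon as it fills, regardless of total gesture duration. A minimal sketch; the window and stride sizes are illustrative:

```python
def sliding_windows(signal, window, stride):
    """Split a 1-D sample stream into fixed-length, possibly overlapping windows."""
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, stride)]

# A 10-sample pressure trace cut into windows of 4 samples with stride 2.
trace = list(range(10))
windows = sliding_windows(trace, window=4, stride=2)
```

A stride smaller than the window gives overlapping windows, which both augments training data and reduces prediction latency at inference time.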
MFCCT: A Robust Spectral-Temporal Fusion Method with DeepConvLSTM for Human Activity Recognition
11
Authors: Rashid Jahangir, Nazik Alturki, Muhammad Asif Nauman, Faiqa Hanif. 《Computers, Materials & Continua》, 2026, Issue 2, pp. 852-871 (20 pages)
Human activity recognition (HAR) predicts human activities from sensor signals using machine learning (ML) techniques. HAR systems have applications in many domains, including medicine, surveillance, behavioral monitoring, and posture analysis. Extracting suitable information from sensor data is an important part of the HAR process for recognizing activities accurately. Several HAR studies have utilized Mel frequency cepstral coefficients (MFCCs) because of their effectiveness in capturing the periodic pattern of sensor signals. However, existing MFCC-based approaches often fail to capture sufficient temporal variability, which limits their ability to robustly distinguish complex or imbalanced activity classes. To address this gap, this study proposes a feature fusion strategy that merges time-based and MFCC features (MFCCT) to enhance activity representation. The merged features were fed to a convolutional neural network (CNN) integrated with long short-term memory (LSTM), DeepConvLSTM, to construct the HAR model. The MFCCT features with DeepConvLSTM outperformed MFCCs and time-based features alone on PAMAP2, UCI-HAR, and WISDM, obtaining accuracies of 97%, 98%, and 97%, respectively. In addition, DeepConvLSTM outperformed the deep learning (DL) algorithms recently employed in HAR. These results confirm that the proposed hybrid features are both practical and generalizable, making them applicable across diverse HAR datasets for accurate activity classification.
Keywords: DeepConvLSTM; human activity recognition (HAR); MFCCT; feature fusion; wearable sensors
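The fusion step above boils down to concatenating time-domain statistics with spectral coefficients into one feature vector per window. A minimal sketch; the MFCC part is stubbed with placeholder values, since real MFCC extraction requires a DSP library, and the window values are illustrative:

```python
import math

def time_features(window):
    """Basic time-domain statistics for one sensor window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return [mean, std, min(window), max(window)]

def fuse(window, mfcc_coeffs):
    """MFCCT-style fusion: time-domain stats concatenated with spectral coefficients."""
    return time_features(window) + list(mfcc_coeffs)

window = [0.0, 1.0, 2.0, 1.0]
mfcc_stub = [12.3, -4.2, 0.7]  # placeholder values standing in for real MFCCs
features = fuse(window, mfcc_stub)
```

The concatenated vector lets the downstream DeepConvLSTM see both the signal's envelope statistics and its periodic structure at once.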
Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories
12
Authors: Meixi Chu, Xinyu Jiang, Yushu Tao. 《Computers, Materials & Continua》, 2026, Issue 2, pp. 708-730 (23 pages)
Industrial operators need reliable communication in high-noise, safety-critical environments where speech or touch input is often impractical. Existing gesture systems either miss real-time deadlines on resource-constrained hardware or lose accuracy under occlusion, vibration, and lighting changes. We introduce Industrial EdgeSign, a dual-path framework that combines hardware-aware neural architecture search (NAS) with large multimodal model (LMM) guided semantics to deliver robust, low-latency gesture recognition on edge devices. The searched model uses a truncated ResNet50 front end, a dimensional-reduction network that preserves spatiotemporal structure for tubelet-based attention, and localized Transformer layers tuned for on-device inference. To reduce reliance on gloss annotations and mitigate domain shift, we distill semantics from factory-tuned vision-language models and pre-train with masked language modeling and video-text contrastive objectives, aligning visual features with a shared text space. On ML2HP and SHREC'17, the NAS-derived architecture attains 94.7% accuracy with 86 ms inference latency and about 5.9 W power on a Jetson Nano. Under occlusion, lighting shifts, and motion blur, accuracy remains above 82%. For safety-critical commands, the emergency-stop gesture achieves 72 ms 99th-percentile latency with 99.7% fail-safe triggering. Ablation studies confirm the contribution of the spatiotemporal tubelet extractor and text-side pre-training, and we observe gains in translation quality (BLEU-4 of 22.33). These results show that Industrial EdgeSign provides accurate, resource-aware, and safety-aligned gesture recognition suitable for deployment in smart factory settings.
Keywords: hand gesture recognition; spatio-temporal feature extraction; Transformer; industrial Internet; edge intelligence
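The 99th-percentile latency figure quoted above is a tail statistic: the value below which 99% of measured inference times fall. A minimal nearest-rank sketch (one common percentile convention among several); the latency samples are illustrative:

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(q / 100.0 * len(ordered))
    return ordered[max(rank - 1, 0)]

# 100 illustrative latency samples in ms: 99 fast runs plus one slow outlier.
latencies = [50 + i * 0.2 for i in range(99)] + [72.0]
p99 = percentile(latencies, 99)
```

Reporting p99 rather than the mean is what makes the figure meaningful for a safety-critical emergency-stop command: the guarantee covers the slow tail, not the average case.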
Enantioselective recognition of amino acids in water using emission-tunable chiral fluorescent probes
13
Authors: Yi-Xin Zhang, Fang-Qi Zhang, Ao-Pei Peng, Tao Jiang, Ya-Xi Meng, Yang Li, Shuang-Xi Gu, Yuan-Yuan Zhu. 《Chinese Chemical Letters》, 2026, Issue 1, pp. 338-343 (6 pages)
The detection of amino acid enantiomers is of significant importance in the biomedical, chemical, food, and other fields. Traditional chiral recognition methods using fluorescent probes rely primarily on changes in fluorescence intensity, which can compromise accuracy and repeatability. In this study, we report a novel fluorescent probe, (R)-Z1, that achieves effective enantioselective recognition of chiral amino acids in water by shifting its emission wavelength (>60 nm). This water-soluble probe exhibits cyan or yellow-green luminescence upon interaction with amino acid enantiomers, enabling reliable chiral detection of 14 natural amino acids. It also allows determination of enantiomeric excess by monitoring changes in luminescent color. Additionally, a logic operation with two inputs and three outputs was constructed based on these optical properties. Notably, amino acid enantiomers were successfully detected via dual-channel analysis at both the food and cellular levels. This study provides a new dynamic luminescence-based tool for the accurate sensing and detection of amino acid enantiomers.
Keywords: fluorescent probe; amino acid enantiomers; chiral recognition; aqueous solution; dynamic multicolor emissions
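Enantiomeric excess (ee), which the probe above determines optically, is defined as the difference between the two enantiomer fractions in a mixture. A minimal sketch of the formula; the amounts are toy numbers, not measured data:

```python
def enantiomeric_excess(major, minor):
    """ee (%) = (major - minor) / (major + minor) * 100,
    for the amounts of the major and minor enantiomers."""
    return (major - minor) / (major + minor) * 100.0

# A 75:25 mixture of the two enantiomers gives 50% ee.
ee = enantiomeric_excess(75.0, 25.0)
```

A racemic (50:50) mixture gives 0% ee and an enantiopure sample 100%, which is the scale the probe's color change is calibrated against.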
CGAN-based anti-interference recognition method for weld seam images
14
Authors: Zelin Zhang, Xiuhao Zhu, Lei Wang, Jianhua Cao, Xuhui Xia. 《Chinese Journal of Mechanical Engineering》, 2026, Issue 1, pp. 249-262 (14 pages)
Common strong noise interference during welding, such as metal splashes, smoke, and arc light, can seriously pollute laser stripe images, causing the tracking model to drift and leading to tracking failure. Many mature methods already exist for identifying and extracting the feature points of linear laser stripes, but when the laser stripe forms a curved shape on the workpiece surface, these linear methods no longer apply. To eliminate interference sources, enhance the robustness of the weld tracking model, and effectively extract the feature points of curved laser stripes under strong noise, this paper proposes a Conditional Generative Adversarial Network (CGAN) based anti-interference recognition method for welding images. The generator adopts an improved U-Net++ structure, adds a multi-scale channel attention module (MS-CAM), introduces deep supervision, and applies a multi-output fusion strategy (MOFS) to the output to enhance the image inpainting effect; the discriminator uses PatchGAN. The center of the laser stripe is obtained using the grayscale center-of-mass method and then combined with polynomial fitting to extract the weld seam feature points. Experimental results show that the inpainted images reach a PSNR of 26.24 dB, an SSIM of 0.98, and an LPIPS of 0.032. Curves fitted to the centerline of the inpainted image and the centerline of the noise-free laser stripe differ by no more than 5% at the feature points, confirming the superiority and feasibility of the method.
Keywords: laser vision; generative adversarial network; weld seam image; anti-interference recognition; centerline extraction
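The grayscale center-of-mass step above locates the stripe center in each image column as the intensity-weighted mean row index; a polynomial is then fitted through those centers. A minimal sketch of the per-column centroid; the tiny intensity grid is illustrative:

```python
def stripe_centers(image):
    """Per-column grayscale center of mass: intensity-weighted mean row index.
    image: list of rows of grayscale values; returns one sub-pixel row per column
    (None where a column is entirely dark)."""
    rows, cols = len(image), len(image[0])
    centers = []
    for c in range(cols):
        column = [image[r][c] for r in range(rows)]
        total = sum(column)
        centers.append(sum(r * v for r, v in enumerate(column)) / total
                       if total else None)
    return centers

# 4x3 toy image with a bright laser stripe near row 2.
img = [[0, 0, 0],
       [10, 0, 0],
       [90, 100, 50],
       [0, 0, 50]]
centers = stripe_centers(img)
```

The weighted mean yields sub-pixel centers, which is why a smooth polynomial can be fitted through them even for curved stripes.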
RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition
15
Authors: Yewei Xiao, Xin Du, Wei Zeng. 《Computers, Materials & Continua》, 2026, Issue 3, pp. 1325-1348 (24 pages)
Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive because the spatial and temporal complexity of their softmax-based attention mechanisms grows quadratically with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse Attention (GSA) module incorporating an efficient context restoration block to recover lost contextual cues. Concurrently, a Grouped-scale Convolution (GSC) module replaces the standard Conformer convolution module, providing adaptive local modeling across varying temporal resolutions. Furthermore, we integrate a Refined Intermediate Contextual CTC (RIC-CTC) supervision strategy, which applies progressively increasing loss weights combined with convolution-based context aggregation, thereby further relaxing the conditional-independence constraint inherent in standard CTC frameworks. Evaluations on the LRS2 and LRS3 benchmarks validate the efficacy of our approach, with word error rates (WERs) reduced to 1.8% and 1.5%, respectively, demonstrating state-of-the-art performance on AVSR tasks.
Keywords: Audio-visual speech recognition; Conformer; CTC; sparse attention
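The core idea of replacing softmax attention with a ReLU-based sparse variant can be illustrated with a minimal sketch. This is plain Python for illustration only, not the paper's GSA module; the function name and the ReLU-then-normalize scheme are assumptions based on the general technique:

```python
import math

def relu_sparse_attention(Q, K, V):
    """Scaled dot-product attention with ReLU in place of softmax.

    Negative query-key scores are zeroed, so each query attends to a
    sparse subset of keys. Q, K, V are lists of equal-length vectors.
    """
    d = len(Q[0])
    out = []
    for q in Q:
        # Scaled dot products, clipped at zero (the ReLU step).
        scores = [max(sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d), 0.0)
                  for k in K]
        s = sum(scores)
        # Normalize the surviving (positive) scores into weights.
        weights = [w / s if s > 0 else 0.0 for w in scores]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# A query aligned with the first key ignores the second (negative score):
print(relu_sparse_attention([[1.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]],
                            [[1.0, 1.0], [5.0, 5.0]]))  # -> [[1.0, 1.0]]
```

Unlike softmax, which assigns every key a nonzero weight, the ReLU map produces exact zeros, which is what makes the attention pattern sparse.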
Automated recognition of rock discontinuity in underground engineering using geometric feature analysis
16
Authors: Adili Rusuli, Xiaojun Li, Yuyun Wang, Yi Rui 《Journal of Rock Mechanics and Geotechnical Engineering》 2026, Issue 2, pp. 1016-1033 (18 pages)
Discontinuities in rock masses critically impact the stability and safety of underground engineering. Mainstream discontinuity identification methods, which rely on normal vector estimation and clustering algorithms, suffer from accuracy degradation, omit critical discontinuities when orientation density is unevenly distributed, and require manual intervention. To overcome these limitations, this paper introduces a novel discontinuity identification method based on geometric feature analysis of the rock mass. By analyzing the spatial distribution variability of the point cloud and integrating an adaptive region growing algorithm, the method accurately detects independent discontinuities under complex geological conditions. Given that rock mass orientations typically follow a Fisher distribution, an adaptive hierarchical clustering algorithm based on statistical analysis is employed to automatically determine the optimal number of structural sets, eliminating the preset cluster counts or thresholds inherent in traditional methods. The proposed approach effectively handles diverse rock mass shapes and sizes, leveraging both local and global geometric features to minimize noise interference. Experimental validation on three real-world rock mass models, alongside comparisons with three conventional directional clustering algorithms, demonstrates superior accuracy and robustness in identifying optimal discontinuity sets. The proposed method offers a reliable and efficient tool for discontinuity detection and grouping in underground engineering, significantly enhancing design and construction outcomes.
Keywords: Underground engineering; Rock mass discontinuity; Orientation grouping; Fisher distribution; 3D point cloud; Automated recognition
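The statistical assumption above, that discontinuity orientations follow a Fisher distribution, can be made concrete with the textbook estimate of the Fisher concentration parameter for a set of pole vectors. This sketch uses the classical approximation, not code from the paper; the function name and example data are illustrative:

```python
import math

def fisher_kappa(normals):
    """Approximate Fisher concentration parameter for unit pole vectors.

    Uses the classical estimate kappa ~ R(3 - R^2)/(1 - R^2), where R is
    the mean resultant length of the normalized orientation vectors.
    (Assumes the poles are not all identical, i.e. R < 1.)
    """
    unit = []
    for v in normals:
        n = math.sqrt(sum(c * c for c in v))
        unit.append([c / n for c in v])
    mean = [sum(v[i] for v in unit) / len(unit) for i in range(3)]
    R = math.sqrt(sum(c * c for c in mean))
    return R * (3.0 - R * R) / (1.0 - R * R)

# Tightly clustered poles -> large kappa; evenly spread poles -> kappa of 0.
tight = [[0.01, 0.0, 1.0], [0.0, 0.01, 1.0], [-0.01, 0.0, 1.0]]
spread = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
          [0.0, -1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
print(fisher_kappa(tight) > 100 > fisher_kappa(spread))  # -> True
```

A high kappa marks a coherent structural set; a clustering step can then group poles by which fitted Fisher component they most plausibly belong to.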
Hybrid Quantum Gate Enabled CNN Framework with Optimized Features for Human-Object Detection and Recognition
17
Authors: Nouf Abdullah Almujally, Tanvir Fatima Naik Bukht, Shuaa S. Alharbi, Asaad Algarni, Ahmad Jalal, Jeongmin Park 《Computers, Materials & Continua》 2026, Issue 4, pp. 2254-2271 (18 pages)
Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI. The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity, and HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
Keywords: Pattern recognition; image segmentation; computer vision; object detection
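The "relative joint angles" used in the 23-joint keypoint features rest on a simple geometric primitive: the angle at a joint formed by its two adjacent segments. A minimal sketch (plain Python; the function name, 2D keypoints, and example coordinates are illustrative assumptions, not the paper's implementation):

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Elbow-style right angle: shoulder (1,0), elbow (0,0), wrist (0,1).
print(round(joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0)), 6))
```

Computed over all adjacent joint triples of a skeleton, such angles give a pose descriptor that is invariant to translation and scale, which is what makes them useful alongside proximity measures.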
Speech Emotion Recognition Based on the Adaptive Acoustic Enhancement and Refined Attention Mechanism
18
Authors: Jun Li, Chunyan Liang, Zhiguo Liu, Fengpei Ge 《Computers, Materials & Continua》 2026, Issue 3, pp. 2015-2039 (25 pages)
To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy with dynamic interpolation coefficients, enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depthwise separable convolution and an h-swish activation function, and captures long-range dependencies of multi-scale time-frequency features using attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85%, on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent-samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
Keywords: Speech emotion recognition; adaptive acoustic mixup enhancement; improved coordinate attention; shuffle attention; attention mechanism; deep learning
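The AAM method builds on standard mixup-style interpolation of two samples. A minimal sketch of the underlying interpolation (plain Python; the Beta-distributed coefficient follows generic mixup, while AAM's sample selection strategy and dynamic coefficients are not reproduced here):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Interpolate two feature vectors and their one-hot labels.

    lam ~ Beta(alpha, alpha), as in standard mixup; AAM replaces this
    fixed draw with a dynamically chosen interpolation coefficient.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Mix a "happy" sample with a "sad" one; total label mass is conserved.
x, y, lam = mixup([0.0, 2.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  rng=random.Random(0))
print(0.0 <= lam <= 1.0, abs(y[0] + y[1] - 1.0) < 1e-9)
```

With small alpha (e.g. 0.2), Beta(alpha, alpha) concentrates mass near 0 and 1, so most mixed samples stay close to one of the originals, which keeps the augmentation mild.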
MDGET-MER:Multi-Level Dynamic Gating and Emotion Transfer for Multi-Modal Emotion Recognition
19
Authors: Musheng Chen, Qiang Wen, Xiaohong Qiu, Junhua Wu, Wenqing Fu 《Computers, Materials & Continua》 2026, Issue 3, pp. 872-893 (22 pages)
In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality, and they lack effective mechanisms for modeling the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of the visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a contrastive learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
Keywords: Multi-modal emotion recognition; dynamic gating; emotion transfer module; cross-modal dynamic alignment; noise robustness
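The gating idea running through the framework can be sketched as a scalar sigmoid gate that decides how much of an auxiliary modality to add to the dominant textual features. This is a minimal illustration, not the paper's encoder; the weights, names, and the additive fusion form are assumptions:

```python
import math

def gated_fusion(text_feat, aux_feat, w, b):
    """Fuse an auxiliary modality into text features through a sigmoid gate.

    g = sigmoid(w . [text; aux] + b); output = text + g * aux, so the
    textual features stay dominant and a noisy auxiliary input can be
    gated out entirely.
    """
    concat = list(text_feat) + list(aux_feat)
    z = sum(wi * ci for wi, ci in zip(w, concat)) + b
    g = 1.0 / (1.0 + math.exp(-z))
    return [t + g * a for t, a in zip(text_feat, aux_feat)], g

# A strongly negative bias closes the gate: the aux modality is ignored.
fused, g = gated_fusion([1.0, 2.0], [9.0, 9.0], [0.0, 0.0, 0.0, 0.0], -50.0)
print(g < 1e-20, fused)
```

In a trained model, w and b are learned, so the gate opens for informative audio/visual cues and closes when they would only inject noise.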
Research on the visualization method of lithology intelligent recognition based on deep learning using mine tunnel images
20
Authors: Aiai Wang, Shuai Cao, Erol Yilmaz, Hui Cao 《International Journal of Minerals, Metallurgy and Materials》 2026, Issue 1, pp. 141-152 (12 pages)
An image processing and deep learning method for identifying different types of rock images was proposed. Preprocessing steps, such as rock image acquisition, gray scaling, Gaussian blurring, and feature dimensionality reduction, were conducted to extract useful feature information, and rock images were recognized and classified using a TensorFlow-based convolutional neural network (CNN) and PyQt5. A rock image dataset was established and separated into training, validation, and test sets. The framework was subsequently compiled and trained. The classification approach was evaluated using image data from the validation and test datasets, and key metrics, such as accuracy, precision, and recall, were analyzed. Finally, the classification model conducted a probabilistic analysis of the measured data to determine the equivalent lithological type for each image. The experimental results indicated that the method combining deep learning, a TensorFlow-based CNN, and PyQt5 to recognize and classify rock images achieves an accuracy rate of up to 98.8% and can be successfully utilized for rock image recognition. The system can be extended to geological exploration, mine engineering, and other rock and mineral resource development tasks to recognize rock samples more efficiently and accurately. Moreover, it can be matched with the intelligent support design system to effectively improve the reliability and economy of the support scheme. The system can serve as a reference for supporting the design of other mining and underground space projects.
Keywords: rock picture recognition; convolutional neural network; intelligent support for roadways; deep learning; lithology determination
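The "probabilistic analysis ... to determine the equivalent lithological type" step amounts to a softmax over the CNN's class scores followed by an arg-max. A minimal sketch (plain Python; the labels and score values are made up for illustration, not taken from the paper):

```python
import math

def classify_lithology(logits, labels):
    """Turn raw class scores into probabilities and pick the top class."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs

label, probs = classify_lithology([2.0, 0.5, 0.1],
                                  ["granite", "basalt", "limestone"])
print(label, round(sum(probs), 6))  # probabilities sum to 1
```

Reporting the full probability vector rather than only the arg-max lets borderline images (two comparable probabilities) be flagged for manual review instead of silently misclassified.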