The interpretation of geological structures on earth observation images involves like many other domains to both visual observation as well as specialized knowledge. To help this process and make it more objective, we...The interpretation of geological structures on earth observation images involves like many other domains to both visual observation as well as specialized knowledge. To help this process and make it more objective, we propose a method to extract the components of complex shapes with a geological significance. Thus, remote sensing allows the production of digital recordings reflecting the objects’ brightness measures on the soil. These recordings are often presented as images and ready to be computer automatically processed. The numerical techniques used exploit the morphology ma- thematical transformations properties. Presentation shows the operations’ sequences with tailored properties. The example shown is a portion of an anticline fraction in which the organization shows clearly oriented entities. The results are obtained by a procedure with an interest in the geological reasoning: it is the extraction of entities involved in the observed structure and the exploration of the main direction of a set of objects striking the structure. Extraction of elementary entities is made by their physical and physiognomic characteristics recognition such as reflectance, the shadow effect, size, shape or orientation. The resulting image must then be stripped frequently of many artifacts. Another sequence has been developed to minimize the noise due to the direct identification of physical measures contained in the image. Data from different spectral bands are first filtered by an operator of grayscale morphology to remove high frequency spatial components. The image then obtained in the treatment that follows is therefore more compact and closer to the needs of the geologist. The search for significant overall direction comes from interception measures sampling a rotation from 0 to 180 degrees. The results obtained show a clear geological significance of the organization of the extracted objects.展开更多
Behavior recognition of Hu sheep contributes to their intensive and intelligent farming.Due to the generally high density of Hu sheep farming,severe occlusion occurs among different behaviors and even among sheep perf...Behavior recognition of Hu sheep contributes to their intensive and intelligent farming.Due to the generally high density of Hu sheep farming,severe occlusion occurs among different behaviors and even among sheep performing the same behavior,leading to missing and false detection issues in existing behavior recognition methods.A high-low frequency aggregated attention and negative sample comprehensive score loss and comprehensive score soft non-maximum suppression-YOLO(HLNC-YOLO)was proposed for identifying the behavior of Hu sheep,addressing the issues of missed and erroneous detections caused by occlusion between Hu sheep in intensive farming.Firstly,images of four typical behaviors-standing,lying,eating,and drinking-were collected from the sheep farm to construct the Hu sheep behavior dataset(HSBD).Next,to solve the occlusion issues,during the training phase,the C2F-HLAtt module was integrated,which combined high-low frequency aggregation attention,into the YOLO v8 Backbone to perceive occluded objects and introduce an auxiliary reversible branch to retain more effective features.Using comprehensive score regression loss(CSLoss)to reduce the scores of suboptimal boxes and enhance the comprehensive scores of occluded object boxes.Finally,the soft comprehensive score non-maximal suppression(Soft-CS-NMS)algorithm filtered prediction boxes during the inferencing.Testing on the HSBD,HLNC-YOLO achieved a mean average precision(mAP@50)of 87.8%,with a memory footprint of 17.4 MB.This represented an improvement of 7.1,2.2,4.6,and 11 percentage points over YOLO v8,YOLO v9,YOLO v10,and Faster R-CNN,respectively.Research indicated that the HLNC-YOLO accurately identified the behavior of Hu sheep in intensive farming and possessed generalization capabilities,providing technical support for smart farming.展开更多
Human Activity Recognition(HAR)is a novel area for computer vision.It has a great impact on healthcare,smart environments,and surveillance while is able to automatically detect human behavior.It plays a vital role in ...Human Activity Recognition(HAR)is a novel area for computer vision.It has a great impact on healthcare,smart environments,and surveillance while is able to automatically detect human behavior.It plays a vital role in many applications,such as smart home,healthcare,human computer interaction,sports analysis,and especially,intelligent surveillance.In this paper,we propose a robust and efficient HAR system by leveraging deep learning paradigms,including pre-trained models,CNN architectures,and their average-weighted fusion.However,due to the diversity of human actions and various environmental influences,as well as a lack of data and resources,achieving high recognition accuracy remain elusive.In this work,a weighted average ensemble technique is employed to fuse three deep learning models:EfficientNet,ResNet50,and a custom CNN.The results of this study indicate that using a weighted average ensemble strategy for developing more effective HAR models may be a promising idea for detection and classification of human activities.Experiments by using the benchmark dataset proved that the proposed weighted ensemble approach outperformed existing approaches in terms of accuracy and other key performance measures.The combined average-weighted ensemble of pre-trained and CNN models obtained an accuracy of 98%,compared to 97%,96%,and 95%for the customized CNN,EfficientNet,and ResNet50 models,respectively.展开更多
The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce...The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.展开更多
What are the origins,historical development,and lineages of the reincarnation system of Living Buddhas in Tibetan Buddhism?What kind of academic framework is"Han-Tibetan Buddhist Studies"?In an interview wit...What are the origins,historical development,and lineages of the reincarnation system of Living Buddhas in Tibetan Buddhism?What kind of academic framework is"Han-Tibetan Buddhist Studies"?In an interview with this journal,Professor Shen Weirong ofTsinghua University discusses these issues on the basis of his research.展开更多
Deep neural networks have achieved excellent classification results on several computer vision benchmarks.This has led to the popularity of machine learning as a service,where trained algorithms are hosted on the clou...Deep neural networks have achieved excellent classification results on several computer vision benchmarks.This has led to the popularity of machine learning as a service,where trained algorithms are hosted on the cloud and inference can be obtained on real-world data.In most applications,it is important to compress the vision data due to the enormous bandwidth and memory requirements.Video codecs exploit spatial and temporal correlations to achieve high compression ratios,but they are computationally expensive.This work computes the motion fields between consecutive frames to facilitate the efficient classification of videos.However,contrary to the normal practice of reconstructing the full-resolution frames through motion compensation,this work proposes to infer the class label from the block-based computed motion fields directly.Motion fields are a richer and more complex representation of motion vectors,where each motion vector carries the magnitude and direction information.This approach has two advantages:the cost of motion compensation and video decoding is avoided,and the dimensions of the input signal are highly reduced.This results in a shallower network for classification.The neural network can be trained using motion vectors in two ways:complex representations and magnitude-direction pairs.The proposed work trains a convolutional neural network on the direction and magnitude tensors of the motion fields.Our experimental results show 20×faster convergence during training,reduced overfitting,and accelerated inference on a hand gesture recognition dataset compared to full-resolution and downsampled frames.We validate the proposed methodology on the HGds dataset,achieving a testing accuracy of 99.21%,on the HMDB51 dataset,achieving 82.54%accuracy,and on the UCF101 dataset,achieving 97.13%accuracy,outperforming state-of-the-art methods in computational efficiency.展开更多
Discriminative region localization and efficient feature encoding are crucial for fine-grained object recognition.However,existing data augmentation methods struggle to accurately locate discriminative regions in comp...Discriminative region localization and efficient feature encoding are crucial for fine-grained object recognition.However,existing data augmentation methods struggle to accurately locate discriminative regions in complex backgrounds,small target objects,and limited training data,leading to poor recognition.Fine-grained images exhibit“small inter-class differences,”and while second-order feature encoding enhances discrimination,it often requires dual Convolutional Neural Networks(CNN),increasing training time and complexity.This study proposes a model integrating discriminative region localization and efficient second-order feature encoding.By ranking feature map channels via a fully connected layer,it selects high-importance channels to generate an enhanced map,accurately locating discriminative regions.Cropping and erasing augmentations further refine recognition.To improve efficiency,a novel second-order feature encoding module generates an attention map from the fourth convolutional group of Residual Network 50 layers(ResNet-50)and multiplies it with features from the fifth group,producing second-order features while reducing dimensionality and training time.Experiments on Caltech-University of California,San Diego Birds-200-2011(CUB-200-2011),Stanford Car,and Fine-Grained Visual Classification of Aircraft(FGVC Aircraft)datasets show state-of-the-art accuracy of 88.9%,94.7%,and 93.3%,respectively.展开更多
Accurate and rapid recognition of weathering degree(WD)and groundwater condition(GC)is essential for evaluating rock mass quality and conducting stability analyses in underground engineering.Conventional WD and GC rec...Accurate and rapid recognition of weathering degree(WD)and groundwater condition(GC)is essential for evaluating rock mass quality and conducting stability analyses in underground engineering.Conventional WD and GC recognition methods often rely on subjective evaluation by field experts,supplemented by field sampling and laboratory testing.These methods are frequently complex and timeconsuming,making it challenging to meet the rapidly evolving demands of underground engineering.Therefore,this study proposes a rock non-geometric parameter classification network(RNPC-net)to rapidly achieve the recognition and mapping ofWD and GC of tunnel faces.The hybrid feature extraction module(HFEM)in RNPC-net can fully extract,fuse,and utilize multi-scale features of images,enhancing the network's classification performance.Moreover,the designed adaptive weighting auxiliary classifier(AC)helps the network learn features more efficiently.Experimental results show that RNPC-net achieved classification accuracies of 0.8756 and 0.8710 for WD and GC,respectively,representing an improvement of approximately 2%e10%compared to other methods.Both quantitative and qualitative experiments confirm the effectiveness and superiority of RNPC-net.Furthermore,for WD and GC mapping,RNPC-net outperformed other methods by achieving the highest mean intersection over union(mIOU)across most tunnel faces.The mapping results closely align with measurements provided by field experts.The application of WD and GC mapping results to the rock mass rating(RMR)system achieved a transition from conventional qualitative to quantitative evaluation.This advancement enables more accurate and reliable rock mass quality evaluations,particularly under critical conditions of RMR.展开更多
This study presents a hybrid CNN-Transformer model for real-time recognition of affective tactile biosignals.The proposed framework combines convolutional neural networks(CNNs)to extract spatial and local temporal fea...This study presents a hybrid CNN-Transformer model for real-time recognition of affective tactile biosignals.The proposed framework combines convolutional neural networks(CNNs)to extract spatial and local temporal features with the Transformer encoder that captures long-range dependencies in time-series data through multi-head attention.Model performance was evaluated on two widely used tactile biosignal datasets,HAART and CoST,which contain diverse affective touch gestures recorded from pressure sensor arrays.TheCNN-Transformer model achieved recognition rates of 93.33%on HAART and 80.89%on CoST,outperforming existing methods on both benchmarks.By incorporating temporal windowing,the model enables instantaneous prediction,improving generalization across gestures of varying duration.These results highlight the effectiveness of deep learning for tactile biosignal processing and demonstrate the potential of theCNN-Transformer approach for future applications in wearable sensors,affective computing,and biomedical monitoring.展开更多
The detection of amino acid enantiomers holds significant importance in biomedical,chemical,food,and other fields.Traditional chiral recognition methods using fluorescent probes primarily rely on fluorescence intensit...The detection of amino acid enantiomers holds significant importance in biomedical,chemical,food,and other fields.Traditional chiral recognition methods using fluorescent probes primarily rely on fluorescence intensity changes,which can compromise accuracy and repeatability.In this study,we report a novel fluorescent probe(R)-Z1 that achieves effective enantioselective recognition of chiral amino acids in water by altering emission wavelengths(>60 nm).This water-soluble probe(R)-Z1 exhibits cyan or yellow-green luminescence upon interaction with amino acid enantiomers,enabling reliable chiral detection of 14 natural amino acids.It also allows for the determination of enantiomeric excess through monitoring changes in luminescent color.Additionally,a logic operation with two inputs and three outputs was constructed based on these optical properties.Notably,amino acid enantiomers were successfully detected via dual-channel analysis at both the food and cellular levels.This study provides a new dynamic luminescence-based tool for the accurate sensing and detection of amino acid enantiomers.展开更多
Discontinuities in rock masses critically impact the stability and safety of underground engineering.Mainstream discontinuities identificationmethods,which rely on normal vector estimation and clustering algorithms,su...Discontinuities in rock masses critically impact the stability and safety of underground engineering.Mainstream discontinuities identificationmethods,which rely on normal vector estimation and clustering algorithms,suffer from accuracy degradation,omission of critical discontinuities when orientation density is unevenly distributed,and need manual intervention.To overcome these limitations,this paper introduces a novel discontinuities identificationmethod based on geometric feature analysis of rock mass.By analyzing spatial distribution variability of point cloud and integrating an adaptive region growing algorithm,the method accurately detects independent discontinuities under complex geological conditions.Given that rock mass orientations typically follow a Fisher distribution,an adaptive hierarchical clustering algorithm based on statistical analysis is employed to automatically determine the optimal number of structural sets,eliminating the need for preset clusters or thresholds inherent in traditional methods.The proposed approach effectively handles diverse rock mass shapes and sizes,leveraging both local and global geometric features to minimize noise interference.Experimental validation on three real-world rock mass models,alongside comparisons with three conventional directional clustering algorithms,demonstrates superior accuracy and robustness in identifying optimal discontinuity sets.The proposed method offers a reliable and efficienttool for discontinuities detection and grouping in underground engineering,significantlyenhancing design and construction outcomes.展开更多
Audio-visual speech recognition(AVSR),which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions,has attracted significant research interest....Audio-visual speech recognition(AVSR),which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions,has attracted significant research interest.However,Conformer-based architectures remain computational expensive due to the quadratic increase in the spatial and temporal complexity of their softmax-based attention mechanisms with sequence length.In addition,Conformerbased architectures may not provide sufficient flexibility for modeling local dependencies at different granularities.To mitigate these limitations,this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer(RSG-Conformer)architecture.Specifically,we propose a Global-enhanced Sparse Attention(GSA)module incorporating an efficient context restoration block to recover lost contextual cues.Concurrently,a Grouped-scale Convolution(GSC)module replaces the standard Conformer convolution module,providing adaptive local modeling across varying temporal resolutions.Furthermore,we integrate a Refined Intermediate Contextual CTC(RIC-CTC)supervision strategy.This approach applies progressively increasing loss weights combined with convolution-based context aggregation,thereby further relaxing the constraint of conditional independence inherent in standard CTC frameworks.Evaluations on the LRS2 and LRS3 benchmark validate the efficacy of our approach,with word error rates(WERs)reduced to 1.8%and 1.5%,respectively.These results further demonstrate and validate its state-of-the-art performance in AVSR tasks.展开更多
To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM...To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients,thus enabling information fusion of speech data with different emotions at the acoustic level.The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention(ICA)and shuffle attention(SA)techniques.The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights.The SA technique promotes feature interaction through channel shuffling,which helps the model learn richer and more discriminative emotional features.Experimental results demonstrate that,compared to the baseline model,the proposed model improves the weighted accuracy by 5.42%and 4.54%,and the unweighted accuracy by 3.37%and 3.85%on the IEMOCAP and RAVDESS datasets,respectively.These improvements were confirmed to be statistically significant by independent samples t-tests,further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.展开更多
In multi-modal emotion recognition,excessive reliance on historical context often impedes the detection of emotional shifts,while modality heterogeneity and unimodal noise limit recognition performance.Existing method...In multi-modal emotion recognition,excessive reliance on historical context often impedes the detection of emotional shifts,while modality heterogeneity and unimodal noise limit recognition performance.Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions.To address these issues,we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition.A dynamic gating mechanism is applied across unimodal encoding,cross-modal alignment,and emotion transfer modeling,substantially improving noise robustness and feature alignment.First,we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation.Second,we design a gated-attention crossmodal encoder that dynamically calibrates the complementary contributions of visual and audio modalities to the dominant textual features and eliminates redundant information.Finally,we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss.Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.展开更多
Recognising human-object interactions(HOI)is a challenging task for traditional machine learning models,including convolutional neural networks(CNNs).Existing models show limited transferability across complex dataset...Recognising human-object interactions(HOI)is a challenging task for traditional machine learning models,including convolutional neural networks(CNNs).Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI.The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity.HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability.This work proposes a Novel quantum gate-enabled hybrid CNN(QEH-CNN)for effectiveHOI recognition.Themodel enhancesCNNperformance by integrating quantumcomputing components.The framework begins with bilateral image filtering,followed bymulti-object tracking(MOT)and Felzenszwalb superpixel segmentation.A watershed algorithm refines object boundaries by cleaning merged superpixels.Feature extraction combines a histogram of oriented gradients(HOG),Global Image Statistics for Texture(GIST)descriptors,and a novel 23-joint keypoint extractionmethod using relative joint angles and joint proximitymeasures.A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNNmodel.The proposed model achieves 95.06%accuracy on the 3D-D3D-HOI dataset and 97.29%on the SYSU3DHOI dataset.Theintegration of quantum computing enhances feature optimization,leading to improved accuracy and overall model efficiency.展开更多
An image processing and deep learning method for identifying different types of rock images was proposed.Preprocessing,such as rock image acquisition,gray scaling,Gaussian blurring,and feature dimensionality reduction...An image processing and deep learning method for identifying different types of rock images was proposed.Preprocessing,such as rock image acquisition,gray scaling,Gaussian blurring,and feature dimensionality reduction,was conducted to extract useful feature information and recognize and classify rock images using Tensor Flow-based convolutional neural network(CNN)and Py Qt5.A rock image dataset was established and separated into workouts,confirmation sets,and test sets.The framework was subsequently compiled and trained.The categorization approach was evaluated using image data from the validation and test datasets,and key metrics,such as accuracy,precision,and recall,were analyzed.Finally,the classification model conducted a probabilistic analysis of the measured data to determine the equivalent lithological type for each image.The experimental results indicated that the method combining deep learning,Tensor Flow-based CNN,and Py Qt5 to recognize and classify rock images has an accuracy rate of up to 98.8%,and can be successfully utilized for rock image recognition.The system can be extended to geological exploration,mine engineering,and other rock and mineral resource development to more efficiently and accurately recognize rock samples.Moreover,it can match them with the intelligent support design system to effectively improve the reliability and economy of the support scheme.The system can serve as a reference for supporting the design of other mining and underground space projects.展开更多
Human object detection and recognition is essential for elderly monitoring and assisted living however,models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings.To addre...Human object detection and recognition is essential for elderly monitoring and assisted living however,models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings.To address this,we present SCENET-3D,a transformer-drivenmultimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline.In the first stage,scene analysis,rich geometric and texture descriptors are extracted from RGB frames,including surface-normal histograms,angles between neighboring normals,Zernike moments,directional standard deviation,and Gabor-filter responses.In the second stage,scene-object analysis,non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information.In the third stage,human-pose estimation,silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints,which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons.Features from all three stages are fused and fed in a transformer encoder with multi-head attention to resolve visually similar activities.Experiments on UCLA(95.8%),ETRI-Activity3D(89.4%),andCAD-120(91.2%)demonstrate that combining pseudo-3D skeletonswith rich scene-object fusion significantly improves generalizable activity recognition,enabling safer elderly care,natural human–robot interaction,and robust context-aware robotic perception in real-world environments.展开更多
Lip language provides a silent,intuitive,and efficient mode of communication,offering a promising solution for individuals with speech impairments.Its articulation relies on complex movements of the jaw and the muscle...Lip language provides a silent,intuitive,and efficient mode of communication,offering a promising solution for individuals with speech impairments.Its articulation relies on complex movements of the jaw and the muscles surrounding it.However,the accurate and real-time acquisition and decoding of these movements into reliable silent speech signals remains a significant challenge.In this work,we propose a real-time silent speech recognition system,which integrates a triboelectric nanogenerator-based flexible pressure sensor(FPS)with a deep learning framework.The FPS employs a porous pyramid-structured silicone film as the negative triboelectric layer,enabling highly sensitive pressure detection in the low-force regime(1 V N^(-1) for 0-10 N and 4.6 V N^(-1) for 10-24 N).This allows it to precisely capture jaw movements during speech and convert them into electrical signals.To decode the signals,we proposed a convolutional neural networklong short-term memory(CNN-LSTM)hybrid network,combining CNN and LSTM model to extract both local spatial features and temporal dynamics.The model achieved 95.83%classification accuracy in 30 categories of daily words.Furthermore,the decoded silent speech signals can be directly translated into executable commands for contactless and precise control of the smartphone.The system can also be connected to AR glasses,offering a novel human-machine interaction approach with promising potential in AR/VR applications.展开更多
Objectives This study aimed to design and evaluate a detection system for the accidental dislodgement of head-and-neck medical supplies through hand position recognition and tracking in Intensive Care Unit(ICU)patient...Objectives This study aimed to design and evaluate a detection system for the accidental dislodgement of head-and-neck medical supplies through hand position recognition and tracking in Intensive Care Unit(ICU)patients.Methods We conducted a single-center,prospective,parallel-group feasibility randomized controlled trial.We recruited 80 participants using convenience sampling from the ICU of a hospital in Ningbo City,Zhejiang Province,between March 2025 and June 2025,and they were randomly assigned to either the control group(routine care)or the intervention group(routine care plus image recognition-based detection system).The system continuously tracked patients’hand positions via bedside cameras and generated real-time alarms when hands entered predefined risk zones,notifying on-duty nurses to enable early intervention.System stability was assessed by continuous system uptime;system performance and clinical feasibility were evaluated by the frequencies of risk actions and accidental dislodgement of medical supplies(ADMS).Results All 80 participants completed the intervention,with 40 patients in each group.The baseline characteristics and median observation time of the two groups were balanced(intervention group:48 h/patient vs.control group:49 h/patient).Compared with the control group,the intervention group showed fewer ADMS(2/40 vs.9/40)and detected more risk actions per 100 h(36 vs.25);all system-detected events had corroborating images with complete concordance on manual review,and all nurse-recorded hand-contact events were accurately captured.Conclusions The study demonstrated that the image recognition-based detection system can function stably in clinical settings,providing accurate and continuous surveillance while supporting the early detection of risk actions.By reducing the observation burden and offering real-time cognitive support,the system complements routine nursing care and serves as an additional safety measure in ICU practice.With further optimization and larger multicenter validation,this approach could have the potential to make a significant contribution to the development of smart ICUs and the broader digital transformation of nursing care.展开更多
文摘The interpretation of geological structures on earth observation images involves like many other domains to both visual observation as well as specialized knowledge. To help this process and make it more objective, we propose a method to extract the components of complex shapes with a geological significance. Thus, remote sensing allows the production of digital recordings reflecting the objects’ brightness measures on the soil. These recordings are often presented as images and ready to be computer automatically processed. The numerical techniques used exploit the morphology ma- thematical transformations properties. Presentation shows the operations’ sequences with tailored properties. The example shown is a portion of an anticline fraction in which the organization shows clearly oriented entities. The results are obtained by a procedure with an interest in the geological reasoning: it is the extraction of entities involved in the observed structure and the exploration of the main direction of a set of objects striking the structure. Extraction of elementary entities is made by their physical and physiognomic characteristics recognition such as reflectance, the shadow effect, size, shape or orientation. The resulting image must then be stripped frequently of many artifacts. Another sequence has been developed to minimize the noise due to the direct identification of physical measures contained in the image. Data from different spectral bands are first filtered by an operator of grayscale morphology to remove high frequency spatial components. The image then obtained in the treatment that follows is therefore more compact and closer to the needs of the geologist. The search for significant overall direction comes from interception measures sampling a rotation from 0 to 180 degrees. The results obtained show a clear geological significance of the organization of the extracted objects.
文摘Behavior recognition of Hu sheep contributes to their intensive and intelligent farming.Due to the generally high density of Hu sheep farming,severe occlusion occurs among different behaviors and even among sheep performing the same behavior,leading to missing and false detection issues in existing behavior recognition methods.A high-low frequency aggregated attention and negative sample comprehensive score loss and comprehensive score soft non-maximum suppression-YOLO(HLNC-YOLO)was proposed for identifying the behavior of Hu sheep,addressing the issues of missed and erroneous detections caused by occlusion between Hu sheep in intensive farming.Firstly,images of four typical behaviors-standing,lying,eating,and drinking-were collected from the sheep farm to construct the Hu sheep behavior dataset(HSBD).Next,to solve the occlusion issues,during the training phase,the C2F-HLAtt module was integrated,which combined high-low frequency aggregation attention,into the YOLO v8 Backbone to perceive occluded objects and introduce an auxiliary reversible branch to retain more effective features.Using comprehensive score regression loss(CSLoss)to reduce the scores of suboptimal boxes and enhance the comprehensive scores of occluded object boxes.Finally,the soft comprehensive score non-maximal suppression(Soft-CS-NMS)algorithm filtered prediction boxes during the inferencing.Testing on the HSBD,HLNC-YOLO achieved a mean average precision(mAP@50)of 87.8%,with a memory footprint of 17.4 MB.This represented an improvement of 7.1,2.2,4.6,and 11 percentage points over YOLO v8,YOLO v9,YOLO v10,and Faster R-CNN,respectively.Research indicated that the HLNC-YOLO accurately identified the behavior of Hu sheep in intensive farming and possessed generalization capabilities,providing technical support for smart farming.
基金supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2026R765),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Human Activity Recognition(HAR)is a novel area for computer vision.It has a great impact on healthcare,smart environments,and surveillance while is able to automatically detect human behavior.It plays a vital role in many applications,such as smart home,healthcare,human computer interaction,sports analysis,and especially,intelligent surveillance.In this paper,we propose a robust and efficient HAR system by leveraging deep learning paradigms,including pre-trained models,CNN architectures,and their average-weighted fusion.However,due to the diversity of human actions and various environmental influences,as well as a lack of data and resources,achieving high recognition accuracy remain elusive.In this work,a weighted average ensemble technique is employed to fuse three deep learning models:EfficientNet,ResNet50,and a custom CNN.The results of this study indicate that using a weighted average ensemble strategy for developing more effective HAR models may be a promising idea for detection and classification of human activities.Experiments by using the benchmark dataset proved that the proposed weighted ensemble approach outperformed existing approaches in terms of accuracy and other key performance measures.The combined average-weighted ensemble of pre-trained and CNN models obtained an accuracy of 98%,compared to 97%,96%,and 95%for the customized CNN,EfficientNet,and ResNet50 models,respectively.
文摘The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.
文摘What are the origins,historical development,and lineages of the reincarnation system of Living Buddhas in Tibetan Buddhism?What kind of academic framework is"Han-Tibetan Buddhist Studies"?In an interview with this journal,Professor Shen Weirong ofTsinghua University discusses these issues on the basis of his research.
基金Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R896).
文摘Deep neural networks have achieved excellent classification results on several computer vision benchmarks.This has led to the popularity of machine learning as a service,where trained algorithms are hosted on the cloud and inference can be obtained on real-world data.In most applications,it is important to compress the vision data due to the enormous bandwidth and memory requirements.Video codecs exploit spatial and temporal correlations to achieve high compression ratios,but they are computationally expensive.This work computes the motion fields between consecutive frames to facilitate the efficient classification of videos.However,contrary to the normal practice of reconstructing the full-resolution frames through motion compensation,this work proposes to infer the class label from the block-based computed motion fields directly.Motion fields are a richer and more complex representation of motion vectors,where each motion vector carries the magnitude and direction information.This approach has two advantages:the cost of motion compensation and video decoding is avoided,and the dimensions of the input signal are highly reduced.This results in a shallower network for classification.The neural network can be trained using motion vectors in two ways:complex representations and magnitude-direction pairs.The proposed work trains a convolutional neural network on the direction and magnitude tensors of the motion fields.Our experimental results show 20×faster convergence during training,reduced overfitting,and accelerated inference on a hand gesture recognition dataset compared to full-resolution and downsampled frames.We validate the proposed methodology on the HGds dataset,achieving a testing accuracy of 99.21%,on the HMDB51 dataset,achieving 82.54%accuracy,and on the UCF101 dataset,achieving 97.13%accuracy,outperforming state-of-the-art methods in computational efficiency.
基金supported,in part,by the National Nature Science Foundation of China under Grant 62272236,62376128 and 62306139the Natural Science Foundation of Jiangsu Province under Grant BK20201136,BK20191401.
文摘Discriminative region localization and efficient feature encoding are crucial for fine-grained object recognition.However,existing data augmentation methods struggle to accurately locate discriminative regions in complex backgrounds,small target objects,and limited training data,leading to poor recognition.Fine-grained images exhibit“small inter-class differences,”and while second-order feature encoding enhances discrimination,it often requires dual Convolutional Neural Networks(CNN),increasing training time and complexity.This study proposes a model integrating discriminative region localization and efficient second-order feature encoding.By ranking feature map channels via a fully connected layer,it selects high-importance channels to generate an enhanced map,accurately locating discriminative regions.Cropping and erasing augmentations further refine recognition.To improve efficiency,a novel second-order feature encoding module generates an attention map from the fourth convolutional group of Residual Network 50 layers(ResNet-50)and multiplies it with features from the fifth group,producing second-order features while reducing dimensionality and training time.Experiments on Caltech-University of California,San Diego Birds-200-2011(CUB-200-2011),Stanford Car,and Fine-Grained Visual Classification of Aircraft(FGVC Aircraft)datasets show state-of-the-art accuracy of 88.9%,94.7%,and 93.3%,respectively.
基金supported by the National Natural Science Foundation of China(Grant Nos.42077242 and 42171407)the Graduate Innovation Fund of Jilin University.
文摘Accurate and rapid recognition of weathering degree(WD)and groundwater condition(GC)is essential for evaluating rock mass quality and conducting stability analyses in underground engineering.Conventional WD and GC recognition methods often rely on subjective evaluation by field experts,supplemented by field sampling and laboratory testing.These methods are frequently complex and timeconsuming,making it challenging to meet the rapidly evolving demands of underground engineering.Therefore,this study proposes a rock non-geometric parameter classification network(RNPC-net)to rapidly achieve the recognition and mapping ofWD and GC of tunnel faces.The hybrid feature extraction module(HFEM)in RNPC-net can fully extract,fuse,and utilize multi-scale features of images,enhancing the network's classification performance.Moreover,the designed adaptive weighting auxiliary classifier(AC)helps the network learn features more efficiently.Experimental results show that RNPC-net achieved classification accuracies of 0.8756 and 0.8710 for WD and GC,respectively,representing an improvement of approximately 2%e10%compared to other methods.Both quantitative and qualitative experiments confirm the effectiveness and superiority of RNPC-net.Furthermore,for WD and GC mapping,RNPC-net outperformed other methods by achieving the highest mean intersection over union(mIOU)across most tunnel faces.The mapping results closely align with measurements provided by field experts.The application of WD and GC mapping results to the rock mass rating(RMR)system achieved a transition from conventional qualitative to quantitative evaluation.This advancement enables more accurate and reliable rock mass quality evaluations,particularly under critical conditions of RMR.
文摘This study presents a hybrid CNN-Transformer model for real-time recognition of affective tactile biosignals.The proposed framework combines convolutional neural networks(CNNs)to extract spatial and local temporal features with the Transformer encoder that captures long-range dependencies in time-series data through multi-head attention.Model performance was evaluated on two widely used tactile biosignal datasets,HAART and CoST,which contain diverse affective touch gestures recorded from pressure sensor arrays.TheCNN-Transformer model achieved recognition rates of 93.33%on HAART and 80.89%on CoST,outperforming existing methods on both benchmarks.By incorporating temporal windowing,the model enables instantaneous prediction,improving generalization across gestures of varying duration.These results highlight the effectiveness of deep learning for tactile biosignal processing and demonstrate the potential of theCNN-Transformer approach for future applications in wearable sensors,affective computing,and biomedical monitoring.
基金the financial support from the National Natural Science Foundation of China(Nos.22377097,22307036,22074114)Natural Science Foundation of Hubei Province of China(Nos.2020CFB623,2021CFB556)Engineering Research Center of Phosphorus Resources Development and Utilization of Ministry of Education(No.LCX202305)。
文摘The detection of amino acid enantiomers holds significant importance in biomedical,chemical,food,and other fields.Traditional chiral recognition methods using fluorescent probes primarily rely on fluorescence intensity changes,which can compromise accuracy and repeatability.In this study,we report a novel fluorescent probe(R)-Z1 that achieves effective enantioselective recognition of chiral amino acids in water by altering emission wavelengths(>60 nm).This water-soluble probe(R)-Z1 exhibits cyan or yellow-green luminescence upon interaction with amino acid enantiomers,enabling reliable chiral detection of 14 natural amino acids.It also allows for the determination of enantiomeric excess through monitoring changes in luminescent color.Additionally,a logic operation with two inputs and three outputs was constructed based on these optical properties.Notably,amino acid enantiomers were successfully detected via dual-channel analysis at both the food and cellular levels.This study provides a new dynamic luminescence-based tool for the accurate sensing and detection of amino acid enantiomers.
基金the National Key Research and Development Program of China(Grant No.2023YFC3009400).
文摘Discontinuities in rock masses critically impact the stability and safety of underground engineering.Mainstream discontinuities identificationmethods,which rely on normal vector estimation and clustering algorithms,suffer from accuracy degradation,omission of critical discontinuities when orientation density is unevenly distributed,and need manual intervention.To overcome these limitations,this paper introduces a novel discontinuities identificationmethod based on geometric feature analysis of rock mass.By analyzing spatial distribution variability of point cloud and integrating an adaptive region growing algorithm,the method accurately detects independent discontinuities under complex geological conditions.Given that rock mass orientations typically follow a Fisher distribution,an adaptive hierarchical clustering algorithm based on statistical analysis is employed to automatically determine the optimal number of structural sets,eliminating the need for preset clusters or thresholds inherent in traditional methods.The proposed approach effectively handles diverse rock mass shapes and sizes,leveraging both local and global geometric features to minimize noise interference.Experimental validation on three real-world rock mass models,alongside comparisons with three conventional directional clustering algorithms,demonstrates superior accuracy and robustness in identifying optimal discontinuity sets.The proposed method offers a reliable and efficienttool for discontinuities detection and grouping in underground engineering,significantlyenhancing design and construction outcomes.
基金supported in part by the National Natural Science Foundation of China:61773330.
文摘Audio-visual speech recognition(AVSR),which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions,has attracted significant research interest.However,Conformer-based architectures remain computational expensive due to the quadratic increase in the spatial and temporal complexity of their softmax-based attention mechanisms with sequence length.In addition,Conformerbased architectures may not provide sufficient flexibility for modeling local dependencies at different granularities.To mitigate these limitations,this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer(RSG-Conformer)architecture.Specifically,we propose a Global-enhanced Sparse Attention(GSA)module incorporating an efficient context restoration block to recover lost contextual cues.Concurrently,a Grouped-scale Convolution(GSC)module replaces the standard Conformer convolution module,providing adaptive local modeling across varying temporal resolutions.Furthermore,we integrate a Refined Intermediate Contextual CTC(RIC-CTC)supervision strategy.This approach applies progressively increasing loss weights combined with convolution-based context aggregation,thereby further relaxing the constraint of conditional independence inherent in standard CTC frameworks.Evaluations on the LRS2 and LRS3 benchmark validate the efficacy of our approach,with word error rates(WERs)reduced to 1.8%and 1.5%,respectively.These results further demonstrate and validate its state-of-the-art performance in AVSR tasks.
基金supported by the National Natural Science Foundation of China under Grant No.12204062the Natural Science Foundation of Shandong Province under Grant No.ZR2022MF330。
文摘To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients,thus enabling information fusion of speech data with different emotions at the acoustic level.The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention(ICA)and shuffle attention(SA)techniques.The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights.The SA technique promotes feature interaction through channel shuffling,which helps the model learn richer and more discriminative emotional features.Experimental results demonstrate that,compared to the baseline model,the proposed model improves the weighted accuracy by 5.42%and 4.54%,and the unweighted accuracy by 3.37%and 3.85%on the IEMOCAP and RAVDESS datasets,respectively.These improvements were confirmed to be statistically significant by independent samples t-tests,further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
基金funded by“the Fanying Special Program of the National Natural Science Foundation of China,grant number 62341307”“the Scientific research project of Jiangxi Provincial Department of Education,grant number GJJ200839”“theDoctoral startup fund of JiangxiUniversity of Technology,grant number 205200100402”.
文摘In multi-modal emotion recognition,excessive reliance on historical context often impedes the detection of emotional shifts,while modality heterogeneity and unimodal noise limit recognition performance.Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions.To address these issues,we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition.A dynamic gating mechanism is applied across unimodal encoding,cross-modal alignment,and emotion transfer modeling,substantially improving noise robustness and feature alignment.First,we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation.Second,we design a gated-attention crossmodal encoder that dynamically calibrates the complementary contributions of visual and audio modalities to the dominant textual features and eliminates redundant information.Finally,we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss.Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
基金supported and funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R410),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Recognising human-object interactions(HOI)is a challenging task for traditional machine learning models,including convolutional neural networks(CNNs).Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI.The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity.HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability.This work proposes a Novel quantum gate-enabled hybrid CNN(QEH-CNN)for effectiveHOI recognition.Themodel enhancesCNNperformance by integrating quantumcomputing components.The framework begins with bilateral image filtering,followed bymulti-object tracking(MOT)and Felzenszwalb superpixel segmentation.A watershed algorithm refines object boundaries by cleaning merged superpixels.Feature extraction combines a histogram of oriented gradients(HOG),Global Image Statistics for Texture(GIST)descriptors,and a novel 23-joint keypoint extractionmethod using relative joint angles and joint proximitymeasures.A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNNmodel.The proposed model achieves 95.06%accuracy on the 3D-D3D-HOI dataset and 97.29%on the SYSU3DHOI dataset.Theintegration of quantum computing enhances feature optimization,leading to improved accuracy and overall model efficiency.
基金financially supported by the National Science and Technology Major Project——Deep Earth Probe and Mineral Resources Exploration(No.2024ZD1003701)the National Key R&D Program of China(No.2022YFC2905004)。
文摘An image processing and deep learning method for identifying different types of rock images was proposed.Preprocessing,such as rock image acquisition,gray scaling,Gaussian blurring,and feature dimensionality reduction,was conducted to extract useful feature information and recognize and classify rock images using Tensor Flow-based convolutional neural network(CNN)and Py Qt5.A rock image dataset was established and separated into workouts,confirmation sets,and test sets.The framework was subsequently compiled and trained.The categorization approach was evaluated using image data from the validation and test datasets,and key metrics,such as accuracy,precision,and recall,were analyzed.Finally,the classification model conducted a probabilistic analysis of the measured data to determine the equivalent lithological type for each image.The experimental results indicated that the method combining deep learning,Tensor Flow-based CNN,and Py Qt5 to recognize and classify rock images has an accuracy rate of up to 98.8%,and can be successfully utilized for rock image recognition.The system can be extended to geological exploration,mine engineering,and other rock and mineral resource development to more efficiently and accurately recognize rock samples.Moreover,it can match them with the intelligent support design system to effectively improve the reliability and economy of the support scheme.The system can serve as a reference for supporting the design of other mining and underground space projects.
基金funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R410),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Human object detection and recognition is essential for elderly monitoring and assisted living however,models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings.To address this,we present SCENET-3D,a transformer-drivenmultimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline.In the first stage,scene analysis,rich geometric and texture descriptors are extracted from RGB frames,including surface-normal histograms,angles between neighboring normals,Zernike moments,directional standard deviation,and Gabor-filter responses.In the second stage,scene-object analysis,non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information.In the third stage,human-pose estimation,silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints,which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons.Features from all three stages are fused and fed in a transformer encoder with multi-head attention to resolve visually similar activities.Experiments on UCLA(95.8%),ETRI-Activity3D(89.4%),andCAD-120(91.2%)demonstrate that combining pseudo-3D skeletonswith rich scene-object fusion significantly improves generalizable activity recognition,enabling safer elderly care,natural human–robot interaction,and robust context-aware robotic perception in real-world environments.
基金supported by the Natural Science Foundation of Fujian Province under Grant No.2024J010016Fujian Province Young and Middle aged Teacher Education Research Project No.JAT241317the Mindu Innovation Laboratory Project under Grant No.2020ZZ113.
文摘Lip language provides a silent,intuitive,and efficient mode of communication,offering a promising solution for individuals with speech impairments.Its articulation relies on complex movements of the jaw and the muscles surrounding it.However,the accurate and real-time acquisition and decoding of these movements into reliable silent speech signals remains a significant challenge.In this work,we propose a real-time silent speech recognition system,which integrates a triboelectric nanogenerator-based flexible pressure sensor(FPS)with a deep learning framework.The FPS employs a porous pyramid-structured silicone film as the negative triboelectric layer,enabling highly sensitive pressure detection in the low-force regime(1 V N^(-1) for 0-10 N and 4.6 V N^(-1) for 10-24 N).This allows it to precisely capture jaw movements during speech and convert them into electrical signals.To decode the signals,we proposed a convolutional neural networklong short-term memory(CNN-LSTM)hybrid network,combining CNN and LSTM model to extract both local spatial features and temporal dynamics.The model achieved 95.83%classification accuracy in 30 categories of daily words.Furthermore,the decoded silent speech signals can be directly translated into executable commands for contactless and precise control of the smartphone.The system can also be connected to AR glasses,offering a novel human-machine interaction approach with promising potential in AR/VR applications.
文摘Objectives This study aimed to design and evaluate a detection system for the accidental dislodgement of head-and-neck medical supplies through hand position recognition and tracking in Intensive Care Unit(ICU)patients.Methods We conducted a single-center,prospective,parallel-group feasibility randomized controlled trial.We recruited 80 participants using convenience sampling from the ICU of a hospital in Ningbo City,Zhejiang Province,between March 2025 and June 2025,and they were randomly assigned to either the control group(routine care)or the intervention group(routine care plus image recognition-based detection system).The system continuously tracked patients’hand positions via bedside cameras and generated real-time alarms when hands entered predefined risk zones,notifying on-duty nurses to enable early intervention.System stability was assessed by continuous system uptime;system performance and clinical feasibility were evaluated by the frequencies of risk actions and accidental dislodgement of medical supplies(ADMS).Results All 80 participants completed the intervention,with 40 patients in each group.The baseline characteristics and median observation time of the two groups were balanced(intervention group:48 h/patient vs.control group:49 h/patient).Compared with the control group,the intervention group showed fewer ADMS(2/40 vs.9/40)and detected more risk actions per 100 h(36 vs.25);all system-detected events had corroborating images with complete concordance on manual review,and all nurse-recorded hand-contact events were accurately captured.Conclusions The study demonstrated that the image recognition-based detection system can function stably in clinical settings,providing accurate and continuous surveillance while supporting the early detection of risk actions.By reducing the observation burden and offering real-time cognitive support,the system complements routine nursing care and serves as an additional safety measure in ICU practice.With further optimization and larger multicenter validation,this approach could have the potential to make a significant contribution to the development of smart ICUs and the broader digital transformation of nursing care.