Journal Articles
83,381 articles found
1. Multi-Modal Named Entity Recognition with Auxiliary Visual Knowledge and Word-Level Fusion
Authors: Huansha Wang, Ruiyang Huang, Qinrang Liu, Xinghao Wang. Computers, Materials & Continua, 2025, No. 6, pp. 5747-5760 (14 pages).
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or on obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between the visual and textual modalities in MNER datasets and do not explore different multi-modal fusion strategies. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which leverages a Multi-modal Large Language Model (MLLM) as an implicit knowledge base and extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which uses target-specific prompts to guide the MLLM to extract external knowledge exclusively from images to aid entity recognition, avoiding the redundant recognition and cognitive confusion caused by processing image-text pairs simultaneously. Furthermore, we employ a word-level multi-modal fusion mechanism that fuses the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK matches or outperforms state-of-the-art methods on two classical MNER datasets, even though the large models it employs have significantly fewer parameters than other baselines.
Keywords: multi-modal named entity recognition; large language model; multi-modal fusion
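The word-level fusion step can be sketched as a per-token gate that mixes one external knowledge vector into every word embedding. A minimal plain-Python illustration, assuming a scalar sigmoid gate over the concatenated vectors (the gate form, weights, and toy embeddings are hypothetical, not the paper's exact mechanism):

```python
import math

def word_level_fuse(word_embs, knowledge_vec, w_gate):
    """Fuse a shared knowledge vector into each word embedding.

    For each word, a scalar gate g = sigmoid(w_gate . [word; knowledge])
    decides how much external knowledge to mix in:
        fused = (1 - g) * word + g * knowledge
    """
    fused = []
    for emb in word_embs:
        concat = emb + knowledge_vec           # [word; knowledge]
        score = sum(w * x for w, x in zip(w_gate, concat))
        g = 1.0 / (1.0 + math.exp(-score))     # sigmoid gate
        fused.append([(1 - g) * e + g * k for e, k in zip(emb, knowledge_vec)])
    return fused

# toy 2-dim embeddings for a 2-word sentence
words = [[1.0, 0.0], [0.0, 1.0]]
knowledge = [0.5, 0.5]
out = word_level_fuse(words, knowledge, w_gate=[0.0, 0.0, 0.0, 0.0])
# with an all-zero gate vector, g = 0.5 for every word: an even blend
```

In the real model the gate weights would be learned jointly with the encoder; here they only show the shape of the computation.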
2. Multi-Modality Video Representation for Action Recognition (cited 4 times)
Authors: Chao Zhu, Yike Wang, Dongbing Pu, Miao Qi, Hui Sun, Lei Tan. Journal on Big Data, 2020, No. 3, pp. 95-104 (10 pages).
Action recognition is now widely applied in many fields. However, an action is hard to define from a single modality. The difference between image recognition and action recognition is that the latter needs more modalities to depict one action, such as appearance, motion, and dynamic information. Because the state of an action evolves over time, motion information must be considered when representing it. Most current methods define an action by spatial information and motion information, with two key elements: spatial information obtained by sampling sparsely over the video frame sequence, and motion content mostly represented by optical flow computed on consecutive frames. However, in current methods the relevance between the two is weak. To strengthen this association, this paper presents a new architecture consisting of three streams that captures multi-modality information. The advantages of our network are: (a) a new sampling approach that samples evenly over the video sequence to acquire appearance information; (b) ResNet101 as the backbone for gaining high-level, discriminative features; and (c) a three-stream architecture that captures temporal, spatial, and dynamic information. Experimental results on the UCF101 dataset show that our method outperforms previous methods.
Keywords: action recognition; dynamic; appearance; spatial; motion; ResNet101; UCF101
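The even-sampling idea in point (a) can be sketched as taking the centre frame of each of N equal-length segments of the video. A small illustration, assuming a TSN-style centre-of-segment rule (the paper's exact sampling rule may differ):

```python
def sample_evenly(num_frames, num_samples):
    """Pick num_samples frame indices spread evenly over the whole video,
    taking the centre of each equal-length segment."""
    seg = num_frames / num_samples
    return [int(seg * i + seg / 2) for i in range(num_samples)]

print(sample_evenly(100, 5))  # -> [10, 30, 50, 70, 90]
```

Unlike purely random sparse sampling, this guarantees coverage of the whole temporal extent of the action.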
3. Multi-modal Gesture Recognition using Integrated Model of Motion, Audio and Video (cited 3 times)
Authors: GOUTSU Yusuke, KOBAYASHI Takaki, OBARA Junya, KUSAJIMA Ikuo, TAKEICHI Kazunari, TAKANO Wataru, NAKAMURA Yoshihiko. Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2015, No. 4, pp. 657-665 (9 pages).
Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation, and sign language. With advances in motion sensors, multiple data sources have become available, leading to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depends on a unimodal system, it has difficulty classifying similar motion patterns. To solve this problem, a novel approach that integrates motion, audio, and video models is proposed, using a dataset captured with Kinect. The proposed system recognizes observed gestures with the three models, whose recognition results are integrated by the proposed framework to produce the final output. The motion and audio models are learned with Hidden Markov Models, and a Random Forest classifier learns the video model. In experiments testing the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying the feature vectors and learning methods, and the unimodal and multi-modal models are compared with respect to recognition accuracy. All experiments are conducted on the dataset provided by the organizer of MMGRC, a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison shows that the multi-modal model composed of all three models scores the highest recognition rate, meaning that the complementary relationship among the three models improves gesture recognition accuracy. The proposed system provides application technology for understanding human actions of daily life more precisely.
Keywords: gesture recognition; multi-modal integration; hidden Markov model; random forests
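The integration framework can be sketched as late fusion: each unimodal model emits per-class scores, which are combined before picking the winning class. A minimal illustration with a weighted sum (the weights and toy scores are hypothetical; the paper's integration rule may be more elaborate):

```python
def integrate_scores(motion, audio, video, weights=(1.0, 1.0, 1.0)):
    """Late fusion: combine per-class scores from the three unimodal
    recognisers by a weighted sum and return the winning class index
    together with the fused scores."""
    n = len(motion)
    fused = [weights[0] * motion[i] + weights[1] * audio[i] + weights[2] * video[i]
             for i in range(n)]
    best = max(range(n), key=fused.__getitem__)
    return best, fused

best, fused = integrate_scores([0.2, 0.5, 0.3],   # motion model scores
                               [0.1, 0.3, 0.6],   # audio model scores
                               [0.2, 0.6, 0.2])   # video model scores
# class 1 wins: fused scores are [0.5, 1.4, 1.1]
```

The complementarity the abstract describes shows up exactly here: a class that no single model is certain about can still dominate after fusion.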
4. Detection and Recognition of Spray Code Numbers on Can Surfaces Based on OCR
Authors: Hailong Wang, Junchao Shi. Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 1109-1128 (20 pages).
A two-stage deep learning algorithm for detecting and recognizing can-bottom spray code numbers is proposed to address the small character areas and fast production-line speeds involved in can-bottom code recognition. In the code-number detection stage, a Differentiable Binarization Network is used as the backbone, combined with an Attention and Dilation Convolutions Path Aggregation Network feature-fusion structure to enhance detection performance. For text recognition, end-to-end training with the Scene Visual Text Recognition network alleviates code recognition errors caused by image color distortion due to lighting variations and background noise. In addition, model pruning and quantization reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted on a dataset of can-bottom spray code numbers collected on site, and a transfer experiment was conducted on a dataset of packaging-box production dates. The results show that the proposed algorithm can effectively locate can codes at different positions on the roller conveyor and accurately identify the code numbers at high line speeds. The Hmean of code-number detection is 97.32%, and the code-number recognition accuracy is 98.21%, verifying that the proposed algorithm achieves high accuracy in both detection and recognition.
Keywords: can coding recognition; differentiable binarization network; scene visual text recognition; model pruning and quantization; transport model
5. Comprehensive Review and Analysis on Facial Emotion Recognition: Performance Insights into Deep and Traditional Learning with Current Updates and Challenges
Authors: Amjad Rehman, Muhammad Mujahid, Alex Elyassih, Bayan AlGhofaily, Saeed Ali Omer Bahaj. Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 41-72 (32 pages).
In computer vision and artificial intelligence, automatic facial-expression-based emotion identification has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance video, depression therapy, patient monitoring, and anxiety, have brought attention to its significant academic and commercial importance. This study emphasizes research that employs only facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense achievements of deep learning have led to growing use of its many architectures to enhance efficiency. This review covers how machine learning, deep learning, and hybrid methods use preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive frames of data. The following section gives a brief summary of the publicly accessible assessment criteria and compares them against benchmark results, the most trustworthy way to assess FER-related research topics statistically. This synopsis may benefit both novices in the field of FER and seasoned scholars seeking fruitful avenues for further investigation; it conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Keywords: face emotion recognition; deep learning; hybrid learning; CK+; facial images; machine learning; technological development
6. Multi-Stage-Based Siamese Neural Network for Seal Image Recognition
Authors: Jianfeng Lu, Xiangye Huang, Caijin Li, Renlin Xin, Shanqing Zhang, Mahmoud Emam. Computer Modeling in Engineering & Sciences (SCIE, EI), 2025, No. 1, pp. 405-423 (19 pages).
Seal authentication verifies the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seal inspection is commonly audited manually to ensure document authenticity, but manual assessment of seal images is tedious and labor-intensive due to human error and the inconsistent placement and completeness of seals. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural-network-based method for seal image recognition. However, neural classification algorithms such as Residual Networks (ResNet) and the Visual Geometry Group network with 16 layers (VGG16) yield suboptimal recognition rates on stamp datasets, and their fixed training-data categories make handling new categories challenging. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. First, the seal image is pre-processed by an image rotation correction module based on the Histogram of Oriented Gradients (HOG). Second, the similarity between input seal image pairs is measured by a similarity comparison module based on the Siamese network. Finally, the results are compared with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset containing two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in industries where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
Keywords: seal recognition; seal authentication; document tampering; siamese network; spatial transformer network; similarity comparison network
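The final template-matching stage can be sketched as comparing a query embedding (from one Siamese branch) against every pre-stored template embedding and returning the closest seal type. A minimal cosine-similarity illustration with hypothetical template names and toy vectors:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_seal(query_emb, templates):
    """Return the template name whose embedding is most similar to the
    query seal embedding (embeddings would come from the Siamese
    branches; here they are toy vectors)."""
    return max(templates, key=lambda name: cosine_sim(query_emb, templates[name]))

templates = {"company_A": [1.0, 0.1, 0.0],
             "company_B": [0.0, 1.0, 0.2]}
print(match_seal([0.9, 0.2, 0.0], templates))  # -> company_A
```

Because matching works against a database of templates rather than a fixed classifier head, adding a new seal type only requires storing one more template embedding, which is the open-set advantage the abstract notes.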
7. EL-DenseNet: Mushroom Recognition Based on Erasing Module Using DenseNet
Authors: WANG Yaojun, ZHAO Weiting, BIE Yuhui, JIA Lu. Transactions of the Chinese Society for Agricultural Machinery (Peking University core journal), 2025, No. 9, pp. 628-637 (10 pages).
Target occlusion poses a significant challenge in computer vision, particularly in agricultural applications, where occlusion of crops can obscure key features and impair a model's recognition performance. To address this challenge, a mushroom recognition method based on an erase module integrated into the EL-DenseNet model is proposed. EL-DenseNet, an extension of DenseNet, incorporates an erase attention module designed to enhance sensitivity to visible features: the erase module eliminates complex backgrounds and irrelevant information, preserving the mushroom body and increasing recognition accuracy in cluttered environments. Considering the difficulty of distinguishing similar mushroom species, label smoothing regularization is employed to mitigate mislabeling errors that commonly arise from human observers. This strategy converts hard labels into soft labels during training, reducing the model's over-reliance on noisy labels and improving its generalization ability. Experimental results show that the proposed EL-DenseNet, combined with transfer learning, achieves a recognition accuracy of 96.7% for mushrooms in occluded and complex backgrounds. Compared with the original DenseNet and other classic models, this approach demonstrates superior accuracy and robustness, providing a promising solution for intelligent mushroom recognition.
Keywords: mushroom recognition; erase module; label smoothing; DenseNet
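The hard-to-soft label conversion can be sketched with the standard label-smoothing formula: the true class gets 1 - eps + eps/K and every other class gets eps/K. A minimal illustration (eps = 0.1 and the 4-class setup are illustrative, not the paper's settings):

```python
def smooth_labels(hard_label, num_classes, eps=0.1):
    """Label smoothing regularisation: turn a one-hot ('hard') label into
    a soft distribution so the model is less confident about possibly
    mislabelled similar species. eps is the smoothing strength."""
    off = eps / num_classes
    return [1.0 - eps + off if c == hard_label else off
            for c in range(num_classes)]

soft = smooth_labels(hard_label=2, num_classes=4, eps=0.1)
# -> [0.025, 0.025, 0.925, 0.025]; still sums to 1
```

Training against the soft target penalizes over-confident predictions on the labeled class, which is exactly what limits the damage from an occasional wrong label.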
8. IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare (cited 1 time)
Authors: Subrata Kumer Paul, Abu Saleh Musa Miah, Rakhi Rani Paul, Md. Ekramul Hamid, Jungpil Shin, Md Abdur Rahim. Computers, Materials & Continua, 2025, No. 8, pp. 2513-2530 (18 pages).
The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing Medical-Related Human Activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA as essential for effective patient monitoring. This study proposes a novel HMR method tailored for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, followed by Convolutional Long Short-Term Memory (ConvLSTM) to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.22% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and a GSM module, delivering real-time alerts via Twilio's SMS service to caregivers and patients. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
Keywords: real-time human motion recognition (HMR); ENConvLSTM; EfficientNet; ConvLSTM; skeleton data; NTU RGB+D 120 dataset; MRHA
9. Occluded Gait Emotion Recognition Based on Multi-Scale Suppression Graph Convolutional Network
Authors: Yuxiang Zou, Ning He, Jiwu Sun, Xunrui Huang, Wenhua Wang. Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 1255-1276 (22 pages).
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: a Joint Interpolation Module (JI Module), a Multi-scale Temporal Convolution Network (MS-TCN), and a Suppression Graph Convolutional Network (SGCN). The JI Module completes spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body-part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4,227 real gaits from sources such as BML, ICT-Pollick, and ELMD plus 1,000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3,924 gaits, 1,835 of which are labeled with the emotions "Happy", "Sad", "Angry", and "Neutral". On the standard Emotion-Gait and ELMB datasets, the proposed method achieves accuracies of 0.900 and 0.896, respectively, comparable to other state-of-the-art methods. Furthermore, on the occlusion datasets, the proposed method mitigates the occlusion-induced performance degradation significantly better than other methods, achieving markedly higher accuracy.
Keywords: KNN interpolation; multi-scale temporal convolution; suppression graph convolutional network; gait emotion recognition; human skeleton
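The JI Module's idea can be sketched as filling an occluded joint with the mean position of its k nearest visible joints. A deliberately simplified plain-Python version that ranks neighbours by distance along the joint ordering (the paper's KNN module works on the spatial skeleton, so this is only the shape of the computation):

```python
def knn_interpolate(joints, missing_idx, k=2):
    """Fill an occluded 2-D joint (None) with the mean position of its k
    nearest visible neighbours, 'nearest' here meaning closest joint
    index -- a simplification of spatial KNN interpolation."""
    visible = [(abs(i - missing_idx), p)
               for i, p in enumerate(joints) if p is not None]
    visible.sort(key=lambda t: t[0])           # closest indices first
    nearest = [p for _, p in visible[:k]]
    x = sum(p[0] for p in nearest) / k
    y = sum(p[1] for p in nearest) / k
    return (x, y)

joints = [(0.0, 0.0), None, (2.0, 2.0), (4.0, 1.0)]  # joint 1 occluded
print(knn_interpolate(joints, missing_idx=1))  # -> (1.0, 1.0)
```

Completing the skeleton before the graph convolution is what lets the downstream MS-TCN and SGCN operate on a full joint set even under occlusion.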
10. IDSSCNN-XgBoost: Improved Dual-Stream Shallow Convolutional Neural Network Based on Extreme Gradient Boosting Algorithm for Micro-Expression Recognition
Authors: Adnan Ahmad, Zhao Li, Irfan Tariq, Zhengran He. Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 729-749 (21 pages).
Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed; however, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, giving insufficient ability to extract highly effective features. To address this limitation, this paper introduces an Improved Dual-Stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting algorithm (IDSSCNN-XgBoost) for ME recognition. The proposed method uses a dual-stream architecture in which motion vectors (temporal features) are extracted by TV-L1 optical flow and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by the IDSSCNN, with an attention mechanism applied to refine the extracted features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASME II), the Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model achieves outstanding results compared with recent models: accuracies of 79.01%, 69.22%, and 68.99% on CASME II, SMIC-HS, and SAMM, with F1-scores of 75.47%, 68.91%, and 63.84%, respectively. The proposed method also has the advantage of operational efficiency and low computational time.
Keywords: ME recognition; dual-stream shallow convolutional neural network; Eulerian video magnification; TV-L1; XgBoost
11. A novel coal-rock recognition method in coal mining face based on fusing laser point cloud and images
Authors: Yang Liu, Lei Si, Zhongbin Wang, Miao Chen, Xin Li, Dong Wei, Jinheng Gu. International Journal of Mining Science and Technology, 2025, No. 7, pp. 1057-1071 (15 pages).
Rapid and accurate recognition of coal and rock is an important prerequisite for safe and efficient coal mining. This paper proposes a novel coal-rock recognition method based on fusing laser point clouds and images, named Multi-Modal Frustum PointNet (MMFP). First, MobileNetV3 is used as the backbone of Mask R-CNN to reduce the network parameters and compress the model volume, and a dilated convolutional block attention mechanism (Dilated CBAM) and an inception structure are combined with MobileNetV3 to further enhance detection accuracy. Subsequently, the 2D target candidate box is computed by the improved Mask R-CNN, and the frustum point cloud within the box is extracted to reduce the computation scale and spatial search range. Then, a self-attention PointNet is constructed to segment the fused point cloud within the frustum range, and a bounding-box regression network predicts the bounding-box parameters. Finally, an experimental platform for shearer coal-wall cutting is established, and multiple comparative experiments are conducted. The results indicate that the proposed coal-rock recognition method is superior to other advanced models.
Keywords: coal mining face; coal-rock recognition; deep learning; laser point cloud and image fusion; Multi-Modal Frustum PointNet (MMFP)
12. Adaptive cross-fusion learning for multi-modal gesture recognition (cited 1 time)
Authors: Benjia ZHOU, Jun WAN, Yanyan LIANG, Guodong GUO. Virtual Reality & Intelligent Hardware, 2021, No. 3, pp. 235-247 (13 pages).
Background: Gesture recognition has attracted significant attention because of its wide range of potential applications. Although multi-modal gesture recognition has made significant progress in recent years, a popular approach is still to simply fuse prediction scores at the end of each branch, which often ignores complementary features among different modalities in the early stage and does not fuse them into a more discriminative feature. Methods: This paper proposes an Adaptive Cross-modal Weighting (ACmW) scheme to exploit complementary features from RGB-D data. The scheme learns relations among different modalities by combining the features of different data streams. The proposed ACmW module contains two key functions: (1) fusing complementary features from multiple streams through an adaptive one-dimensional convolution; and (2) modeling the correlation of multi-stream complementary features in the time dimension. Through the effective combination of these two functional modules, the proposed ACmW can automatically analyze the relationship between the complementary features from different streams and fuse them in the spatial and temporal dimensions. Results: Extensive experiments validate the effectiveness of the proposed method and show that it outperforms state-of-the-art methods on IsoGD and NVGesture.
Keywords: gesture recognition; multi-modal fusion; RGB-D
13. Recognition of Pointer Meter Readings Based on YOLOv8 and DeepLabv3+
Authors: Jingwei Li, Md. Al Amin, Zhiyu Shao. Journal of Computer and Communications, 2025, No. 1, pp. 15-25 (11 pages).
Pointer instruments are widely used in the nuclear power industry. To address the low accuracy and slow detection speed of recognizing pointer meter readings across varying meter types and distances, this paper proposes a recognition method based on YOLOv8 and DeepLabv3+. To improve the input image quality for the DeepLabv3+ model, the YOLOv8 detector quickly locates the instrument region, which is then cropped as the input image for recognition. To enhance the accuracy and speed of pointer recognition, the backbone of DeepLabv3+ is replaced with MobileNetv3, and an ECA+ module is designed to replace its SE module, reducing model parameters while improving recognition precision. The decoder's fourfold upsampling is replaced with two twofold upsamplings, and shallow feature maps are fused with encoder features of the corresponding size. The CBAM module is introduced to improve the segmentation accuracy of the pointer. Experiments on a self-built dataset of pointer-style instruments from nuclear power plants show that this method achieves a recognition accuracy of 94.5% at a precision level of 2.5, with an average error of 1.522% and an average total processing time of 0.56 seconds, demonstrating strong performance.
Keywords: nuclear power; pointer instrument; YOLOv8; DeepLabv3+; reading recognition
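Once the pointer is segmented, the final reading is typically obtained by linear interpolation between the dial's zero and full-scale angles. A minimal sketch of that last step (the angles and scale range are hypothetical; in the pipeline above they would come from the DeepLabv3+ pointer mask and the known dial layout):

```python
def angle_to_reading(pointer_angle_deg, zero_angle_deg, full_angle_deg,
                     min_val, max_val):
    """Convert a pointer angle into a meter reading by linear
    interpolation between the dial's zero and full-scale marks."""
    frac = (pointer_angle_deg - zero_angle_deg) / (full_angle_deg - zero_angle_deg)
    return min_val + frac * (max_val - min_val)

# a dial that sweeps from -45 deg (reads 0.0) to 225 deg (reads 1.6 MPa)
print(angle_to_reading(90.0, -45.0, 225.0, 0.0, 1.6))  # -> 0.8
```

This is why segmentation accuracy of the pointer matters so much: a small angular error in the mask translates linearly into reading error.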
14. A deep learning lightweight model for real-time captive macaque facial recognition based on an improved YOLOX model
Authors: Jia-Jin Zhang, Yu Gao, Bao-Lin Zhang, Dong-Dong Wu. Zoological Research, 2025, No. 2, pp. 339-354 (16 pages).
Automated behavior monitoring of macaques offers transformative potential for advancing biomedical research and animal welfare. However, reliably identifying individual macaques in group environments remains a significant challenge. This study introduces ACE-YOLOX, a lightweight facial recognition model tailored for captive macaques. ACE-YOLOX incorporates Efficient Channel Attention (ECA), Complete Intersection over Union loss (CIoU), and Adaptive Spatial Feature Fusion (ASFF) into the YOLOX framework, enhancing prediction accuracy while reducing computational complexity; together, these enable effective multiscale feature extraction. Using a dataset comprising 179,400 labeled facial images from 1,196 macaques, ACE-YOLOX surpassed the performance of classical object detection models, demonstrating superior accuracy and real-time processing capabilities. An Android application was also developed to deploy ACE-YOLOX on smartphones, enabling on-device, real-time macaque recognition. Our experimental results highlight the potential of ACE-YOLOX as a non-invasive identification tool, offering an important foundation for future studies in macaque facial expression recognition, cognitive psychology, and social behavior.
Keywords: YOLOX; macaque; facial recognition; identity recognition; animal welfare
15. Multi-modal face parts fusion based on Gabor feature for face recognition (cited 1 time)
Authors: Xiang Yan. High Technology Letters (EI, CAS), 2009, No. 1, pp. 70-74 (5 pages).
A novel face recognition method based on the fusion of multi-modal face parts with Gabor features (MMP-GF) is proposed in this paper. First, the bare face image detached from the normalized image is convolved with a family of Gabor kernels; then, according to the face structure and the key-point locations, the resulting Gabor images are divided into five parts: Gabor face, Gabor eyebrow, Gabor eye, Gabor nose, and Gabor mouth. After that, the multi-modal Gabor features are spatially partitioned into non-overlapping regions and the region averages are concatenated into a low-dimensional feature vector, whose dimension is further reduced by principal component analysis (PCA). In the decision-level fusion, the match results computed from the five parts are combined according to linear discriminant analysis (LDA), and a normalized matching algorithm is used to improve performance. Experiments on the FERET database show that the proposed MMP-GF method achieves good robustness to expression and age variations.
Keywords: Gabor filter; multi-modal Gabor features; principal component analysis (PCA); linear discriminant analysis (LDA); normalized matching algorithm
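The region-averaging step can be sketched directly: partition a 2-D Gabor response map into a non-overlapping grid, average each cell, and concatenate the averages into a short vector (PCA would then reduce it further). A toy plain-Python version with a hypothetical 4x4 response map:

```python
def region_average_features(feature_map, grid=(2, 2)):
    """Partition a 2-D response map into non-overlapping grid regions,
    average each region, and concatenate the averages into a
    low-dimensional feature vector."""
    h, w = len(feature_map), len(feature_map[0])
    gh, gw = grid
    rh, rw = h // gh, w // gw          # region height and width
    feats = []
    for gy in range(gh):
        for gx in range(gw):
            block = [feature_map[y][x]
                     for y in range(gy * rh, (gy + 1) * rh)
                     for x in range(gx * rw, (gx + 1) * rw)]
            feats.append(sum(block) / len(block))
    return feats

fmap = [[1, 1, 3, 3],
        [1, 1, 3, 3],
        [5, 5, 7, 7],
        [5, 5, 7, 7]]
print(region_average_features(fmap))  # -> [1.0, 3.0, 5.0, 7.0]
```

Averaging within regions trades fine spatial detail for robustness to small misalignments, which suits the expression and age variations the experiments target.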
A Compact Manifold Mixup Feature-Based Open-Set Recognition Approach for Unknown Signals
16
作者 Yang Ying Zhu Lidong +1 位作者 Li Chengjie Sun Hong 《China Communications》 2025年第4期322-338,共17页
There are all kinds of unknown and known signals in the actual electromagnetic environment,which hinders the development of practical cognitive radio applications.However,most existing signal recognition models are di... There are all kinds of unknown and known signals in the actual electromagnetic environment,which hinders the development of practical cognitive radio applications.However,most existing signal recognition models are difficult to discover unknown signals while recognizing known ones.In this paper,a compact manifold mixup feature-based open-set recognition approach(OR-CMMF)is proposed to address the above problem.First,the proposed approach utilizes the center loss to constrain decision boundaries so that it obtains the compact latent signal feature representations and extends the low-confidence feature space.Second,the latent signal feature representations are used to construct synthetic representations as substitutes for unknown categories of signals.Then,these constructed representations can occupy the extended low-confidence space.Finally,the proposed approach applies the distillation loss to adjust the decision boundaries between the known categories signals and the constructed unknown categories substitutes so that it accurately discovers unknown signals.The OR-CMMF approach outperformed other state-of-the-art open-set recognition methods in comprehensive recognition performance and running time,as demonstrated by simulation experiments on two public datasets RML2016.10a and ORACLE. 展开更多
Keywords: manifold mixup; open-set recognition; synthetic representation; unknown signal recognition
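Two of the ingredients named in the abstract, manifold mixup for synthesizing unknown-category substitutes and a center loss for compact latent features, can be sketched as follows. This is an illustrative sketch only; the function names, the Beta mixing distribution, and the simple squared-distance form of the center loss are assumptions, not the paper's exact formulation.

```python
import numpy as np

def manifold_mixup(z_a, z_b, alpha=2.0, rng=None):
    """Mix two batches of latent features with Beta-sampled coefficients.
    Mixed points lie between class clusters and can stand in as
    synthetic substitutes for unknown-category signals."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha, size=(len(z_a), 1))
    return lam * z_a + (1.0 - lam) * z_b

def center_loss(z, labels, centers):
    """Mean squared distance of each latent vector to its class center;
    minimizing it tightens (compacts) the per-class decision regions."""
    return float(np.mean(np.sum((z - centers[labels]) ** 2, axis=1)))
```

In training, the mixed representations would be pushed into the low-confidence region between class clusters while the center loss keeps known-class features compact.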
Dynamic behavior recognition in aerial deployment of multi-segmented foldable-wing drones using variational autoencoders
17
Authors: Yilin DOU, Zhou ZHOU, Rui WANG. Chinese Journal of Aeronautics, 2025, Issue 6, pp. 143-165 (23 pages)
The aerial deployment method enables Unmanned Aerial Vehicles (UAVs) to be positioned directly at the altitude required for their mission. It typically employs folding technology to improve loading efficiency, with applications such as the gravity-only aerial deployment of high-aspect-ratio solar-powered UAVs and the aerial takeoff of fixed-wing drones in Mars research. However, the significant morphological changes during deployment are accompanied by strongly nonlinear, dynamic aerodynamic forces, resulting in motion with many degrees of freedom and an unstable character. This hinders the description and analysis of unknown dynamic behaviors and, in turn, complicates the design of deployment strategies and flight control. To address this issue, this paper proposes an analysis method for dynamic behaviors during aerial deployment based on the Variational Autoencoder (VAE). Focusing on the gravity-only deployment problem of high-aspect-ratio foldable-wing UAVs, the method encodes the multi-degree-of-freedom unstable motion signals into a low-dimensional feature space through a data-driven approach. By clustering in this feature space, the paper identifies and studies several dynamic behaviors during aerial deployment. The research offers a new method and perspective for feature extraction and analysis of complex, difficult-to-describe extreme flight dynamics, guiding the design and control strategies of aerially deployed drones.
Keywords: dynamic behavior recognition; aerial deployment technology; variational autoencoder; pattern recognition; multi-rigid-body dynamics
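The abstract's pipeline, encoding motion signals into a low-dimensional latent space and then clustering there, can be sketched with the VAE sampling step plus a plain k-means pass. This is a minimal sketch under stated assumptions: the encoder outputs `mu`/`log_var` are taken as given, and the farthest-point initialization is a convenience, not the paper's method.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """VAE sampling step: z = mu + sigma * eps, where sigma = exp(log_var / 2).
    During training this keeps the sampling differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kmeans(Z, k, iters=50):
    """Plain k-means over latent codes, grouping deployment trajectories
    into candidate dynamic-behavior clusters. Farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels, centers
```

Each trajectory's latent code would come from the trained encoder; the cluster labels then index the distinct dynamic behaviors studied in the paper.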
Leveraging Federated Learning for Efficient Privacy-Enhancing Violent Activity Recognition from Videos
18
Authors: Moshiur Rahman Tonmoy, Md. Mithun Hossain, Mejdl Safran, Sultan Alfarhood, Dunren Che, M. F. Mridha. Computers, Materials & Continua, 2025, Issue 12, pp. 5747-5763 (17 pages)
Automated recognition of violent activities from videos is vital for public safety but often raises significant privacy concerns due to the sensitive nature of the footage. Moreover, resource constraints often hinder the deployment of complex deep-learning video classification models on edge devices. With this motivation, this study investigates an effective violent-activity classifier that minimizes computational complexity, attains competitive performance, and mitigates user data privacy concerns. We present a lightweight deep learning architecture with fewer parameters for efficient violent activity recognition. We use a two-stream formation of 3D depthwise separable convolutions coupled with a linear self-attention mechanism for effective feature extraction, incorporating federated learning to address data privacy concerns. Experimental findings demonstrate the model's effectiveness, with test accuracies from 96% to above 97% on multiple datasets when the FedProx aggregation strategy is incorporated. These findings underscore the potential to develop secure, efficient, and reliable solutions for violent activity recognition in real-world scenarios.
Keywords: violent activity recognition; human activity recognition; federated learning; video understanding; computer vision
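The FedProx strategy mentioned in the abstract adds a proximal term to each client's local objective and then aggregates client models on the server. A rough sketch, treating model weights as flat vectors; the function names, learning rate, and proximal coefficient are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def fedprox_local_step(w, w_global, grad, lr=0.1, mu=0.01):
    """One local gradient step with the FedProx proximal term mu * (w - w_global),
    which keeps heterogeneous clients from drifting far from the server model."""
    return w - lr * (grad + mu * (w - w_global))

def aggregate(client_weights, client_sizes):
    """Server-side aggregation: average client models weighted by local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    W = np.stack(client_weights)
    return (W * (sizes / sizes.sum())[:, None]).sum(axis=0)
```

In a full round, each edge device would run several `fedprox_local_step` updates on its private videos, and only the resulting weights, never the footage, would reach the server for `aggregate`.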
Correction: A Broad Range Triboelectric Stiffness Sensor for Variable Inclusions Recognition
19
Authors: Ziyi Zhao, Zhentan Quan, Huaze Tang, Qinghao Xu, Hongfa Zhao, Zihan Wang, Ziwu Song, Shoujie Li, Ishara Dharmasena, Changsheng Wu, Wenbo Ding. Nano-Micro Letters, 2025, Issue 5, p. 206 (1 page)
Correction to: Nano-Micro Lett. (2023) 15:233, https://doi.org/10.1007/s40820-023-01201-7. Following publication of the original article [1], the authors reported that the first two lines of the introduction were accidentally placed in the right-hand column of the page in the PDF, which affects readability.
Keywords: recognition; stiffness; placed