Micro-expressions, fleeting involuntary facial cues lasting under half a second, reveal genuine emotions and are valuable in clinical diagnosis and psychotherapy. Real-time recognition on resource-constrained embedded devices remains challenging, as current methods struggle to balance performance and efficiency. This study introduces a semi-lightweight multifunctional network that enhances real-time deployment and accuracy. Unlike prior simplistic feature fusion techniques, our novel multi-feature fusion strategy leverages temporal, spatial, and differential features to better capture dynamic changes. Enhanced by a Residual Network (ResNet) architecture with channel and spatial attention mechanisms, the model improves feature representation while maintaining a lightweight design. Evaluations on SMIC, CASME II, SAMM, and their composite dataset show superior performance in Unweighted F1 Score (UF1) and Unweighted Average Recall (UAR), alongside faster detection speeds compared to existing algorithms.
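The two metrics named in this abstract, UF1 and UAR, are commonly computed as the per-class F1 score and per-class recall averaged with equal weight per class. The following is a minimal sketch under that assumption; the function name `uf1_uar` and the toy labels are illustrative and do not come from the paper.

```python
def uf1_uar(y_true, y_pred):
    """Return (UF1, UAR): per-class F1 and per-class recall,
    each averaged with equal weight over the classes present in y_true."""
    classes = sorted(set(y_true))
    f1_scores, recalls = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
        # tp + fn > 0 because class c occurs at least once in y_true
        recalls.append(tp / (tp + fn))
    return sum(f1_scores) / len(classes), sum(recalls) / len(classes)


# Toy example with three hypothetical emotion classes (0, 1, 2).
uf1, uar = uf1_uar([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 1])
```

Because every class contributes equally, a model that ignores a rare class is penalized more heavily than it would be under plain accuracy, which is one reason these metrics are favored on imbalanced micro-expression datasets.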
Micro-expressions are spontaneous, unconscious movements that reveal true emotions. Accurate facial movement information and network training methods are crucial for micro-expression recognition. However, most existing micro-expression recognition techniques focus on modeling a single category of micro-expression images and neural network structure. To address the low recognition rate and weak model generalization ability in micro-expression recognition, a micro-expression recognition algorithm based on a graph convolution network (GCN) and a Transformer model is proposed. First, action unit (AU) features are extracted, and the facial muscle nodes in the neighborhood are divided into three subsets for recognition. Then, a graph convolution layer is used to find the layout of dependencies between AU nodes for micro-expression classification. Finally, multiple attentional features of each facial action are enriched with the Transformer model to include more sequence information before calculating the overall correlation of each region. The proposed method is validated on the CASME II and CAS(ME)^2 datasets, and the recognition rate reached 69.85%.
To address the short duration, low intensity, and difficult detection of micro-expressions (MEs), the global and local features of ME video frames are extracted by combining spatial and temporal feature extraction. Building on the traditional convolutional neural network (CNN) and long short-term memory (LSTM), a recognition method combining a global identification attention network (GIA), a block identification attention network (BIA), and bi-directional long short-term memory (Bi-LSTM) is proposed. In the BIA, each ME video frame is cropped, and training is carried out with 24 identification blocks (IBs), 10 IBs, and uncropped IBs. To alleviate overfitting in training, we first extract the basic features of the preprocessed sequence through the transfer learning layer, and then extract the global and local spatial features of the output data through the GIA layer and the BIA layer, respectively. In the BIA layer, the input data are cropped into local feature vectors with attention weights to extract the local features of the ME frames; in the GIA layer, the global features of the ME frames are extracted. Finally, after fusing the global and local feature vectors, the ME time-series information is extracted by Bi-LSTM. The experimental results show that using IBs can significantly improve the model's ability to extract subtle facial features, and the model works best when 10 IBs are used.
The intensity of micro-expressions is weak. Although many algorithms preserve the directional low-frequency components in the image, the extracted micro-expression feature information is not sufficient to accurately represent the sequences. To improve the accuracy of micro-expression recognition, first, each frame is extracted from its sequence and pre-processed using gray normalization, size normalization, and two-dimensional principal component analysis (2DPCA). Then, the optical flow method is used to extract the motion characteristics of the reduced-dimensional image, the information entropy value of the optical flow feature image is calculated by the information entropy principle, and the information entropy value is analyzed to obtain the eigenvalue. In this way, more micro-expression feature information is extracted, including more important information, which can further improve the accuracy of micro-expression classification and recognition. Finally, the feature images are classified using a support vector machine (SVM). The experimental results show that the micro-expression feature image obtained by the information entropy statistics can effectively improve the accuracy of micro-expression recognition.
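The abstract does not say exactly how the information entropy of an optical flow feature image is computed. One common reading is the Shannon entropy of the image's intensity histogram, and the sketch below assumes that reading; the function name `shannon_entropy` and the list-of-rows image layout are illustrative choices, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(image):
    """Shannon entropy (in bits) of the value histogram of a 2D image,
    given as a list of rows of quantized intensity values."""
    counts = Counter(value for row in image for value in row)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A hypothetical 2x2 flow-magnitude image with two equally likely values.
entropy_bits = shannon_entropy([[0, 0], [255, 255]])
```

A perfectly uniform image has zero entropy, while an image whose values are spread evenly over many levels has high entropy, so the measure gives a single scalar summarizing how much variation a flow image carries.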
To address the unsatisfactory performance of traditional micro-expression recognition algorithms, an efficient micro-expression recognition algorithm is proposed, which uses a convolutional neural network (CNN) to extract spatial features of micro-expressions and a long short-term memory network (LSTM) to extract time-domain features. The CNN and LSTM are combined as the basis of micro-expression recognition. Among many CNN structures, the visual geometry group (VGG) network, which uses small convolution kernels, is selected as the front-end network after comparison. Because deep networks are difficult to train and prone to over-fitting, dropout and batch normalization are applied in the VGG network. Two data sets, CASME and CASME II, are used for test comparison. To compensate for the small size of the data sets, the starting frame is chosen at random, a fixed-length frame sequence is used as the standard, and all sample frames of the entire data set are read repeatedly to achieve traversal and data amplification. Finally, a high recognition rate of 67.48% is achieved.
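The augmentation described in this abstract (a random starting frame followed by a fixed-length frame sequence) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name `sample_clip`, the clip length of 8, and the use of integers as stand-ins for frames are all hypothetical.

```python
import random


def sample_clip(frames, clip_len, rng=random):
    """Return a contiguous clip of clip_len frames whose starting
    index is drawn uniformly at random from all valid positions."""
    if len(frames) < clip_len:
        raise ValueError("sequence is shorter than the requested clip length")
    start = rng.randrange(len(frames) - clip_len + 1)
    return frames[start:start + clip_len]


# Integers stand in for image frames; a seeded generator keeps runs repeatable.
rng = random.Random(0)
clip = sample_clip(list(range(20)), 8, rng)
```

Calling the function repeatedly on the same sample yields different but overlapping clips, which is how one short recording can contribute many training sequences.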
Micro-expressions last for a very short time and their intensity is very subtle. To address their low recognition rate, this paper proposes a new micro-expression recognition algorithm based on a three-dimensional convolutional neural network (3D-CNN), which can extract two-dimensional features in the spatial domain and one-dimensional features in the time domain simultaneously. The network structure is built with the deep learning framework Keras, and the dropout method and the batch normalization (BN) algorithm are combined with a three-dimensional visual geometry group block (3D-VGG-Block) to reduce the risk of overfitting while improving training speed. To address the lack of samples in the data set, two methods, image flipping and small amplitude flipping, are used for data amplification. Finally, the recognition rate on the data set is as high as 69.11%. Compared with the current international average micro-expression recognition rate of about 67%, the proposed algorithm has an obvious advantage in recognition rate.
Background: The use of micro-expression recognition to recognize human emotions is one of the most critical challenges in human-computer interaction applications. In recent years, cross-database micro-expression recognition (CDMER) has emerged as a significant challenge in micro-expression recognition and analysis. Because the training and testing data in CDMER come from different micro-expression databases, CDMER is more challenging than conventional micro-expression recognition. Methods: In this paper, an adaptive spatio-temporal attention neural network (ASTANN) using an attention mechanism is presented to address this challenge. To this end, the micro-expression databases SMIC and CASME II are first preprocessed using an optical flow approach, which extracts motion information among video frames that represents discriminative features of micro-expressions. After preprocessing, a novel adaptive framework with a spatio-temporal attention module was designed to assign spatial and temporal weights to enhance the most discriminative features. The deep neural network then extracts the cross-domain feature, in which the second-order statistics of the sample features in the source domain are aligned with those in the target domain by minimizing the correlation alignment (CORAL) loss such that the source and target databases share similar distributions. Results: To evaluate the performance of ASTANN, experiments were conducted based on the SMIC and CASME II databases under the standard experimental evaluation protocol of CDMER. The experimental results demonstrate that ASTANN outperformed other methods in relevant cross-database tasks. Conclusions: Extensive experiments were conducted on benchmark tasks, and the results show that ASTANN has superior performance compared with other approaches. This demonstrates the superiority of our method in solving the CDMER problem.
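The CORAL loss mentioned in this abstract aligns second-order statistics: it is usually defined as the squared Frobenius norm of the difference between the source and target feature covariance matrices, scaled by 1/(4d^2) for d-dimensional features. The sketch below follows that standard definition in plain Python; the helper names and the tiny feature batches are illustrative, and how ASTANN combines this term with its classification loss is not stated in the abstract.

```python
def covariance(features):
    """Sample covariance matrix (d x d) of an n x d list of feature rows."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in features) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def coral_loss(source, target):
    """CORAL loss: squared Frobenius distance between the source and
    target covariance matrices, divided by 4 * d^2."""
    d = len(source[0])
    cov_s, cov_t = covariance(source), covariance(target)
    squared_diff = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared_diff / (4 * d * d)


# Two hypothetical 2D feature batches whose variance lies along different axes.
loss = coral_loss([[0.0, 0.0], [2.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]])
```

The loss is zero when both domains have identical feature covariances, so minimizing it pushes the network toward features whose spread looks the same in the source and target databases.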
Micro-expression recognition has attracted growing research interest in the field of computer vision. However, micro-expression usually lasts a few seconds, thus it is difficult to detect. This paper presents a new framework to recognize micro-expressions using a pyramid histogram of Centralized Gabor Binary Pattern from Three Orthogonal Panels (CGBP-TOP), which is an extension of the Local Gabor Binary Pattern from Three Orthogonal Panels feature. CGBP-TOP performs spatial and temporal analysis to capture the local facial characteristics of micro-expression image sequences. To keep more local information of the face, CGBP-TOP is extracted based on pyramid subregions of the micro-expression video frame. The combination of CGBP-TOP and the spatial pyramid can faithfully represent the facial movements of the micro-expression image sequences. However, the dimension of our pyramid CGBP-TOP tends to be very high, which may lead to high data redundancy. In addition, people of different genders usually express micro-expressions differently. Therefore, to select the relevant features of micro-expressions, a gender-specific sparse multi-task learning method with an adaptive regularization term is adopted to learn a compact subset of the pyramid CGBP-TOP feature for micro-expression classification of different sexes. Finally, extensive experiments on the widely used CASME II and SMIC databases demonstrate that our method can efficiently extract micro-expression motion features in the micro-expression video clip. Moreover, our proposed approach achieves results comparable with state-of-the-art methods.
A micro-expression is manifested through subtle and brief facial movements that relay a person's genuine hidden emotion. In a video sequence, there is a frame that captures the maximum facial differences, which is called the apex frame. Therefore, apex frame spotting is a crucial sub-module in a micro-expression recognition system. However, this spotting task is very challenging because micro-expressions occur over a short duration with low-intensity muscle movements. Moreover, most existing automated methods have difficulty differentiating micro-expressions from other facial movements. Therefore, this paper presents a deep learning model with an attention mechanism to spot the micro-expression apex frame from optical flow images. The attention mechanism is embedded into the model so that more weight can be allocated to the regions that manifest facial movements with higher intensity. The proposed method has been tested and verified on two spontaneous micro-expression databases, namely the Spontaneous Micro-facial Movement (SAMM) and Chinese Academy of Sciences Micro-expression (CASME) II databases. The performance of the proposed system is evaluated using the Mean Absolute Error (MAE) metric, which measures the distance between the predicted apex frame and the ground truth label. The best MAE of 14.90 was obtained when a combination of five convolutional layers, local response normalization, and the attention mechanism is used to model the apex frame spotting. Even with limited datasets, the results show that the attention mechanism better emphasizes the regions where facial movements are likely to occur and hence improves the spotting performance.
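The MAE metric described in this abstract is the average absolute distance, in frames, between each predicted apex index and its ground truth label. A minimal sketch follows; the function name `apex_mae` and the example frame indices are made up for illustration and are not results from the paper.

```python
def apex_mae(predicted, ground_truth):
    """Mean absolute error, in frames, between predicted and labeled apex indices."""
    if len(predicted) != len(ground_truth):
        raise ValueError("both lists need one apex index per video")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


# Three hypothetical videos: errors of 2, 0, and 5 frames.
error = apex_mae([10, 20, 30], [12, 20, 25])
```

Since the unit is frames, the same MAE corresponds to different time offsets on databases recorded at different frame rates, which is worth keeping in mind when comparing scores across datasets.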
Although significant progress has been made in micro-expression recognition, effectively modeling the intricate spatial-temporal dynamics remains a persistent challenge owing to their brief duration and complex facial dynamics. Furthermore, existing methods often suffer from limited generalization, as they primarily focus on single-dataset tasks with small sample sizes. To address these two issues, this paper proposes the cross-domain spatial-temporal graph convolutional network (GCN) (CDST-GCN) model, which comprises two primary components: a siamese attention spatial-temporal branch (SASTB) and a global-aware dynamic spatial-temporal branch (GDSTB). Specifically, SASTB utilizes a contrastive learning strategy to project macro- and micro-expressions into a shared, aligned feature space, actively addressing cross-domain discrepancies. Additionally, it integrates an attention-gated mechanism that generates adaptive adjacency matrices to flexibly model collaborative patterns among facial landmarks. While largely preserving the structural paradigm of SASTB, GDSTB enhances the feature representation by integrating global context extracted from a pretrained model. Through this dual-branch architecture, CDST-GCN successfully models both the global and local spatial-temporal features. The experimental results on the CASME II and SAMM datasets demonstrate that the proposed model achieves competitive performance. In particular, in the more challenging 5-class tasks, the accuracy of the model on the CASME II dataset is as high as 80.5%.
Micro-expressions are spontaneous, rapid, and subtle facial movements that can hardly be suppressed or fabricated. Micro-expression recognition (MER) is one of the most challenging topics in affective computing. It aims to recognize subtle facial movements that are quite difficult for humans to perceive in a fleeting period. Recently, many deep learning-based MER methods have been developed. However, how to effectively capture subtle temporal variations for robust MER remains an open question. We propose a counterfactual discriminative micro-expression recognition (CoDER) method to effectively learn the slight temporal variations for video-based MER. To explicitly capture the causality from temporal dynamics hidden in the micro-expression (ME) sequence, we propose ME counterfactual reasoning by comparing the effects of the facts w.r.t. original ME sequences and the counterfactuals w.r.t. counterfactually-revised ME sequences, and then perform causality-aware prediction to encourage the model to learn those latent ME temporal cues. Extensive experiments on four widely used ME databases demonstrate the effectiveness of CoDER, which achieves MER performance comparable or superior to that of state-of-the-art methods. The visualization results show that CoDER successfully perceives the meaningful temporal variations in sequential faces.
Micro-expression recognition is a substantive cross-study of psychology and computer science, and it has a wide range of applications (e.g., psychological and clinical diagnosis, emotional analysis, criminal investigation, etc.). However, the subtle and diverse changes in facial muscles make it difficult for existing methods to extract effective features, which limits the improvement of micro-expression recognition accuracy. Therefore, we propose a multi-scale joint feature network based on optical flow images for micro-expression recognition. First, we generate an optical flow image that reflects subtle facial motion information. The optical flow image is then fed into the multi-scale joint network for feature extraction and classification. The proposed joint feature module (JFM) integrates features from different layers, which is beneficial for the capture of micro-expression features with different amplitudes. To improve the recognition ability of the model, we also adopt a strategy for fusing the feature prediction results of the three JFMs with the backbone network. Our experimental results show that our method is superior to state-of-the-art methods on three benchmark datasets (SMIC, CASME II, and SAMM) and a combined dataset (3DB).
Facial micro-expressions are short and imperceptible expressions that involuntarily reveal the true emotions that a person may be attempting to suppress, hide, disguise, or conceal. Such expressions can reflect a person's real emotions and have a wide range of applications in public safety and clinical diagnosis. The analysis of facial micro-expressions in video sequences through computer vision is still relatively recent. In this research, a comprehensive review of the databases and methods used for micro-expression spotting and recognition is conducted, and advanced technologies in this area are summarized. In addition, we discuss challenges that remain unresolved alongside future work to be completed in the field of micro-expression analysis.
A two-stage algorithm based on deep learning for the detection and recognition of can bottom spray codes and numbers is proposed to address the problems of small character areas and fast production line speeds in can bottom spray code number recognition. In the coding number detection stage, the Differentiable Binarization Network is used as the backbone network, combined with the Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to enhance the model detection effect. For text recognition, the Scene Visual Text Recognition coding number recognition network is trained end-to-end, which can alleviate coding recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted using the dataset of can bottom spray code numbers collected on-site, and a transfer experiment was conducted using the dataset of packaging box production dates. The experimental results show that the algorithm proposed in this study can effectively locate the coding of cans at different positions on the roller conveyor, and can accurately identify the coding numbers at high production line speeds. The Hmean value of the coding number detection is 97.32%, and the accuracy of the coding number recognition is 98.21%. This verifies that the proposed algorithm has high accuracy in coding number detection and recognition.
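In text detection benchmarks, Hmean conventionally denotes the harmonic mean of detection precision and recall. The sketch below assumes that convention applies to the figure reported above; the function name `hmean` and the sample values are illustrative, and the abstract does not report the underlying precision and recall.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector: half of its boxes are correct, and it finds every code.
score = hmean(0.5, 1.0)
```

The harmonic mean stays low unless both inputs are high, so a detector cannot reach a strong Hmean by maximizing only precision or only recall.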
In computer vision and artificial intelligence, automatic facial expression-based emotion identification of humans has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance films, depression therapy, patient monitoring, anxiety, and others, have brought attention to its significant academic and commercial importance. This study emphasizes research that has only employed facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense success of deep learning has resulted in a growing use of its many architectures to enhance efficiency. This review covers the use of preprocessing, augmentation techniques, and feature extraction for temporal properties of successive frames of data in machine learning, deep learning, and hybrid methods. The following section gives a brief summary of publicly available assessment criteria and then compares them with benchmark results, the most trustworthy way to assess FER-related research topics statistically. This brief synopsis of the subject matter may be beneficial for novices in the field of FER as well as seasoned scholars seeking fruitful avenues for further investigation. The information conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Seal authentication is an important task for verifying the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seals are commonly inspected manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive due to human errors, inconsistent placement, and completeness of the seal. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural network-based method for seal image recognition. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, the fixed training data categories make handling new categories a challenging task. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. First, the seal image is pre-processed by applying an image rotation correction module based on Histogram of Oriented Gradients (HOG). Second, the similarity between input seal image pairs is measured using a similarity comparison module based on the Siamese network. Finally, we compare the results with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in industries where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing Medical-Related Human Activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA as essential for effective patient monitoring. This study proposes a novel HMR method tailored for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, followed by Convolutional Long Short Term Memory (ConvLSTM) to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, sitting, etc. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.22% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and GSM module, delivering real-time alerts via Twilio's SMS service to caregivers and patients. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy significantly declines when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: a Joint Interpolation Module (JI Module), a Multi-scale Temporal Convolution Network (MS-TCN), and a Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology, and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as "Happy," "Sad," "Angry," and "Neutral." On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, achieving significantly higher accuracy than other methods.
Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, unlike DNNs, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, resulting in an insufficient ability to extract highly effective features. To address this limitation, in this paper, an Improved Dual-stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) is introduced for ME recognition. The proposed method utilizes a dual-stream architecture in which motion vectors (temporal features) are extracted using TV-L1 optical flow and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by IDSSCNN, with an attention mechanism applied to refine the extracted effective features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This comprehensive approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASME II), the Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model can achieve outstanding results compared to recent models. The accuracy results are 79.01%, 69.22%, and 68.99% on CASME II, SMIC-HS, and SAMM, and the F1-scores are 75.47%, 68.91%, and 63.84%, respectively. The proposed method has the advantage of operational efficiency and less computational time.
Pointer instruments are widely used in the nuclear power industry. Addressing the issues of low accuracy and slow detection speed in recognizing pointer meter readings under varying types and distances, this paper proposes a recognition method based on YOLOv8 and DeepLabv3+. To improve the image input quality of the DeepLabv3+ model, the YOLOv8 detector is used to quickly locate the instrument region and crop it as the input image for recognition. To enhance the accuracy and speed of pointer recognition, the backbone network of DeepLabv3+ was replaced with MobileNetv3, and the ECA+ module was designed to replace its SE module, reducing model parameters while improving recognition precision. The decoder's fourfold upsampling was replaced with two twofold upsamplings, and shallow feature maps were fused with encoder features of the corresponding size. The CBAM module was introduced to improve the segmentation accuracy of the pointer. Experiments were conducted using a self-made dataset of pointer-style instruments from nuclear power plants. Results showed that this method achieved a recognition accuracy of 94.5% at a precision level of 2.5, with an average error of 1.522% and an average total processing time of 0.56 seconds, demonstrating strong performance.
Abstract: Micro-expressions, fleeting involuntary facial cues lasting under half a second, reveal genuine emotions and are valuable in clinical diagnosis and psychotherapy. Real-time recognition on resource-constrained embedded devices remains challenging, as current methods struggle to balance performance and efficiency. This study introduces a semi-lightweight multifunctional network that enhances real-time deployment and accuracy. Unlike prior simplistic feature fusion techniques, our novel multi-feature fusion strategy leverages temporal, spatial, and differential features to better capture dynamic changes. Enhanced by a Residual Network (ResNet) architecture with channel and spatial attention mechanisms, the model improves feature representation while maintaining a lightweight design. Evaluations on SMIC, CASME II, SAMM, and their composite dataset show superior performance in Unweighted F1 Score (UF1) and Unweighted Average Recall (UAR), alongside faster detection speeds compared to existing algorithms.
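The two metrics named in this abstract, UF1 and UAR, can be illustrated with a minimal sketch. It assumes the usual definitions: per-class F1 and per-class recall, each averaged with equal weight per class. The function name `uf1_uar` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

def uf1_uar(y_true, y_pred):
    """Return (UF1, UAR): per-class F1 and recall, averaged with equal class weight."""
    classes = sorted(set(y_true))  # only classes that actually occur in the labels
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    f1_scores, recalls = [], []
    for c in classes:
        f1_denominator = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / f1_denominator if f1_denominator else 0.0)
        recalls.append(tp[c] / (tp[c] + fn[c]))
    return sum(f1_scores) / len(classes), sum(recalls) / len(classes)

# Toy example with three emotion classes (0, 1, 2).
uf1, uar = uf1_uar([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 0])
print(round(uf1, 4), round(uar, 4))
```

Because every class contributes equally to the average, a model that ignores a rare class is penalized even if its overall accuracy is high, which is why these metrics are common on imbalanced micro-expression datasets.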
Funding: Supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280) and the National Natural Science Foundation of China (Nos. 61834005, 61772417, 61802304).
Abstract: Micro-expressions are spontaneous, unconscious movements that reveal true emotions. Accurate facial movement information and network training methods are crucial for micro-expression recognition. However, most existing micro-expression recognition techniques focus on modeling a single category of micro-expression images and on neural network structure. To address the low recognition rate and weak generalization ability of micro-expression recognition models, a micro-expression recognition algorithm based on a graph convolution network (GCN) and a Transformer model is proposed. First, action unit (AU) features are detected and extracted, and the facial muscle nodes in each neighborhood are divided into three subsets for recognition. Then, a graph convolution layer is used to find the layout of dependencies between AU nodes for micro-expression classification. Finally, multiple attentional features of each facial action are enriched with the Transformer model to include more sequence information before calculating the overall correlation of each region. The proposed method is validated on the CASME II and CAS(ME)^2 datasets, and the recognition rate reached 69.85%.
Funding: Supported by the National Natural Science Foundation of Hunan Province, China (Grant Nos. 2021JJ50058, 2022JJ50051), the Open Platform Innovation Foundation of Hunan Provincial Education Department (Grant No. 20K046), and the Scientific Research Fund of Hunan Provincial Education Department, China (Grant Nos. 21A0350, 21C0439, 19A133).
Abstract: To address the short duration, low intensity, and difficult detection of micro-expressions (MEs), the global and local features of ME video frames are extracted by combining spatial and temporal feature extraction. Building on the traditional convolutional neural network (CNN) and long short-term memory (LSTM), a recognition method is proposed that combines a global identification attention network (GIA), a block identification attention network (BIA), and bi-directional long short-term memory (Bi-LSTM). In the BIA, the ME video frame is cropped, and training is carried out with three settings: 24 identification blocks (IBs), 10 IBs, and uncropped IBs. To alleviate overfitting during training, we first extract the basic features of the preprocessed sequence through the transfer learning layer, and then extract the global and local spatial features of the output data through the GIA layer and the BIA layer, respectively. In the BIA layer, the input data are cropped into local feature vectors with attention weights to extract the local features of the ME frames; in the GIA layer, the global features of the ME frames are extracted. Finally, after the global and local feature vectors are fused, the ME time-series information is extracted by Bi-LSTM. The experimental results show that using IBs can significantly improve the model’s ability to extract subtle facial features, and the model works best when 10 IBs are used.
Funding: The National Natural Science Foundation of China (Nos. 61772417, 61634004, and 61602377), the Key R&D Program Projects in Shaanxi Province (No. 2017GY-060), and the Shaanxi Natural Science Basic Research Project (No. 018JM4018).
Abstract: The intensity of micro-expressions is weak. Although many algorithms preserve the directional low-frequency components in the image, the extracted micro-expression feature information is not sufficient to accurately represent micro-expression sequences. To improve the accuracy of micro-expression recognition, first, each frame is extracted from its sequence and pre-processed using gray normalization, size normalization, and two-dimensional principal component analysis (2DPCA). Then, the optical flow method is used to extract the motion characteristics of the reduced-dimensional image, the information entropy of the optical flow feature image is calculated, and the entropy value is analyzed to obtain the eigenvalue. In this way, more micro-expression feature information is extracted, including more important information, which can further improve the accuracy of micro-expression classification and recognition. Finally, the feature images are classified using a support vector machine (SVM). The experimental results show that the micro-expression feature image obtained by information entropy statistics can effectively improve the accuracy of micro-expression recognition.
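The entropy step in this abstract can be sketched as the Shannon entropy of the value histogram of a quantized optical-flow magnitude map. This is only an illustration of the general principle: the abstract does not specify the paper's quantization or how the entropy values are turned into features, and the names `shannon_entropy` and `flow_magnitudes` are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(image):
    """Shannon entropy (in bits) of the value histogram of a 2D grid of quantized values."""
    values = [v for row in image for v in row]  # flatten the 2D grid
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical 2x4 map of optical-flow magnitudes quantized to integer bins.
flow_magnitudes = [
    [0, 0, 1, 1],
    [2, 2, 3, 3],
]
print(shannon_entropy(flow_magnitudes))  # four equally likely bins
```

A region with no motion collapses into a single bin and yields zero entropy, while a region with varied motion spreads over many bins and yields a higher value.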
Funding: Shaanxi Province Key Research and Development Project (No. 2021GY-280), Shaanxi Province Natural Science Basic Research Program Project (No. 2021JM-459), National Natural Science Foundation of China (Nos. 61834005, 61772417, 61802304, 61602377, 61634004), and Shaanxi Province International Science and Technology Cooperation Project (No. 2018KW-006).
Abstract: To address the unsatisfactory performance of traditional micro-expression recognition algorithms, an efficient micro-expression recognition algorithm is proposed, which uses a convolutional neural network (CNN) to extract spatial features of micro-expressions and a long short-term memory network (LSTM) to extract time-domain features. CNN and LSTM are combined as the basis of micro-expression recognition. Among many CNN structures, the visual geometry group (VGG) network, which uses small convolution kernels, is selected as the pre-network through comparison. Because deep networks are difficult to train and prone to over-fitting, the dropout method and batch normalization are used in the VGG network. Two data sets, CASME and CASME II, are used for test comparison. To compensate for the insufficient size of the data sets, the starting frame is determined randomly, a fixed-length frame sequence is used as the standard, and all sample frames of the entire data set are read repeatedly to achieve traversal and data amplification. Finally, a high recognition rate of 67.48% is achieved.
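The data amplification idea described here (random starting frame, fixed-length frame sequence, repeated reads) might look roughly like the following sketch. The clip length, the number of clips, and the function name `sample_clips` are assumptions for illustration; the abstract does not give the paper's actual settings.

```python
import random

def sample_clips(frames, clip_len, num_clips, seed=0):
    """Draw fixed-length contiguous clips from one sequence, each with a random start frame."""
    if len(frames) < clip_len:
        raise ValueError("sequence is shorter than the requested clip length")
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    clips = []
    for _ in range(num_clips):
        start = rng.randrange(len(frames) - clip_len + 1)
        clips.append(frames[start:start + clip_len])
    return clips

# A hypothetical 20-frame sequence, represented here by frame indices.
sequence = list(range(20))
for clip in sample_clips(sequence, clip_len=8, num_clips=3):
    print(clip[0], clip[-1])
```

Each pass over the data set yields differently aligned clips of the same sample, so the network sees more distinct inputs than there are labeled sequences.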
Funding: Supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280), the Shaanxi Province Natural Science Basic Research Program Project (No. 2021JM-459), the National Natural Science Foundation of China (Nos. 61834005, 61772417, 61802304, 61602377, 61634004), and the Shaanxi Province International Science and Technology Cooperation Project (No. 2018KW-006).
Abstract: A micro-expression lasts for a very short time and its intensity is very subtle. To address its low recognition rate, this paper proposes a new micro-expression recognition algorithm based on a three-dimensional convolutional neural network (3D-CNN), which can simultaneously extract two-dimensional features in the spatial domain and one-dimensional features in the time domain. The network structure is designed with the deep learning framework Keras, and the discarding method and the batch normalization (BN) algorithm are combined with a three-dimensional visual geometry group block (3D-VGG-Block) to reduce the risk of overfitting while improving training speed. To address the lack of samples in the data set, two methods, image flipping and small amplitude flipping, are used for data amplification. Finally, the recognition rate on the data set reaches 69.11%. Compared with the current international average micro-expression recognition rate of about 67%, the proposed algorithm has a clear advantage in recognition rate.
Abstract: Background: The use of micro-expression recognition to recognize human emotions is one of the most critical challenges in human-computer interaction applications. In recent years, cross-database micro-expression recognition (CDMER) has emerged as a significant challenge in micro-expression recognition and analysis. Because the training and testing data in CDMER come from different micro-expression databases, CDMER is more challenging than conventional micro-expression recognition. Methods: In this paper, an adaptive spatio-temporal attention neural network (ASTANN) using an attention mechanism is presented to address this challenge. To this end, the micro-expression databases SMIC and CASME II are first preprocessed using an optical flow approach, which extracts motion information among video frames that represents discriminative features of micro-expressions. After preprocessing, a novel adaptive framework with a spatio-temporal attention module is designed to assign spatial and temporal weights to enhance the most discriminative features. The deep neural network then extracts the cross-domain feature, in which the second-order statistics of the sample features in the source domain are aligned with those in the target domain by minimizing the correlation alignment (CORAL) loss, such that the source and target databases share similar distributions. Results: To evaluate the performance of ASTANN, experiments were conducted on the SMIC and CASME II databases under the standard experimental evaluation protocol of CDMER. The experimental results demonstrate that ASTANN outperformed other methods in relevant cross-database tasks. Conclusions: Extensive experiments were conducted on benchmark tasks, and the results show that ASTANN has superior performance compared with other approaches. This demonstrates the superiority of our method in solving the CDMER problem.
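The CORAL loss mentioned in this abstract aligns second-order statistics between domains. In its commonly cited form it is the squared Frobenius norm of the difference between the source and target feature covariance matrices, scaled by 1/(4d²) for d-dimensional features. The sketch below computes that quantity in plain Python on toy data; it is not the ASTANN implementation, and the function names are hypothetical.

```python
def covariance(features):
    """Unbiased sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in features) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]

def coral_loss(source, target):
    """CORAL loss: squared Frobenius distance between covariances, divided by 4 * d^2."""
    d = len(source[0])
    cov_s, cov_t = covariance(source), covariance(target)
    squared_diff = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared_diff / (4 * d * d)

# Toy 2-D features: the target is the source scaled by 2, so the covariances differ.
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
target = [[2 * x, 2 * y] for x, y in source]
print(coral_loss(source, source), coral_loss(source, target))
```

The loss depends only on covariances, so shifting every feature vector by a constant leaves it unchanged. In a deep network it would typically be added to the classification loss and minimized jointly.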
Funding: This work is funded by the Natural Science Foundation of Jiangsu Province (No. BK20150471), the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (No. 17KJB520007), the Key Research and Development Program of Zhenjiang-Social Development (No. SH2018005), the scientific researching fund of Jiangsu University of Science and Technology (Nos. 1132921402, 1132931803), and the basic science and frontier technology research program of Chongqing Municipal Science and Technology Commission (No. cstc2016jcyjA0407).
Abstract: Micro-expression recognition has attracted growing research interest in the field of computer vision. However, a micro-expression usually lasts only a fraction of a second, so it is difficult to detect. This paper presents a new framework for recognizing micro-expressions using a pyramid histogram of Centralized Gabor Binary Pattern from Three Orthogonal Panels (CGBP-TOP), which is an extension of the Local Gabor Binary Pattern from Three Orthogonal Panels feature. CGBP-TOP performs spatial and temporal analysis to capture the local facial characteristics of micro-expression image sequences. To keep more local information of the face, CGBP-TOP is extracted from pyramid subregions of the micro-expression video frame. The combination of CGBP-TOP and the spatial pyramid can faithfully represent the facial movements in micro-expression image sequences. However, the dimension of the pyramid CGBP-TOP tends to be very high, which may lead to high data redundancy. In addition, people of different genders usually express micro-expressions in different ways. Therefore, to select the relevant micro-expression features, a gender-specific sparse multi-task learning method with an adaptive regularization term is adopted to learn a compact subset of the pyramid CGBP-TOP feature for micro-expression classification of different sexes. Finally, extensive experiments on the widely used CASME II and SMIC databases demonstrate that our method can efficiently extract micro-expression motion features from micro-expression video clips. Moreover, our proposed approach achieves results comparable with the state-of-the-art methods.
Funding: The authors would like to acknowledge funding from Universiti Kebangsaan Malaysia (Geran Universiti Penyelidikan: GUP-2019-008 and Dana Padanan Kolaborasi: DPK-2021-012).
Abstract: A micro-expression is manifested through subtle and brief facial movements that relay a person’s genuine hidden emotion. In a video sequence, there is a frame that captures the maximum facial differences, which is called the apex frame. Therefore, apex frame spotting is a crucial sub-module in a micro-expression recognition system. However, this spotting task is very challenging because micro-expressions occur over a short duration with low-intensity muscle movements. Moreover, most existing automated approaches have difficulty differentiating micro-expressions from other facial movements. Therefore, this paper presents a deep learning model with an attention mechanism to spot the micro-expression apex frame from optical flow images. The attention mechanism is embedded into the model so that more weight can be allocated to the regions that manifest facial movements with higher intensity. The proposed method has been tested and verified on two spontaneous micro-expression databases, namely the Spontaneous Micro-facial Movement (SAMM) and Chinese Academy of Sciences Micro-expression (CASME) II databases. System performance is evaluated using the Mean Absolute Error (MAE) metric, which measures the distance between the predicted apex frame and the ground truth label. The best MAE of 14.90 was obtained when a combination of five convolutional layers, local response normalization, and the attention mechanism is used to model apex frame spotting. Even with limited datasets, the results show that the attention mechanism better emphasizes the regions where facial movements are likely to occur and hence improves spotting performance.
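As a small illustration of the MAE metric described above, the following sketch averages the absolute frame-index distance between predicted and ground-truth apex frames. The function name `apex_mae` and the frame indices are made up for illustration.

```python
def apex_mae(predicted, ground_truth):
    """Mean absolute error, in frames, between predicted and true apex frame indices."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, ground_truth)) / len(predicted)

# Hypothetical apex predictions for three clips versus their annotated apex frames.
print(apex_mae([40, 55, 30], [46, 52, 30]))
```

Under this definition, an MAE of 14.90 means the predicted apex is on average about 15 frames away from the annotated one.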
Funding: Funded in part by the National Natural Science Foundation of China (Nos. 62322111, 62271289, 62501186), the Natural Science Fund for Outstanding Young Scholars of Shandong Province (No. ZR2022YQ60), the Research Fund for the Taishan Scholar Project of Shandong Province (No. tsqn202306064), the Natural Science Fund for Distinguished Young Scientists of Shandong Province (No. ZR2024JQ007), the Shenzhen Science and Technology Program (No. JCYJ20240813101228036), the Jinan “20 Terms of New Universities” Funding Project (No. 202333035), and the Fundamental Research Funds for the Central Universities (No. 3072025CFJ0805).
Abstract: Although significant progress has been made in micro-expression recognition, effectively modeling the intricate spatial-temporal dynamics remains a persistent challenge owing to the brief duration and complex facial dynamics of micro-expressions. Furthermore, existing methods often suffer from limited generalization, as they primarily focus on single-dataset tasks with small sample sizes. To address these two issues, this paper proposes the cross-domain spatial-temporal graph convolutional network (GCN) (CDST-GCN) model, which comprises two primary components: a siamese attention spatial-temporal branch (SASTB) and a global-aware dynamic spatial-temporal branch (GDSTB). Specifically, SASTB utilizes a contrastive learning strategy to project macro- and micro-expressions into a shared, aligned feature space, actively addressing cross-domain discrepancies. Additionally, it integrates an attention-gated mechanism that generates adaptive adjacency matrices to flexibly model collaborative patterns among facial landmarks. While largely preserving the structural paradigm of SASTB, GDSTB enhances the feature representation by integrating global context extracted from a pretrained model. Through this dual-branch architecture, CDST-GCN successfully models both the global and local spatial-temporal features. The experimental results on the CASME II and SAMM datasets demonstrate that the proposed model achieves competitive performance. In particular, on the more challenging 5-class task, the accuracy of the model on the CASME II dataset reaches 80.5%.
Funding: Supported by the National Natural Science Foundation of China (No. 62102180), the Research Grants Council of Hong Kong (Collaborative Research Fund No. C7055-21GF), the Hong Kong Scholars Program, and the Natural Science Foundation of Jiangsu Province (No. BK20210329).
Abstract: Micro-expressions are spontaneous, rapid, and subtle facial movements that can hardly be suppressed or fabricated. Micro-expression recognition (MER) is one of the most challenging topics in affective computing. It aims to recognize subtle facial movements that are quite difficult for humans to perceive in a fleeting period. Recently, many deep learning-based MER methods have been developed. However, how to effectively capture subtle temporal variations for robust MER remains unresolved. We propose a counterfactual discriminative micro-expression recognition (CoDER) method to effectively learn the slight temporal variations for video-based MER. To explicitly capture the causality from temporal dynamics hidden in the micro-expression (ME) sequence, we propose ME counterfactual reasoning, which compares the effects of the facts w.r.t. original ME sequences and the counterfactuals w.r.t. counterfactually-revised ME sequences, and then perform causality-aware prediction to encourage the model to learn those latent ME temporal cues. Extensive experiments on four widely-used ME databases demonstrate the effectiveness of CoDER, which achieves MER performance comparable or superior to that of the state-of-the-art methods. The visualization results show that CoDER successfully perceives the meaningful temporal variations in sequential faces.
Funding: Supported by the NSFC–Zhejiang Joint Fund of the Integration of Informatization and Industrialization under Grant No. U1909210, the National Natural Science Foundation of China under Grant No. 61772312, and the Fundamental Research Funds of Shandong University (Grant No. 2018JC030).
Abstract: Micro-expression recognition is a substantive cross-study of psychology and computer science, and it has a wide range of applications (e.g., psychological and clinical diagnosis, emotional analysis, and criminal investigation). However, the subtle and diverse changes in facial muscles make it difficult for existing methods to extract effective features, which limits the improvement of micro-expression recognition accuracy. Therefore, we propose a multi-scale joint feature network based on optical flow images for micro-expression recognition. First, we generate an optical flow image that reflects subtle facial motion information. The optical flow image is then fed into the multi-scale joint network for feature extraction and classification. The proposed joint feature module (JFM) integrates features from different layers, which helps capture micro-expression features with different amplitudes. To improve the recognition ability of the model, we also adopt a strategy for fusing the feature prediction results of the three JFMs with the backbone network. Our experimental results show that our method is superior to state-of-the-art methods on three benchmark datasets (SMIC, CASME II, and SAMM) and a combined dataset (3DB).
Abstract: Facial micro-expressions are short and imperceptible expressions that involuntarily reveal the true emotions that a person may be attempting to suppress, hide, disguise, or conceal. Such expressions can reflect a person’s real emotions and have a wide range of applications in public safety and clinical diagnosis. The analysis of facial micro-expressions in video sequences through computer vision is still relatively recent. In this research, a comprehensive review of spotting and recognition in micro-expression analysis, covering databases and methods, is conducted, and advanced technologies in this area are summarized. In addition, we discuss challenges that remain unresolved alongside future work to be completed in the field of micro-expression analysis.
Abstract: A two-stage deep learning algorithm for the detection and recognition of can bottom spray codes and numbers is proposed to address the problems of small character areas and fast production line speeds in can bottom spray code number recognition. In the coding number detection stage, the Differentiable Binarization Network is used as the backbone network, combined with the Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to enhance the model’s detection performance. For text recognition, using the Scene Visual Text Recognition coding number recognition network for end-to-end training can alleviate coding recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted using a dataset of can bottom spray code numbers collected on-site, and a transfer experiment was conducted using a dataset of packaging box production dates. The experimental results show that the proposed algorithm can effectively locate the coding of cans at different positions on the roller conveyor and can accurately identify the coding numbers at high production line speeds. The Hmean value of the coding number detection is 97.32%, and the accuracy of the coding number recognition is 98.21%. This verifies that the proposed algorithm has high accuracy in coding number detection and recognition.
Abstract: In computer vision and artificial intelligence, automatic facial expression-based emotion identification of humans has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance films, depression therapy, patient monitoring, anxiety, and others, have brought attention to its significant academic and commercial importance. This study emphasizes research that has employed only facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense success of deep learning has resulted in a growing use of its many architectures to enhance efficiency. This review covers the use of preprocessing, augmentation techniques, and feature extraction for temporal properties of successive frames of data in machine learning, deep learning, and hybrid methods. The following section gives a brief summary of publicly accessible assessment criteria and then compares them with benchmark results, the most trustworthy way to assess FER-related research topics statistically. The brief synopsis of the subject matter in this review may be beneficial for novices in the field of FER as well as seasoned scholars seeking fruitful avenues for further investigation. The information conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Funding: The National Natural Science Foundation of China (Grant No. 62172132), the Public Welfare Technology Research Project of Zhejiang Province (Grant No. LGF21F020014), and the Opening Project of Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security of Zhejiang Police College (Grant No. 2021DSJSYS002).
Abstract: Seal authentication is an important task for verifying the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seal inspection is commonly audited manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive due to human errors, inconsistent placement, and completeness of the seal. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural network-based method for seal image recognition. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, the fixed training data categories make handling new categories a challenging task. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. First, the seal image is pre-processed by applying an image rotation correction module based on Histogram of Oriented Gradients (HOG). Second, the similarity between input seal image pairs is measured using a similarity comparison module based on the Siamese network. Finally, we compare the results with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in sectors where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
Funding: Funded by the ICT Division of the Ministry of Posts, Telecommunications, and Information Technology of Bangladesh under Grant Number 56.00.0000.052.33.005.21-7 (Tracking No. 22FS15306), with support from the University of Rajshahi.
Abstract: The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing Medical-Related Human Activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA as essential for effective patient monitoring. This study proposes a novel HMR method tailored for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, followed by Convolutional Long Short-Term Memory (ConvLSTM) to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.22% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and GSM module, delivering real-time alerts via Twilio’s SMS service to caregivers and patients. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
Funding: Supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Abstract: In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data are occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
Funding: Supported by the Key Research and Development Program of Jiangsu Province under Grant BE2022059-3, CTBC Bank through the Industry-Academia Cooperation Project, and the Ministry of Science and Technology of Taiwan through Grants MOST-108-2218-E-002-055, MOST-109-2223-E-009-002-MY3, MOST-109-2218-E-009-025, and MOST431109-2218-E-002-015.
Abstract: Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, unlike DNNs, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, resulting in an insufficient ability to extract highly effective features. To address this limitation, this paper introduces an Improved Dual-stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) for ME recognition. The proposed method uses a dual-stream architecture in which motion vectors (temporal features) are extracted using TV-L1 optical flow and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by IDSSCNN, with an attention mechanism applied to refine the extracted effective features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This comprehensive approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASME II), the Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model can achieve outstanding results compared to recent models. The accuracy results are 79.01%, 69.22%, and 68.99% on CASME II, SMIC-HS, and SAMM, and the F1-scores are 75.47%, 68.91%, and 63.84%, respectively. The proposed method has the advantage of operational efficiency and less computational time.
Abstract: Pointer instruments are widely used in the nuclear power industry. To address the low accuracy and slow detection speed in recognizing pointer meter readings under varying types and distances, this paper proposes a recognition method based on YOLOv8 and DeepLabv3+. To improve the image input quality of the DeepLabv3+ model, the YOLOv8 detector is used to quickly locate the instrument region and crop it as the input image for recognition. To enhance the accuracy and speed of pointer recognition, the backbone network of DeepLabv3+ was replaced with MobileNetv3, and the ECA+ module was designed to replace its SE module, reducing model parameters while improving recognition precision. The decoder’s fourfold upsampling was replaced with two twofold upsamplings, and shallow feature maps were fused with encoder features of the corresponding size. The CBAM module was introduced to improve the segmentation accuracy of the pointer. Experiments were conducted using a self-made dataset of pointer-style instruments from nuclear power plants. Results showed that this method achieved a recognition accuracy of 94.5% at a precision level of 2.5, with an average error of 1.522% and an average total processing time of 0.56 seconds, demonstrating strong performance.