Journal Articles
1,736 articles found
Comprehensive Review and Analysis on Facial Emotion Recognition: Performance Insights into Deep and Traditional Learning with Current Updates and Challenges
1
Authors: Amjad Rehman, Muhammad Mujahid, Alex Elyassih, Bayan AlGhofaily, Saeed Ali Omer Bahaj. Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 41-72.
In computer vision and artificial intelligence, automatic facial expression-based emotion identification has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance video, depression therapy, patient monitoring, anxiety detection, and others, have brought attention to its significant academic and commercial importance. This study emphasizes research that has employed only facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense success of deep learning has led to the growing use of its many architectures to enhance efficiency. This review covers how machine learning, deep learning, and hybrid methods use preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive frames of data. A subsequent section gives a brief summary of publicly accessible assessment criteria and then compares them with benchmark results, the most trustworthy way to assess FER-related research statistically. This brief synopsis of the subject may benefit novices in the field of FER as well as seasoned scholars seeking fruitful avenues for further investigation, conveying fundamental knowledge and a comprehensive understanding of the most recent state-of-the-art research.
Keywords: face emotion recognition, deep learning, hybrid learning, CK+, facial images, machine learning, technological development
Occluded Gait Emotion Recognition Based on Multi-Scale Suppression Graph Convolutional Network
2
Authors: Yuxiang Zou, Ning He, Jiwu Sun, Xunrui Huang, Wenhua Wang. Computers, Materials & Continua (SCIE, EI), 2025, No. 1, pp. 1255-1276.
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources such as BML, ICT-Pollick, and ELMD plus 1000 synthetic gaits generated with STEP-Gen technology, and ELMB, consisting of 3924 gaits, 1835 of which are labeled with the emotions "Happy," "Sad," "Angry," and "Neutral." On the standard Emotion-Gait and ELMB datasets, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, achieving markedly higher accuracy than competing methods.
Keywords: KNN interpolation, multi-scale temporal convolution, suppression graph convolutional network, gait emotion recognition, human skeleton
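As context for the JI Module described above, the sketch below shows one plausible way to complete spatially occluded joints with KNN interpolation; the function name and data layout are illustrative assumptions, not the authors' code:

```python
import numpy as np
from sklearn.impute import KNNImputer

def complete_occluded_joints(skeleton_seq, k=5):
    """Fill occluded joint coordinates (marked as NaN) by KNN interpolation.

    skeleton_seq: float array of shape (frames, joints, 3).
    """
    frames, joints, dims = skeleton_seq.shape
    flat = skeleton_seq.reshape(frames, joints * dims)
    # Treat each frame as a sample: a missing coordinate is imputed from
    # the k most similar frames in which that joint is visible.
    imputer = KNNImputer(n_neighbors=k, weights="distance")
    return imputer.fit_transform(flat).reshape(frames, joints, dims)
```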
Does problematic mobile phone use affect facial emotion recognition?
3
Authors: Bowei Go, Xianli An. Journal of Psychology in Africa, 2025, No. 4, pp. 523-533.
This study investigated the impact of problematic mobile phone use (PMPU) on emotion recognition. The PMPU levels of 150 participants were measured using the standardized SAS-SV scale. Based on the SAS-SV cutoff scores, participants were divided into PMPU and Control groups. These participants completed two emotion recognition experiments involving facial emotion stimuli that had been manipulated to varying emotional intensities using Morph software. Experiment 1 (n = 75) assessed differences in facial emotion detection accuracy. Experiment 2 (n = 75), based on signal detection theory, examined differences in hit and false alarm rates across emotional expressions. The results showed that PMPU users demonstrated higher recognition accuracy for disgusted faces but lower accuracy for happy faces. This indicates a tendency among PMPU users to prioritize specific negative emotions, together with possibly impaired perception of positive emotions. Practically, incorporating diverse emotional stimuli into PMPU interventions may help alleviate the negative emotional focus bias associated with excessive mobile device use.
Keywords: problematic mobile phone use, emotion recognition, facial emotion, basic emotion
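Since Experiment 2 rests on signal detection theory, the standard hit/false-alarm sensitivity computation may help readers unfamiliar with it; this is the generic textbook formula, not the authors' analysis script:

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = Z(hit rate) - Z(false-alarm rate),
    with a +0.5 correction so extreme rates stay finite."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: 40 hits / 10 misses vs. 5 false alarms / 45 correct rejections
print(d_prime(40, 10, 5, 45))  # ~2.06
```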
Dual-Task Contrastive Meta-Learning for Few-Shot Cross-Domain Emotion Recognition
4
Authors: Yujiao Tang, Yadong Wu, Yuanmei He, Jilin Liu, Weihan Zhang. Computers, Materials & Continua, 2025, No. 2, pp. 2331-2352.
Emotion recognition plays a crucial role in various fields and is a key task in natural language processing (NLP). The objective is to identify and interpret emotional expressions in text. However, traditional emotion recognition approaches often struggle in few-shot cross-domain scenarios due to their limited capacity to generalize semantic features across different domains. Additionally, these methods face challenges in accurately capturing complex emotional states, particularly those that are subtle or implicit. To overcome these limitations, we introduce a novel approach called Dual-Task Contrastive Meta-Learning (DTCML). This method combines meta-learning and contrastive learning to improve emotion recognition. Meta-learning enhances the model's ability to generalize to new emotional tasks, while instance contrastive learning further refines the model by distinguishing unique features within each category, enabling it to better differentiate complex emotional expressions. Prototype contrastive learning, in turn, helps the model address the semantic complexity of emotions across different domains, enabling it to learn fine-grained emotion expressions. By leveraging dual tasks, DTCML learns from two domains simultaneously; the model is thereby encouraged to learn more diverse and generalizable emotion features, improving its cross-domain adaptability, robustness, and generalization ability. We evaluated the performance of DTCML across four cross-domain settings, and the results show that our method outperforms the best baseline by 5.88%, 12.04%, 8.49%, and 8.40% in terms of accuracy.
Keywords: contrastive learning, emotion recognition, cross-domain learning, dual-task, meta-learning
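The abstract leans on instance contrastive learning; a minimal InfoNCE-style loss of the kind such methods typically use is sketched below. This is the generic formulation, not necessarily DTCML's exact objective:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    """Instance-level contrastive loss: each embedding should match its
    own positive view against every other sample in the batch."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                     # (B, B) similarities
    targets = torch.arange(a.size(0), device=a.device)   # diagonal = positives
    return F.cross_entropy(logits, targets)
```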
Deep Learning-Based Speech Emotion Recognition: Leveraging Diverse Datasets and Augmentation Techniques for Robust Modeling
5
Authors: Ayush Porwal, Praveen Kumar Tyagi, Ajay Sharma, Dheeraj Kumar Agarwal. Journal of Harbin Institute of Technology (New Series), 2025, No. 3, pp. 54-65.
In recent years, Speech Emotion Recognition (SER) has developed into an essential instrument for interpreting human emotions from auditory data. The proposed research focuses on the development of an SER system employing deep learning and multiple datasets containing samples of emotive speech. The primary objective is to investigate the utilization of Convolutional Neural Networks (CNNs) for sound feature extraction. Stretching, pitch manipulation, and noise injection are among the techniques used in this study to improve data quality. Feature extraction methods including Zero Crossing Rate, Chroma_stft, Mel-scale Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel-Spectrogram are used to train a model. With these techniques, audio signals are transformed into recognizable features suitable for model training. Ultimately, the study produces a thorough evaluation of the model's performance. When this method was applied, the model achieved an impressive accuracy of 94.57% on the test dataset. The proposed work was also validated on the EMO-DB and IEMOCAP datasets, with further data augmentation, feature engineering, and hyperparameter optimization. Along these development paths, SER systems can be deployed in real-world scenarios with greater accuracy and resilience.
Keywords: voice signal, emotion recognition, deep learning, CNN
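The augmentations and the five feature extractors named in the abstract map directly onto standard librosa calls; a hedged sketch, where augmentation strengths and pooling choices are assumptions:

```python
import numpy as np
import librosa

def augment(y, sr):
    """Noise injection, stretching, and pitch manipulation, as listed above."""
    return [
        y + 0.005 * np.random.randn(len(y)),               # noise injection
        librosa.effects.time_stretch(y, rate=0.9),         # stretching
        librosa.effects.pitch_shift(y, sr=sr, n_steps=2),  # pitch shift
    ]

def extract_features(y, sr):
    """ZCR, Chroma_stft, MFCC, RMS, and Mel-Spectrogram, time-averaged
    so every clip yields one fixed-length training vector."""
    feats = [
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),
        librosa.feature.rms(y=y),
        librosa.feature.melspectrogram(y=y, sr=sr),
    ]
    return np.concatenate([f.mean(axis=1) for f in feats])
```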
Correction to DeepCNN: Spectro-temporal feature representation for speech emotion recognition
6
CAAI Transactions on Intelligence Technology, 2025, No. 2, p. 633.
Saleem, N., et al.: DeepCNN: Spectro-temporal feature representation for speech emotion recognition. CAAI Trans. Intell. Technol. 8(2), 401-417 (2023). https://doi.org/10.1049/cit2.12233. The affiliation of Hafiz Tayyab Rauf should be [Independent Researcher, UK].
Keywords: speech emotion recognition, DeepCNN, spectro-temporal feature representation
Robust Audio-Visual Fusion for Emotion Recognition Based on Cross-Modal Learning under Noisy Conditions
7
Authors: A-Seong Moon, Seungyeon Jeong, Donghee Kim, Mohd Asyraf Zulkifley, Bong-Soo Sohn, Jaesung Lee. Computers, Materials & Continua, 2025, No. 11, pp. 2851-2872.
Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems. The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference, such as background noise, overlapping speech, and visual obstructions. The proposed framework employs a structured fusion approach, combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms. Audio data are transformed into mel-spectrogram representations, and visual data are represented as raw frame sequences. Spatial and temporal features are extracted through convolutional and transformer-based encoders, allowing the framework to capture complementary and hierarchical information from both sources. A cross-modal attention module enables selective emphasis on relevant signals while suppressing modality-specific noise. Performance is validated on a modified version of the AFEW dataset, in which controlled noise is introduced to emulate realistic conditions. The framework achieves higher classification accuracy than comparative baselines, confirming increased robustness under cross-modal disruption. This result demonstrates the suitability of the proposed method for deployment in practical emotion-aware technologies operating outside controlled environments. The study also contributes a systematic approach to fusion design and supports further exploration toward resilient multimodal emotion analysis frameworks. The source code is publicly available at https://github.com/asmoon002/AVER (accessed on 18 August 2025).
Keywords: multimodal learning, emotion recognition, cross-modal attention, robust representation learning
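To make the cross-modal attention idea concrete, here is a minimal PyTorch sketch in which audio tokens query visual tokens; dimensions and the residual/norm arrangement are assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """One direction of cross-modal attention: audio queries, video keys/values."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_tokens, visual_tokens):
        # Attention weights learn to down-weight visual time steps that
        # conflict with the audio evidence (e.g., occluded frames).
        fused, _ = self.attn(audio_tokens, visual_tokens, visual_tokens)
        return self.norm(audio_tokens + fused)

# Usage: fused = CrossModalAttention()(audio_seq, video_seq),
# where both sequences are (batch, time, 256) tensors.
```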
EEG Scalogram Analysis in Emotion Recognition: A Swin Transformer and TCN-Based Approach
8
Authors: Selime Tuba Pesen, Mehmet Ali Altuncu. Computers, Materials & Continua, 2025, No. 9, pp. 5597-5611.
EEG signals are widely used in emotion recognition due to their ability to reflect involuntary physiological responses. However, the high dimensionality of EEG signals and their continuous variability in the time-frequency plane make their analysis challenging. Therefore, advanced deep learning methods are needed to extract meaningful features and improve classification performance. This study proposes a hybrid model that integrates the Swin Transformer and Temporal Convolutional Network (TCN) mechanisms for EEG-based emotion recognition. EEG signals are first converted into scalogram images using the Continuous Wavelet Transform (CWT), and classification is performed on these images. The Swin Transformer extracts spatial features from the scalogram images, and the TCN learns long-term dependencies. In addition, attention mechanisms are integrated to highlight the essential features extracted by both models. The effectiveness of the proposed model has been tested on the SEED dataset, widely used in the field of emotion recognition, and it consistently achieved high performance across all emotional classes, with accuracy, precision, recall, and F1-score values of 97.53%, 97.54%, 97.53%, and 97.54%, respectively. Compared to traditional transfer learning models, the proposed approach achieved an accuracy increase of 1.43% over ResNet-101, 1.81% over DenseNet-201, and 2.44% over VGG-19. In addition, the proposed model outperformed many recent CNN, RNN, and Transformer-based methods reported in the literature.
Keywords: continuous wavelet transform, EEG, emotion recognition, Swin Transformer, temporal convolutional network
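The CWT-to-scalogram preprocessing step can be reproduced in a few lines with PyWavelets; a sketch under assumed scale range and wavelet choice (the paper may use different settings):

```python
import numpy as np
import pywt

def eeg_to_scalogram(channel, scales=np.arange(1, 65), wavelet="morl"):
    """Continuous wavelet transform of one EEG channel, returned as a
    (scales, time) magnitude image normalised to [0, 1]."""
    coeffs, _ = pywt.cwt(channel, scales, wavelet)
    img = np.abs(coeffs)
    return (img - img.min()) / (np.ptp(img) + 1e-8)
```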
Cross-feature fusion speech emotion recognition based on attention mask residual network and Wav2vec 2.0
9
Authors: Xiaoke Li, Zufan Zhang. Digital Communications and Networks, 2025, No. 5, pp. 1567-1577.
Speech Emotion Recognition (SER) has received widespread attention as a crucial means of understanding human emotional states. However, the impact of irrelevant information in speech signals and data sparsity limit the development of SER systems. To address these issues, this paper proposes a framework that incorporates the Attentive Mask Residual Network (AM-ResNet) and the self-supervised learning model Wav2vec 2.0 to obtain AM-ResNet features and Wav2vec 2.0 features respectively, together with a cross-attention module to interact and fuse the two feature types. The AM-ResNet branch mainly consists of maximum amplitude difference detection, a mask residual block, and an attention mechanism. Among them, the maximum amplitude difference detection and the mask residual block act on the pre-processing and the network, respectively, to reduce the impact of silent frames, and the attention mechanism assigns different weights to unvoiced and voiced speech to reduce the redundant emotional information introduced by unvoiced speech. In the Wav2vec 2.0 branch, this model is introduced as a feature extractor to obtain general speech features (Wav2vec 2.0 features) through pre-training with a large amount of unlabeled speech data, which assists the SER task and copes with data sparsity. In the cross-attention module, AM-ResNet features and Wav2vec 2.0 features interact and are fused into cross-fused features, which are used to predict the final emotion. Furthermore, multi-label learning is used to add ambiguous emotion utterances to deal with data limitations. Finally, experimental results illustrate the usefulness and superiority of the proposed framework over existing state-of-the-art approaches.
Keywords: speech emotion recognition, residual network, mask, attention, Wav2vec 2.0, cross-feature fusion
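As background, extracting general-purpose Wav2vec 2.0 features, the role the second branch plays here, looks roughly like this with a Hugging Face checkpoint; the checkpoint name is an assumption, and the authors may use a different pretrained model:

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-base-960h"  # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name).eval()

def wav2vec2_features(waveform_16k):
    """Contextual speech features from raw 16 kHz audio."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state  # (1, frames, 768)
```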
Improved MFCC Features and TWM Model for Speech Emotion Recognition
10
Authors: Liyan Zhang, Jiaxin Du, Shuang Chen, Jiayan Li. Journal of Harbin Institute of Technology (New Series), 2025, No. 6, pp. 38-46.
To solve the problem that traditional Mel Frequency Cepstral Coefficient (MFCC) features cannot fully represent dynamic speech characteristics, this paper introduces first-order and second-order differences on top of static MFCC features to extract dynamic MFCC features, and constructs a hybrid model (TWM: TIM-NET (Temporal-aware Bi-directional Multi-scale Network) + WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty) + multi-head attention) that combines a multi-head attention mechanism and an improved WGAN-GP on the basis of the TIM-NET network. The multi-head attention mechanism not only effectively prevents gradient vanishing, but also allows the construction of deeper networks that can capture long-range dependencies and learn from information at different time steps, improving the accuracy of the model; WGAN-GP addresses insufficient sample size by improving the quality of generated speech samples. The experimental results show that this method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets.
Keywords: dynamic features, speech emotion recognition, multi-head attention mechanism, generative adversarial networks
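The static-plus-dynamic MFCC construction described above corresponds to stacking first- and second-order deltas; a minimal librosa sketch:

```python
import numpy as np
import librosa

def dynamic_mfcc(y, sr, n_mfcc=13):
    """Static MFCCs stacked with their first- and second-order differences."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    d1 = librosa.feature.delta(mfcc, order=1)  # first-order difference
    d2 = librosa.feature.delta(mfcc, order=2)  # second-order difference
    return np.vstack([mfcc, d1, d2])           # (3 * n_mfcc, frames)
```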
Support vector machines for emotion recognition in Chinese speech (Cited by 8)
11
Authors: 王治平, 赵力, 邹采荣. Journal of Southeast University (English Edition) (EI, CAS), 2003, No. 4, pp. 307-310.
Support vector machines (SVMs) are utilized for emotion recognition in Chinese speech in this paper. Both binary-class and multi-class discrimination are discussed. It is shown that the emotional features constitute a nonlinear problem in the input space, and that SVMs based on nonlinear mapping can solve it more effectively than linear methods. A multi-class classifier based on SVMs with a soft decision function is constructed to classify the four emotion situations. Compared with the principal component analysis (PCA) method and a modified PCA method, SVMs with nonlinear kernel mapping give the best results in multi-class discrimination.
Keywords: speech signal, emotion recognition, support vector machines
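A present-day rendering of the paper's setup, an RBF-kernel SVM for multi-class emotion discrimination, fits in a few scikit-learn lines; hyperparameters and variable names are assumptions:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_utterances, n_acoustic_features), y: four emotion labels (assumed).
clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=10.0, gamma="scale",
        decision_function_shape="ovr"),  # multi-class decision scores
)
# clf.fit(X_train, y_train); clf.score(X_test, y_test)
```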
A novel speech emotion recognition algorithm based on combination of emotion data field and ant colony search strategy (Cited by 3)
12
Authors: 查诚, 陶华伟, 张昕然, 周琳, 赵力, 杨平. Journal of Southeast University (English Edition) (EI, CAS), 2016, No. 2, pp. 158-163.
In order to effectively conduct emotion recognition from spontaneous, non-prototypical and unsegmented speech, and so create a more natural human-machine interaction, a novel speech emotion recognition algorithm based on the combination of the emotional data field (EDF) and the ant colony search (ACS) strategy, called the EDF-ACS algorithm, is proposed. More specifically, the interrelationships among the turn-based acoustic feature vectors of different labels are established by using the potential function in the EDF. To perform spontaneous speech emotion recognition, an artificial colony is used to mimic the turn-based acoustic feature vectors. Then, the canonical ACS strategy is used to investigate the movement direction of each artificial ant in the EDF, which is regarded as the emotional label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the continuous audio/visual emotion challenge (AVEC) 2012 dataset, which contains spontaneous, non-prototypical and unsegmented speech emotion data. The experimental results show that the proposed EDF-ACS algorithm outperforms the existing state-of-the-art algorithm in turn-based speech emotion recognition.
Keywords: speech emotion recognition, emotional data field, ant colony search, human-machine interaction
Cascaded projection of Gaussian mixture model for emotion recognition in speech and ECG signals (Cited by 1)
13
Authors: 黄程韦, 吴迪, 张晓俊, 肖仲喆, 许宜申, 季晶晶, 陶智, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2015, No. 3, pp. 320-326.
A cascaded projection of the Gaussian mixture model algorithm is proposed. First, the marginal distribution of the Gaussian mixture model is computed for different feature dimensions, and a number of sub-classifiers are generated using the marginal distribution model. Each sub-classifier is based on a different feature set. The cascaded structure is adopted to fuse the sub-classifiers dynamically to achieve sample-adaptation ability. Secondly, the effectiveness of the proposed algorithm is verified on electrocardiogram emotional signals and speech emotional signals. Emotional data covering fidgetiness, happiness and sadness is collected through induction experiments. Finally, the emotion feature extraction method is discussed, including heart rate variability, the chaotic electrocardiogram feature and utterance-level static features. Emotional feature reduction methods are studied, including principal component analysis, sequential forward selection, the Fisher discriminant ratio and the maximal information coefficient. The experimental results show that the proposed classification algorithm can effectively improve recognition accuracy in two different scenarios.
Keywords: Gaussian mixture model, emotion recognition, sample adaptation, emotion inducing
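The key property used here, that a Gaussian mixture's marginal over any feature subset reuses the fitted parameters directly, can be sketched as follows. This is a generic GMM identity with illustrative names, not the authors' cascade:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_marginal_loglik(x_sub, dims, weights, means, covs):
    """Log-likelihood of a feature sub-vector under the marginal GMM:
    marginalising a Gaussian mixture only selects sub-means and
    sub-covariances, so every sub-classifier reuses one fitted model."""
    density = sum(
        w * multivariate_normal.pdf(x_sub, mean=m[dims],
                                    cov=c[np.ix_(dims, dims)])
        for w, m, c in zip(weights, means, covs)
    )
    return np.log(density + 1e-300)
```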
Auditory attention model based on Chirplet for cross-corpus speech emotion recognition (Cited by 1)
14
Authors: 张昕然, 宋鹏, 查诚, 陶华伟, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2016, No. 4, pp. 402-407.
To solve the problem of feature mismatch across experimental databases, a key issue in cross-corpus speech emotion recognition, an auditory attention model based on Chirplet is proposed for feature extraction. First, in order to extract the spectral features, the auditory attention model is employed for variational emotion feature detection. Then, the selective attention mechanism model is proposed to extract the salient gist features, which show their relation to the expected performance in cross-corpus testing. Furthermore, Chirplet time-frequency atoms are introduced into the model. By forming a complete atom database, the Chirplet can improve spectral feature extraction, including the amount of information captured. Samples from multiple databases have the characteristics of multiple components; hence, the Chirplet expands the scale of the feature vector in the time-frequency domain. Experimental results show that, compared to the traditional feature model, the proposed feature extraction approach with a prototypical classifier yields a significant improvement in cross-corpus speech emotion recognition. In addition, the proposed method is more robust to inconsistent sources of the training and testing sets.
Keywords: speech emotion recognition, selective attention mechanism, spectrogram feature, cross-corpus
Speech emotion recognition via discriminant-cascading dimensionality reduction (Cited by 1)
15
Authors: 王如刚, 徐新洲, 黄程韦, 吴尘, 张昕然, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2016, No. 2, pp. 151-157.
In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction for speech emotion recognition is investigated. Based on the existing locality preserving projections and graph embedding framework, a novel discriminant-cascading dimensionality reduction method is proposed, named discriminant-cascading locality preserving projections (DCLPP). The proposed method specifically utilizes supervised embedding graphs and keeps the inner products of samples in the original space to preserve enough information for speech emotion recognition. Then, kernel DCLPP (KDCLPP) is also proposed to extend the mapping form. Validated by experiments on the EMO-DB and eNTERFACE'05 corpora, the proposed method clearly outperforms common dimensionality reduction methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local discriminant embedding (LDE) and graph-based Fisher analysis (GbFA), with different categories of classifiers.
Keywords: speech emotion recognition, discriminant-cascading locality preserving projections, discriminant analysis, dimensionality reduction
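For readers unfamiliar with the base method, plain LPP reduces to a generalized eigenproblem; a compact sketch of standard LPP, not the proposed DCLPP itself:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=2, k=5, t=1.0):
    """Standard locality preserving projections."""
    D2 = kneighbors_graph(X, k, mode="distance").toarray() ** 2
    W = np.where(D2 > 0, np.exp(-D2 / t), 0.0)  # heat-kernel affinities
    W = np.maximum(W, W.T)                      # symmetrise the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    # Projection directions are the smallest generalized eigenvectors of
    # X^T L X a = lambda X^T D X a.
    A, B = X.T @ L @ X, X.T @ D @ X
    _, vecs = eigh(A, B + 1e-6 * np.eye(B.shape[0]))
    return X @ vecs[:, :n_components]
```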
Emotional speaker recognition based on prosody transformation (Cited by 1)
16
Authors: 宋鹏, 赵力, 邹采荣. Journal of Southeast University (English Edition) (EI, CAS), 2011, No. 4, pp. 357-360.
A novel emotional speaker recognition system (ESRS) is proposed to compensate for emotion variability. First, emotion recognition is adopted as a pre-processing step to classify neutral and emotional speech. Then, the recognized emotional speech is adjusted by prosody modification. Different methods, including Gaussian normalization, the Gaussian mixture model (GMM) and support vector regression (SVR), are adopted to define the mapping rules of F0 between emotional and neutral speech, and the average linear ratio is used for duration modification. Finally, the modified emotional speech is employed for speaker recognition. The experimental results show that the proposed ESRS can significantly improve the performance of emotional speaker recognition, with an identification rate (IR) higher than that of the traditional recognition system. The emotional speech with F0 and duration modifications is closer to neutral speech.
Keywords: emotion recognition, speaker recognition, F0 transformation, duration modification
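Of the mapping rules compared, Gaussian normalization is the simplest; a sketch of the F0 mapping, where per-speaker neutral statistics are assumed inputs:

```python
import numpy as np

def gaussian_normalize_f0(f0, neutral_mean, neutral_std):
    """Shift emotional-speech F0 statistics toward the neutral ones;
    unvoiced frames (F0 == 0) are left untouched."""
    voiced = f0 > 0
    mu_e, sigma_e = f0[voiced].mean(), f0[voiced].std()
    out = np.zeros_like(f0, dtype=float)
    out[voiced] = (f0[voiced] - mu_e) / sigma_e * neutral_std + neutral_mean
    return out
```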
Speech emotion recognition using semi-supervised discriminant analysis
17
Authors: 徐新洲, 黄程韦, 金赟, 吴尘, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2014, No. 1, pp. 7-12.
Semi-supervised discriminant analysis (SDA), which uses a combination of multiple embedding graphs, and kernel SDA (KSDA) are adopted in supervised speech emotion recognition. After the emotional factors of the speech signal samples are preprocessed, different categories of features, including pitch, zero-crossing rate, energy, duration, formants and Mel-frequency cepstral coefficients (MFCC), as well as their statistical parameters, are extracted from the utterances. In the dimensionality reduction stage, before the feature vectors are sent to the classifiers, parameter-optimized SDA and KSDA are performed to reduce dimensionality. Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms other state-of-the-art dimensionality reduction methods based on spectral graph learning, such as linear discriminant analysis (LDA), locality preserving projections (LPP) and marginal Fisher analysis (MFA), when multi-class support vector machine (SVM) classifiers are used. Additionally, KSDA achieves better recognition performance through kernelized data mapping, compared with the above methods including SDA.
Keywords: speech emotion recognition, speech emotion feature, semi-supervised discriminant analysis, dimensionality reduction
Novel feature fusion method for speech emotion recognition based on multiple kernel learning
18
Authors: 金赟, 宋鹏, 郑文明, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2013, No. 2, pp. 129-133.
In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Building on the global features, the local information of different kinds of features is utilized, and the global and local features are combined. Moreover, the multiple kernel learning method is adopted: the global features and each kind of local feature are respectively associated with a kernel, and all these kernels are added together with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and local kernels are determined by cross-validation. After training with multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show a recognition rate of 78.74% using the global kernel alone and 81.10% using the proposed method, which demonstrates its effectiveness.
Keywords: speech emotion recognition, multiple kernel learning, feature fusion, support vector machine
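The mixed-kernel construction, a weighted sum of per-feature-set kernels fed to a precomputed-kernel SVM, can be sketched as follows; the weights would normally come from the MKL solver and are fixed here only for illustration:

```python
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def mixed_kernel(blocks_a, blocks_b, weights, gamma=0.1):
    """K = sum_i w_i * K_i, one RBF kernel per feature set."""
    return sum(w * rbf_kernel(Xa, Xb, gamma=gamma)
               for w, Xa, Xb in zip(weights, blocks_a, blocks_b))

# Usage sketch: global features plus two local feature sets (assumed).
# K_train = mixed_kernel(train_blocks, train_blocks, [0.5, 0.3, 0.2])
# K_test  = mixed_kernel(test_blocks,  train_blocks, [0.5, 0.3, 0.2])
# svm = SVC(kernel="precomputed").fit(K_train, y_train)
# svm.predict(K_test)
```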
Dimensional emotion recognition in whispered speech signal based on cognitive performance evaluation
19
Authors: 吴晨健, 黄程韦, 陈虹. Journal of Southeast University (English Edition) (EI, CAS), 2015, No. 3, pp. 311-319.
Cognitive performance-based dimensional emotion recognition in whispered speech is studied. First, whispered speech emotion databases and data collection methods are compared, and the character of emotion expression in whispered speech is studied, especially the basic emotion types. Secondly, emotion features for whispered speech are analyzed, and by reviewing the latest references, the related valence and arousal features are provided. The effectiveness of valence and arousal features in whispered speech emotion classification is studied. Finally, the Gaussian mixture model is studied and applied to whispered speech emotion recognition. Cognitive performance is also considered in emotion recognition so that recognition errors of whispered speech emotion can be corrected based on the cognitive scores. The results show that the formant features are not significantly related to the arousal dimension, while the short-term energy features are related to emotion changes in the arousal dimension. Using the cognitive scores, the recognition results can be improved.
Keywords: whispered speech, emotion recognition, emotion dimensional space
Semi-supervised Ladder Networks for Speech Emotion Recognition (Cited by 9)
20
Authors: Jian-Hua Tao, Jian Huang, Ya Li, Zheng Lian, Ming-Yue Niu. International Journal of Automation and Computing (EI, CSCD), 2019, No. 4, pp. 437-448.
As a major component of speech signal processing, speech emotion recognition has become increasingly essential to understanding human communication. Benefiting from deep learning, many researchers have proposed various unsupervised models to extract effective emotional features and supervised models to train emotion recognition systems. In this paper, we utilize semi-supervised ladder networks for speech emotion recognition. The model is trained by minimizing the supervised loss together with an auxiliary unsupervised cost function. The addition of the unsupervised auxiliary task provides powerful discriminative representations of the input features, and also acts as regularization of the supervised emotion task. We also compare the ladder network with other classical autoencoder structures. The experiments were conducted on the interactive emotional dyadic motion capture (IEMOCAP) database, and the results reveal that the proposed method achieves superior performance with a small amount of labelled data, outperforming other methods.
Keywords: speech emotion recognition, ladder network, semi-supervised learning, autoencoder, regularization
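The training objective described, a supervised loss plus a layer-wise unsupervised denoising cost, has this general shape. This is a sketch of the combined loss only; the full ladder architecture with its corrupted/clean encoders, decoder, and skip connections is omitted:

```python
import torch
import torch.nn.functional as F

def ladder_loss(logits, labels, clean_acts, denoised_acts, layer_weights):
    """Supervised cross-entropy plus weighted reconstruction costs between
    clean and denoised activations at each layer."""
    supervised = F.cross_entropy(logits, labels)
    unsupervised = sum(w * torch.mean((c - d) ** 2)
                       for w, c, d in zip(layer_weights, clean_acts,
                                          denoised_acts))
    return supervised + unsupervised
```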