Journal Articles
131 articles found
Speech Emotion Recognition Based on the Adaptive Acoustic Enhancement and Refined Attention Mechanism
1
Authors: Jun Li, Chunyan Liang, Zhiguo Liu, Fengpei Ge. Computers, Materials & Continua, 2026, Issue 3, pp. 2015-2039 (25 pages)
To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85%, on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
Keywords: speech emotion recognition; adaptive acoustic mixup enhancement; improved coordinate attention; shuffle attention; attention mechanism; deep learning
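The abstract does not spell out AAM's sample selection strategy, but the waveform-level mixup idea it builds on is easy to sketch. Below is a minimal version; the function name, the Beta-distributed coefficient, and the one-hot labels are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def acoustic_mixup(x1, x2, labels1, labels2, alpha=0.2, rng=None):
    """Blend two waveforms and their one-hot labels with a Beta-sampled
    coefficient. Plain mixup only; the paper's AAM additionally selects
    sample pairs and adapts the coefficient dynamically."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # interpolation coefficient in [0, 1]
    n = min(len(x1), len(x2))             # align lengths by truncation
    x_mix = lam * x1[:n] + (1 - lam) * x2[:n]
    y_mix = lam * labels1 + (1 - lam) * labels2
    return x_mix, y_mix, lam

# Two toy "utterances" with one-hot emotion labels (e.g., angry vs. happy).
wave_a = np.sin(np.linspace(0, 20, 16000))
wave_b = np.random.default_rng(1).normal(size=16000)
x, y, lam = acoustic_mixup(wave_a, wave_b, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

The mixed label stays a valid probability vector because the two interpolation weights sum to one.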
MDGET-MER: Multi-Level Dynamic Gating and Emotion Transfer for Multi-Modal Emotion Recognition
2
Authors: Musheng Chen, Qiang Wen, Xiaohong Qiu, Junhua Wu, Wenqing Fu. Computers, Materials & Continua, 2026, Issue 3, pp. 872-893 (22 pages)
In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of the visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
Keywords: multi-modal emotion recognition; dynamic gating; emotion transfer module; cross-modal dynamic alignment; noise robustness
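The core gating idea, conditioning the audio and visual contributions on the dominant text features, can be sketched roughly as below. The weight matrices are random stand-ins for learned parameters, and this is not the paper's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(text, audio, visual, W_a, W_v):
    """Gate audio/visual features against the dominant text features:
    each gate is a sigmoid of the concatenated (text, modality) vector,
    so unreliable modality features can be scaled toward zero before
    fusion. Weight matrices here stand in for learned parameters."""
    g_a = sigmoid(np.concatenate([text, audio]) @ W_a)   # audio gate in (0, 1)
    g_v = sigmoid(np.concatenate([text, visual]) @ W_v)  # visual gate in (0, 1)
    return np.concatenate([text, g_a * audio, g_v * visual])

rng = np.random.default_rng(0)
d = 8  # toy feature dimension
fused = gated_fusion(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d)))
```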
Correction to DeepCNN: Spectro-temporal feature representation for speech emotion recognition
3
CAAI Transactions on Intelligence Technology, 2025, Issue 2, p. 633 (1 page)
Saleem, N., et al.: DeepCNN: Spectro-temporal feature representation for speech emotion recognition. CAAI Trans. Intell. Technol. 8(2), 401-417 (2023). https://doi.org/10.1049/cit2.12233. The affiliation of Hafiz Tayyab Rauf should be [Independent Researcher, UK].
Keywords (auto-extracted): independent researcher; speech emotion recognition; DeepCNN; UK; CAAI; spectro-temporal feature representation; Hafiz Tayyab Rauf
Comprehensive Review and Analysis on Facial Emotion Recognition:Performance Insights into Deep and Traditional Learning with Current Updates and Challenges
4
Authors: Amjad Rehman, Muhammad Mujahid, Alex Elyassih, Bayan AlGhofaily, Saeed Ali Omer Bahaj. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 41-72 (32 pages)
In computer vision and artificial intelligence, automatic facial expression-based emotion identification of humans has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance films, depression therapy, patient monitoring, anxiety, and others, have brought attention to its significant academic and commercial importance. This study emphasizes research that has only employed facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense achievement of deep learning has resulted in a growing use of its many architectures to enhance efficiency. This review covers machine learning, deep learning, and hybrid methods' use of preprocessing, augmentation techniques, and feature extraction for temporal properties of successive frames of data. The following section gives a brief summary of publicly accessible assessment criteria and then compares them with benchmark results, the most trustworthy way to assess FER-related research topics statistically. This brief synopsis of the subject matter may be beneficial both for novices in the field of FER and for seasoned scholars seeking fruitful avenues for further investigation. The information conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Keywords: face emotion recognition; deep learning; hybrid learning; CK+; facial images; machine learning; technological development
Occluded Gait Emotion Recognition Based on Multi-Scale Suppression Graph Convolutional Network
5
Authors: Yuxiang Zou, Ning He, Jiwu Sun, Xunrui Huang, Wenhua Wang. Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 1255-1276 (22 pages)
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy significantly declines when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources such as BML, ICT-Pollick, and ELMD plus 1000 synthetic gaits generated using STEP-Gen technology, and ELMB, consisting of 3924 gaits, 1835 of which are labeled with emotions such as "Happy," "Sad," "Angry," and "Neutral." On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on the occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, achieving accuracy significantly higher than that of other methods.
Keywords: KNN interpolation; multi-scale temporal convolution; suppression graph convolutional network; gait emotion recognition; human skeleton
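A minimal version of the KNN joint-interpolation step might look like the following; the paper's exact neighbour definition and distance metric may differ, and NaN is used here to mark an occluded joint:

```python
import numpy as np

def knn_interpolate_joints(seq, k=3):
    """Fill occluded joints (NaN) with the mean of the k temporally
    nearest frames in which that joint is visible. A simplified stand-in
    for the JI Module described in the abstract."""
    seq = seq.copy()
    T, J, C = seq.shape                      # frames x joints x coordinates
    for j in range(J):
        missing = np.isnan(seq[:, j, 0])     # occlusion affects the whole joint
        visible = np.where(~missing)[0]
        for t in np.where(missing)[0]:
            nearest = visible[np.argsort(np.abs(visible - t))[:k]]
            seq[t, j] = seq[nearest, j].mean(axis=0)
    return seq

# 5 frames, 2 joints, 2-D coordinates; joint 0 is occluded at frame 2.
gait = np.arange(5 * 2 * 2, dtype=float).reshape(5, 2, 2)
gait[2, 0] = np.nan
filled = knn_interpolate_joints(gait, k=2)
```

On this linear toy trajectory, averaging the two neighbouring frames recovers the missing joint exactly.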
Electroencephalogram-based emotion recognition:a comparative analysis of supervised machine learning algorithms
6
Authors: Anagha Prakash, Alwin Poulose. Data Science and Management, 2025, Issue 3, pp. 342-360 (19 pages)
Emotion recognition from electroencephalogram (EEG) signals has garnered significant attention owing to its potential applications in affective computing, human-computer interaction, and mental health monitoring. This paper presents a comparative analysis of different machine learning methods for emotion recognition using EEG data. The objective of this study was to identify the most effective algorithm for accurately classifying emotional states using EEG signals. The EEG Brainwave Dataset: Feeling Emotions was used to evaluate the performance of various machine learning techniques. Multiple machine learning techniques, namely logistic regression (LR), support vector machine (SVM), Gaussian Naive Bayes (GNB), and decision tree (DT), and ensemble models, namely random forest (RF), AdaBoost, LightGBM, XGBoost, and CatBoost, were trained and evaluated. Five-fold cross-validation and dimension reduction techniques, such as principal component analysis, t-distributed stochastic neighbor embedding, and linear discriminant analysis, were applied to all models. The least-performing model, GNB, showed substantially increased performance after dimension reduction. Performance metrics such as accuracy, precision, recall, F1-score, and receiver operating characteristic curves are employed to assess the effectiveness of each approach. This study focuses on the implications of using various machine learning algorithms for EEG-based emotion recognition. This pursuit can improve our understanding of emotions and their underlying neural mechanisms.
Keywords: emotion recognition; electroencephalogram (EEG); machine learning models; classification; affective computing; mental health monitoring; human-computer interaction; brainwave dataset
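The PCA step the study applies before its classifiers can be sketched in a few lines. This is a generic SVD-based PCA on synthetic features, not the study's pipeline:

```python
import numpy as np

def pca(X, n_components):
    """Project features onto the top principal components via SVD.
    A minimal stand-in for the dimension-reduction step applied
    before training the classifiers."""
    Xc = X - X.mean(axis=0)                 # centre each feature column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # principal directions (rows)
    explained = (S ** 2) / (len(X) - 1)     # variance captured per component
    return Xc @ components.T, explained[:n_components]

rng = np.random.default_rng(0)
eeg_features = rng.normal(size=(100, 32))   # 100 trials x 32 synthetic features
Z, var = pca(eeg_features, n_components=5)
```

Because SVD returns singular values in descending order, the explained variances come out sorted, which is how the number of retained components is usually chosen.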
Cross-feature fusion speech emotion recognition based on attention mask residual network and Wav2vec 2.0
7
Authors: Xiaoke Li, Zufan Zhang. Digital Communications and Networks, 2025, Issue 5, pp. 1567-1577 (11 pages)
Speech Emotion Recognition (SER) has received widespread attention as a crucial way of understanding human emotional states. However, the impact of irrelevant information on speech signals and data sparsity limit the development of SER systems. To address these issues, this paper proposes a framework that incorporates the Attentive Mask Residual Network (AM-ResNet) and the self-supervised learning model Wav2vec 2.0 to obtain AM-ResNet features and Wav2vec 2.0 features, respectively, together with a cross-attention module to interact and fuse these two feature sets. The AM-ResNet branch mainly consists of maximum amplitude difference detection, a mask residual block, and an attention mechanism. The maximum amplitude difference detection and the mask residual block act on the pre-processing and the network, respectively, to reduce the impact of silent frames, while the attention mechanism assigns different weights to unvoiced and voiced speech to reduce redundant emotional information caused by unvoiced speech. In the Wav2vec 2.0 branch, the model is introduced as a feature extractor that obtains general speech features (Wav2vec 2.0 features) through pre-training on a large amount of unlabeled speech data, which assists the SER task and copes with data sparsity. In the cross-attention module, AM-ResNet features and Wav2vec 2.0 features interact and are fused to obtain the cross-fused features used to predict the final emotion. Furthermore, multi-label learning is used to add ambiguous emotion utterances to deal with data limitations. Finally, experimental results illustrate the usefulness and superiority of the proposed framework over existing state-of-the-art approaches.
Keywords: speech emotion recognition; residual network; mask; attention; Wav2vec 2.0; cross-feature fusion
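The cross-attention fusion of the two feature streams can be sketched as plain scaled dot-product attention. Learned projection matrices are omitted, and the feature names and shapes below are illustrative assumptions:

```python
import numpy as np

def cross_attention(query_feats, key_feats):
    """Scaled dot-product cross-attention: one branch's features attend
    over the other branch's features, as in the cross-attention fusion
    module (learned query/key/value projections omitted for brevity)."""
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)   # (Tq, Tk) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ key_feats                        # attended features

rng = np.random.default_rng(0)
am_resnet_feats = rng.normal(size=(10, 64))   # 10 frames of AM-ResNet features
wav2vec_feats = rng.normal(size=(12, 64))     # 12 frames of Wav2vec 2.0 features
fused = cross_attention(am_resnet_feats, wav2vec_feats)
```

Each output row is a convex combination of the other branch's frames, so the fused sequence keeps the query branch's length while borrowing content from the key branch.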
Dual-Task Contrastive Meta-Learning for Few-Shot Cross-Domain Emotion Recognition
8
Authors: Yujiao Tang, Yadong Wu, Yuanmei He, Jilin Liu, Weihan Zhang. Computers, Materials & Continua, 2025, Issue 2, pp. 2331-2352 (22 pages)
Emotion recognition plays a crucial role in various fields and is a key task in natural language processing (NLP). The objective is to identify and interpret emotional expressions in text. However, traditional emotion recognition approaches often struggle in few-shot cross-domain scenarios due to their limited capacity to generalize semantic features across different domains. Additionally, these methods face challenges in accurately capturing complex emotional states, particularly those that are subtle or implicit. To overcome these limitations, we introduce a novel approach called Dual-Task Contrastive Meta-Learning (DTCML). This method combines meta-learning and contrastive learning to improve emotion recognition. Meta-learning enhances the model's ability to generalize to new emotional tasks, while instance contrastive learning further refines the model by distinguishing unique features within each category, enabling it to better differentiate complex emotional expressions. Prototype contrastive learning, in turn, helps the model address the semantic complexity of emotions across different domains, enabling it to learn fine-grained emotion expressions. By leveraging dual tasks, DTCML learns from two domains simultaneously; the model is thereby encouraged to learn more diverse and generalizable emotion features, improving its cross-domain adaptability, robustness, and generalization ability. We evaluated the performance of DTCML across four cross-domain settings, and the results show that our method outperforms the best baseline by 5.88%, 12.04%, 8.49%, and 8.40% in terms of accuracy.
Keywords: contrastive learning; emotion recognition; cross-domain learning; dual-task; meta-learning
EEG Scalogram Analysis in Emotion Recognition: A Swin Transformer and TCN-Based Approach
9
Authors: Selime Tuba Pesen, Mehmet Ali Altuncu. Computers, Materials & Continua, 2025, Issue 9, pp. 5597-5611 (15 pages)
EEG signals are widely used in emotion recognition due to their ability to reflect involuntary physiological responses. However, the high dimensionality of EEG signals and their continuous variability in the time-frequency plane make their analysis challenging. Therefore, advanced deep learning methods are needed to extract meaningful features and improve classification performance. This study proposes a hybrid model that integrates the Swin Transformer and Temporal Convolutional Network (TCN) mechanisms for EEG-based emotion recognition. EEG signals are first converted into scalogram images using the Continuous Wavelet Transform (CWT), and classification is performed on these images. The Swin Transformer is used to extract spatial features from the scalogram images, and the TCN is used to learn long-term dependencies. In addition, attention mechanisms are integrated to highlight the essential features extracted from both models. The effectiveness of the proposed model has been tested on the SEED dataset, widely used in the field of emotion recognition, and it has consistently achieved high performance across all emotional classes, with accuracy, precision, recall, and F1-score values of 97.53%, 97.54%, 97.53%, and 97.54%, respectively. Compared to traditional transfer learning models, the proposed approach achieved an accuracy increase of 1.43% over ResNet-101, 1.81% over DenseNet-201, and 2.44% over VGG-19. In addition, the proposed model outperformed many recent CNN, RNN, and Transformer-based methods reported in the literature.
Keywords: continuous wavelet transform; EEG; emotion recognition; Swin Transformer; temporal convolutional network
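The CWT-to-scalogram step can be approximated with a hand-rolled complex Morlet transform. The wavelet parameters and the synthetic 10 Hz signal below are illustrative; the study's exact wavelet choice is not stated in the abstract:

```python
import numpy as np

def morlet_scalogram(signal, scales, w0=6.0):
    """Continuous wavelet transform with a complex Morlet wavelet,
    returning |coefficients| as a scalogram image (scales x time).
    A compact stand-in for the CWT step that feeds the image model."""
    n = len(signal)
    scalogram = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)                 # wavelet support
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-(t / s) ** 2 / 2)
        wavelet /= np.sqrt(s)                            # scale normalisation
        coeffs = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
        scalogram[i] = np.abs(coeffs)
    return scalogram

fs = 128                                      # synthetic "EEG" sampling rate
t = np.arange(0, 2, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t)              # 10 Hz alpha-band oscillation
image = morlet_scalogram(eeg, scales=np.arange(1, 31))
```

The resulting 2-D magnitude array is what gets rendered as a scalogram image and passed to an image classifier.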
Deep Learning-Based Speech Emotion Recognition: Leveraging Diverse Datasets and Augmentation Techniques for Robust Modeling
10
Authors: Ayush Porwal, Praveen Kumar Tyagi, Ajay Sharma, Dheeraj Kumar Agarwal. Journal of Harbin Institute of Technology (New Series), 2025, Issue 3, pp. 54-65 (12 pages)
In recent years, Speech Emotion Recognition (SER) has developed into an essential instrument for interpreting human emotions from auditory data. The proposed research focuses on the development of an SER system employing deep learning and multiple datasets containing samples of emotive speech. The primary objective of this research is to investigate the utilization of Convolutional Neural Networks (CNNs) for sound feature extraction. Stretching, pitch manipulation, and noise injection are a few of the techniques utilized in this study to improve data quality. Feature extraction methods including Zero Crossing Rate, Chroma_stft, Mel-scale Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel-Spectrogram are used to train a model. By using these techniques, audio signals can be transformed into recognized features that can be utilized to train the model. Ultimately, the study produces a thorough evaluation of the model's performance. When this method was applied, the model achieved an impressive accuracy of 94.57% on the test dataset. The proposed work was also validated on the EMO-DB and IEMOCAP datasets, involving further data augmentation, feature engineering, and hyperparameter optimization. By following these development paths, SER systems will be able to be implemented in real-world scenarios with greater accuracy and resilience.
Keywords: voice signal; emotion recognition; deep learning; CNN
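Two of the listed features, Zero Crossing Rate and RMS, are simple enough to compute directly. The frame length and hop size below are common defaults, not values taken from the paper:

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Slice a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of sign changes per frame (rough voiced/unvoiced cue)."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def rms_energy(frames):
    """Root-mean-square energy per frame (loudness cue)."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

sr = 16000
t = np.arange(sr) / sr
speech_like = np.sin(2 * np.pi * 220 * t)   # toy periodic "voiced" signal
frames = frame_signal(speech_like)
zcr, rms = zero_crossing_rate(frames), rms_energy(frames)
```

For a pure 220 Hz tone the RMS sits near 1/sqrt(2) and the ZCR stays low, which is exactly the voiced-speech pattern these features are meant to capture.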
Robust Audio-Visual Fusion for Emotion Recognition Based on Cross-Modal Learning under Noisy Conditions
11
Authors: A-Seong Moon, Seungyeon Jeong, Donghee Kim, Mohd Asyraf Zulkifley, Bong-Soo Sohn, Jaesung Lee. Computers, Materials & Continua, 2025, Issue 11, pp. 2851-2872 (22 pages)
Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems. The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference, such as background noise, overlapping speech, and visual obstructions. The proposed framework employs a structured fusion approach, combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms. Audio data are transformed into mel-spectrogram representations, and visual data are represented as raw frame sequences. Spatial and temporal features are extracted through convolutional and transformer-based encoders, allowing the framework to capture complementary and hierarchical information from both sources. A cross-modal attention module enables selective emphasis on relevant signals while suppressing modality-specific noise. Performance is validated on a modified version of the AFEW dataset, in which controlled noise is introduced to emulate realistic conditions. The framework achieves higher classification accuracy than comparative baselines, confirming increased robustness under conditions of cross-modal disruption. This result demonstrates the suitability of the proposed method for deployment in practical emotion-aware technologies operating outside controlled environments. The study also contributes a systematic approach to fusion design and supports further exploration in the direction of resilient multimodal emotion analysis frameworks. The source code is publicly available at https://github.com/asmoon002/AVER (accessed on 18 August 2025).
Keywords: multimodal learning; emotion recognition; cross-modal attention; robust representation learning
Segmentwise Multilayer Perceptrons for Speech Emotion Recognition
12
Authors: Ziying Zhang, Changzheng Liu. Proceedings of the International Conference on Frontier Computing, 2025, Issue 1, pp. 203-213 (11 pages)
With the increasing popularity of mobile internet devices, speech emotion recognition has become a convenient and valuable means of human-computer interaction. The performance of speech emotion recognition depends on the discriminating, emotion-related utterance-level representations extracted from speech. Moreover, sufficient data are required to model the relationship between emotional states and speech. Mainstream emotion recognition methods cannot avoid the influence of the silence period in speech, and environmental noise significantly affects recognition performance. This study supplements the silence periods with removed speech information and applies segmentwise multilayer perceptrons to enhance utterance-level representation aggregation. In addition, improved semisupervised learning is employed to overcome the problem of data scarcity. Experiments conducted to evaluate the proposed method on the IEMOCAP corpus reveal that it achieves 68.0% weighted accuracy and 68.8% unweighted accuracy in four-class emotion classification. The experimental results demonstrate that the proposed method aggregates utterance-level representations more effectively and that semisupervised learning enhances its performance.
Keywords: speech emotion recognition; segmentwise multilayer perceptron; semisupervised learning; emotion classification
Does problematic mobile phone use affect facial emotion recognition?
13
Authors: Bowei Go, Xianli An. Journal of Psychology in Africa, 2025, Issue 4, pp. 523-533 (11 pages)
This study investigated the impact of problematic mobile phone use (PMPU) on emotion recognition. The PMPU levels of 150 participants were measured using the standardized SAS-SV scale. Based on the SAS-SV cutoff scores, participants were divided into PMPU and control groups. These participants completed two emotion recognition experiments involving facial emotion stimuli that had been manipulated to varying emotional intensities using Morph software. Experiment 1 (n = 75) assessed differences in facial emotion detection accuracy. Experiment 2 (n = 75), based on signal detection theory, examined differences in hit and false alarm rates across emotional expressions. The results showed that PMPU users demonstrated higher recognition accuracy for disgust faces but lower accuracy for happy faces. This indicates a tendency among PMPU users to prioritize specific negative emotions and suggests impaired perception of positive emotions. Practically, incorporating diverse emotional stimuli into PMPU interventions may help alleviate the negative emotional focus bias associated with excessive mobile device use.
Keywords: problematic mobile phone use; emotion recognition; facial emotion; basic emotion
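Signal-detection analyses like Experiment 2's typically reduce hit and false-alarm rates to a sensitivity index d'. A minimal sketch follows; the group rates shown are hypothetical illustrations, not results from the study:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate, correction=0.01):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate), the
    standard signal-detection-theory measure. Rates of exactly 0 or 1
    are clipped so the inverse normal CDF stays finite."""
    clip = lambda p: min(max(p, correction), 1 - correction)
    z = NormalDist().inv_cdf
    return z(clip(hit_rate)) - z(clip(false_alarm_rate))

# Hypothetical group values for illustration only.
control = d_prime(hit_rate=0.85, false_alarm_rate=0.15)
pmpu = d_prime(hit_rate=0.70, false_alarm_rate=0.20)
```

Separating sensitivity (d') from response bias is the point of using hits and false alarms rather than raw accuracy.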
Improved MFCC Features and TWM Model for Speech Emotion Recognition
14
Authors: Liyan Zhang, Jiaxin Du, Shuang Chen, Jiayan Li. Journal of Harbin Institute of Technology (New Series), 2025, Issue 6, pp. 38-46 (9 pages)
To solve the problem that traditional Mel Frequency Cepstral Coefficient (MFCC) features cannot fully represent dynamic speech features, this paper introduces first-order and second-order differences on top of static MFCC features to extract dynamic MFCC features, and constructs a hybrid model (TWM) that combines TIM-NET (Temporal-aware Bi-directional Multi-scale Network), an improved WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty), and a multi-head attention mechanism. The multi-head attention mechanism not only effectively prevents gradient vanishing but also allows the construction of deeper networks that can capture long-range dependencies and learn from information at different time steps, improving the accuracy of the model; the WGAN-GP addresses insufficient sample size by improving the quality of generated speech samples. The experimental results show that this method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets.
Keywords: dynamic features; speech emotion recognition; multi-head attention mechanism; generative adversarial networks
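The first- and second-order MFCC differences can be computed with the standard regression-style delta formula. The window size N = 2 is a common default; the paper's setting is not stated in the abstract:

```python
import numpy as np

def delta(features, N=2):
    """Regression-style delta coefficients over +/-N frames:
    d_t = sum_n n * c_{t+n} / sum_n n^2, the standard way first- and
    second-order MFCC dynamics are computed (edge frames are padded)."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    weights = np.arange(-N, N + 1)
    denom = np.sum(weights ** 2)
    return sum(w * padded[N + w : N + w + len(features)] for w in weights) / denom

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))         # 100 frames x 13 static MFCCs
d1 = delta(mfcc)                          # first-order dynamics
d2 = delta(d1)                            # second-order dynamics
dynamic_mfcc = np.hstack([mfcc, d1, d2])  # classic 39-dimensional feature
```

Stacking the statics with both delta orders yields the familiar 39-dimensional dynamic MFCC vector per frame.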
Support vector machines for emotion recognition in Chinese speech (cited 8 times)
15
Authors: Wang Zhiping, Zhao Li, Zou Cairong. Journal of Southeast University (English Edition) (EI, CAS), 2003, Issue 4, pp. 307-310 (4 pages)
Support vector machines (SVMs) are utilized for emotion recognition in Chinese speech in this paper. Both binary-class and multi-class discrimination are discussed. It is shown that the emotional features constitute a nonlinear problem in the input space, and that SVMs based on nonlinear mapping can solve it more effectively than other, linear methods. Multi-class classification based on SVMs with a soft decision function is constructed to classify the four emotion situations. Compared with the principal component analysis (PCA) method and a modified PCA method, SVMs with nonlinear kernel mapping achieve the best results in multi-class discrimination.
Keywords: speech signal; emotion recognition; support vector machines
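The nonlinear-kernel point can be illustrated on XOR-style data, which no linear boundary separates. For a dependency-free sketch, a nearest-class-mean rule in RBF feature space stands in for the SVM decision function; it is not the paper's classifier:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix: the implicit nonlinear mapping that
    lets kernel methods separate data no linear boundary can."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_nearest_class(x, X, y, gamma=1.0):
    """Tiny kernelised classifier: pick the class whose training points
    are most similar on average in RBF feature space. A stand-in for
    the SVM decision function, kept minimal on purpose."""
    k = rbf_kernel(x[None, :], X, gamma)[0]
    classes = np.unique(y)
    scores = [k[y == c].mean() for c in classes]
    return classes[int(np.argmax(scores))]

# XOR layout: the two "emotion" classes are not linearly separable.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([0, 0, 1, 1])
preds = np.array([kernel_nearest_class(x, X, y, gamma=2.0) for x in X])
```

The kernelised rule classifies all four XOR points correctly, while any single linear discriminant must misclassify at least one of them.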
A novel speech emotion recognition algorithm based on combination of emotion data field and ant colony search strategy (cited 3 times)
16
Authors: Zha Cheng, Tao Huawei, Zhang Xinran, Zhou Lin, Zhao Li, Yang Ping. Journal of Southeast University (English Edition) (EI, CAS), 2016, Issue 2, pp. 158-163 (6 pages)
In order to effectively conduct emotion recognition from spontaneous, non-prototypical and unsegmented speech so as to create a more natural human-machine interaction, a novel speech emotion recognition algorithm based on the combination of the emotional data field (EDF) and the ant colony search (ACS) strategy, called the EDF-ACS algorithm, is proposed. More specifically, the interrelationships among the turn-based acoustic feature vectors of different labels are established by using the potential function in the EDF. To perform spontaneous speech emotion recognition, an artificial colony is used to mimic the turn-based acoustic feature vectors. Then, the canonical ACS strategy is used to investigate the movement direction of each artificial ant in the EDF, which is regarded as the emotional label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the continuous audio/visual emotion challenge (AVEC) 2012 dataset, which contains spontaneous, non-prototypical and unsegmented speech emotion data. The experimental results show that the proposed EDF-ACS algorithm outperforms the existing state-of-the-art algorithm in turn-based speech emotion recognition.
Keywords: speech emotion recognition; emotional data field; ant colony search; human-machine interaction
Cascaded projection of Gaussian mixture model for emotion recognition in speech and ECG signals (cited 1 time)
17
Authors: Huang Chengwei, Wu Di, Zhang Xiaojun, Xiao Zhongzhe, Xu Yishen, Ji Jingjing, Tao Zhi, Zhao Li. Journal of Southeast University (English Edition) (EI, CAS), 2015, Issue 3, pp. 320-326 (7 pages)
A cascaded projection of the Gaussian mixture model algorithm is proposed. First, the marginal distribution of the Gaussian mixture model is computed for different feature dimensions, and a number of sub-classifiers are generated using the marginal distribution model. Each sub-classifier is based on a different feature set. The cascaded structure is adopted to fuse the sub-classifiers dynamically to achieve sample adaptation ability. Secondly, the effectiveness of the proposed algorithm is verified on electrocardiogram emotional signals and speech emotional signals. Emotional data covering fidgetiness, happiness and sadness is collected by induction experiments. Finally, the emotion feature extraction method is discussed, including heart rate variability, the chaotic electrocardiogram feature and utterance-level static features. Emotional feature reduction methods are studied, including principal component analysis, sequential forward selection, the Fisher discriminant ratio and the maximal information coefficient. The experimental results show that the proposed classification algorithm can effectively improve recognition accuracy in two different scenarios.
Keywords: Gaussian mixture model; emotion recognition; sample adaptation; emotion induction
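Marginalizing a Gaussian mixture over a feature subset simply drops the unused dimensions of each component's mean and covariance, which is how each sub-classifier can score its own feature set. A diagonal-covariance sketch with toy parameters, not the paper's models:

```python
import numpy as np

def gmm_marginal_logpdf(x_sub, dims, weights, means, covs):
    """Marginal log-density of a diagonal-covariance GMM over a chosen
    feature subset `dims`: each component keeps only the selected
    entries of its mean and variance vectors (log-sum-exp for stability)."""
    log_probs = []
    for w, mu, var in zip(weights, means, covs):
        mu_s, var_s = mu[dims], var[dims]
        quad = ((x_sub - mu_s) ** 2 / var_s).sum()
        log_norm = -0.5 * (len(dims) * np.log(2 * np.pi) + np.log(var_s).sum())
        log_probs.append(np.log(w) + log_norm - 0.5 * quad)
    m = max(log_probs)
    return m + np.log(np.exp(np.array(log_probs) - m).sum())

# Two-component toy GMM over 4 features; score only the first two dims.
weights = [0.6, 0.4]
means = [np.zeros(4), np.ones(4)]
covs = [np.ones(4), 2 * np.ones(4)]
lp = gmm_marginal_logpdf(np.array([0.1, -0.2]), dims=[0, 1],
                         weights=weights, means=means, covs=covs)
```

One full-dimensional GMM therefore yields a whole family of lower-dimensional sub-classifiers for free, one per feature subset.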
Auditory attention model based on Chirplet for cross-corpus speech emotion recognition (cited 1 time)
18
Authors: Zhang Xinran, Song Peng, Zha Cheng, Tao Huawei, Zhao Li. Journal of Southeast University (English Edition) (EI, CAS), 2016, Issue 4, pp. 402-407 (6 pages)
To solve the problem of mismatched features across experimental databases, a key issue in cross-corpus speech emotion recognition, an auditory attention model based on Chirplet is proposed for feature extraction. First, the auditory attention model is employed to detect variational emotion features in the spectrum. Then, a selective attention mechanism model is proposed to extract the salient gist features, which show their relation to the expected performance in cross-corpus testing. Furthermore, Chirplet time-frequency atoms are introduced into the model. By forming a complete atom database, the Chirplet improves spectrum feature extraction, including the amount of information captured. Samples from multiple databases have the characteristics of multiple components; hereby, the Chirplet expands the scale of the feature vector in the time-frequency domain. Experimental results show that, compared to the traditional feature model, the proposed feature extraction approach with the prototypical classifier achieves a significant improvement in cross-corpus speech emotion recognition. In addition, the proposed method is more robust to inconsistent sources of the training and testing sets.
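A Chirplet atom is a Gaussian-windowed linear chirp, so a dictionary can be swept over chirp rates and matched against a signal frame by correlation. The sketch below shows only that core idea under assumed parameters (sampling rate, window width, argmax-correlation selection); the paper's complete atom database and attention model are considerably richer.

```python
import numpy as np

def chirplet_atom(n, fs, f0, chirp_rate, t_center, sigma):
    """Unit-norm real Chirplet atom: a Gaussian window of width `sigma`
    around `t_center`, modulating a linear chirp whose instantaneous
    frequency is f0 + chirp_rate * (t - t_center)."""
    t = np.arange(n) / fs - t_center
    atom = np.exp(-0.5 * (t / sigma) ** 2) * np.cos(
        2 * np.pi * (f0 * t + 0.5 * chirp_rate * t ** 2))
    return atom / np.linalg.norm(atom)

def best_atom(frame, dictionary):
    """Matching-pursuit-style selection: index of the dictionary atom
    with the largest absolute correlation with the frame."""
    scores = [abs(np.dot(frame, a)) for a in dictionary]
    return int(np.argmax(scores))
```

Sweeping `chirp_rate` (and, in a full system, `f0`, `t_center`, `sigma`) over a grid yields the "complete atom database" against which each frame is projected.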
Keywords: speech emotion recognition; selective attention mechanism; spectrogram feature; cross-corpus
19. Speech emotion recognition via discriminant-cascading dimensionality reduction (cited: 1)
Authors: 王如刚, 徐新洲, 黄程韦, 吴尘, 张昕然, 赵力. Journal of Southeast University (English Edition), EI CAS, 2016, No. 2, pp. 151-157 (7 pages)
In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction for speech emotion recognition is investigated. Based on the existing locality preserving projections and graph embedding framework, a novel discriminant-cascading dimensionality reduction method, named discriminant-cascading locality preserving projections (DCLPP), is proposed. The method specifically utilizes supervised embedding graphs and keeps the inner products of samples in the original space to retain enough information for speech emotion recognition. Then, kernel DCLPP (KDCLPP) is also proposed to extend the mapping form. Validated by experiments on the EMO-DB and eNTERFACE'05 corpora, the proposed method clearly outperforms existing common dimensionality reduction methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local discriminant embedding (LDE) and graph-based Fisher analysis (GbFA), with different categories of classifiers.
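For orientation, the LPP baseline that DCLPP extends can be sketched as a supervised graph embedding: same-class pairs get affinity 1 and the projection solves a generalised eigenproblem on the graph Laplacian. This is the reference method only, not DCLPP itself, and the affinity rule and regulariser here are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh

def supervised_lpp(X, y, n_components=2, reg=1e-6):
    """Plain supervised locality preserving projections: minimise
    a' X' L X a subject to a' X' D X a = 1, where W connects same-class
    samples, D is the degree matrix and L = D - W is the Laplacian."""
    W = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)          # no self-loops
    D = np.diag(W.sum(axis=1))
    L = D - W                         # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # regularise for stability
    eigvals, eigvecs = eigh(A, B)     # ascending generalised eigenvalues
    return eigvecs[:, :n_components]  # projection matrix (d x k)

def project(X, P):
    return X @ P
```

The smallest-eigenvalue directions keep same-class samples close after projection, which is exactly the locality the abstract's supervised embedding graphs preserve.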
Keywords: speech emotion recognition; discriminant-cascading; locality preserving projections; discriminant analysis; dimensionality reduction
20. Novel feature fusion method for speech emotion recognition based on multiple kernel learning
Authors: 金赟, 宋鹏, 郑文明, 赵力. Journal of Southeast University (English Edition), EI CAS, 2013, No. 2, pp. 129-133 (5 pages)
In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Building on the global features, the local information of different kinds of features is also utilized, and the global and local features are combined. Moreover, the multiple kernel learning method is adopted: the global features and each kind of local feature are associated with their own kernels, and all the kernels are summed with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and local kernels are determined by cross-validation. After multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show a recognition rate of 78.74% with the global kernel alone and 81.10% with the proposed method, demonstrating its effectiveness.
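The kernel combination described here, one kernel per feature group summed with weights, can be sketched with a precomputed-kernel SVM. Note the hedge: a real MKL solver learns the weights jointly with the classifier, whereas this sketch fixes them by hand, and the block split and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def mixed_kernel(blocks_a, blocks_b, weights, gammas):
    """Weighted sum of per-block RBF kernels.  `blocks_a` / `blocks_b`
    are lists of feature matrices, one per feature group (e.g. the
    global set plus each local set).  In true MKL the `weights` are
    learned; here they are fixed for illustration."""
    return sum(w * rbf_kernel(A, B, gamma=g)
               for (A, B), w, g in zip(zip(blocks_a, blocks_b),
                                       weights, gammas))

# Usage with scikit-learn's precomputed-kernel SVM:
#   K_train = mixed_kernel(train_blocks, train_blocks, weights, gammas)
#   clf = SVC(kernel="precomputed").fit(K_train, y_train)
#   K_test = mixed_kernel(test_blocks, train_blocks, weights, gammas)
#   y_pred = clf.predict(K_test)
```

Because a nonnegative weighted sum of kernels is itself a valid kernel, the classifier operates in the joint reproducing kernel Hilbert space the abstract refers to.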
Keywords: speech emotion recognition; multiple kernel learning; feature fusion; support vector machine