Journal Articles
7 articles found
1. Using Kinect for real-time emotion recognition via facial expressions (cited: 4)
Authors: Qi-rong MAO, Xin-yu PAN, Yong-zhao ZHAN, Xiang-jun SHEN
Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2015, Issue 4, pp. 272-282 (11 pages)
Emotion recognition via facial expressions (ERFE) has attracted a great deal of interest with recent advances in artificial intelligence and pattern recognition. Most studies are based on 2D images, and their performance is usually computationally expensive. In this paper, we propose a real-time emotion recognition approach based on both 2D and 3D facial expression features captured by Kinect sensors. To capture the deformation of the 3D mesh during facial expression, we combine the features of animation units (AUs) and feature point positions (FPPs) tracked by Kinect. A fusion algorithm based on improved emotional profiles (IEPs) and maximum confidence is proposed to recognize emotions with these real-time facial expression features. Experiments on both an emotion dataset and a real-time video show the superior performance of our method.
Keywords: Kinect; emotion recognition; facial expression; real-time classification; fusion algorithm; support vector machine (SVM)
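The maximum-confidence part of the fusion can be illustrated with a minimal sketch: each feature stream (AU-based and FPP-based) yields per-emotion confidence scores, and the label behind the single highest score wins. This is a hypothetical reconstruction of the voting step only; the paper's improved emotional profiles (IEPs) and the `max_confidence_fusion` name are not from the source.

```python
def max_confidence_fusion(scores_au, scores_fpp):
    """Fuse two per-stream score dictionaries (emotion -> confidence)
    by returning the label whose single highest confidence wins."""
    best_au = max(scores_au, key=scores_au.get)
    best_fpp = max(scores_fpp, key=scores_fpp.get)
    if scores_au[best_au] >= scores_fpp[best_fpp]:
        return best_au
    return best_fpp

# Toy scores: the FPP stream is most confident overall, so its label wins.
fused = max_confidence_fusion(
    {"happy": 0.7, "sad": 0.2},
    {"happy": 0.4, "angry": 0.9})
# fused == "angry"
```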
2. Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features (cited: 2)
Authors: Qi-rong MAO, Xiao-lei ZHAO, Zheng-wei HUANG, Yong-zhao ZHAN
Journal of Zhejiang University - Science C (Computers and Electronics) (SCIE, EI), 2013, Issue 7, pp. 573-582 (10 pages)
Functional paralanguage includes considerable emotion information, and it is insensitive to speaker changes. To improve emotion recognition accuracy under the speaker-independent condition, a fusion method combining functional paralanguage features with accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. Using this method, functional paralanguages, such as laughter, crying, and sighing, are used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database including six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is put forward to recognize speech emotions in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for six emotions is over 67% in the speaker-independent condition using the functional paralanguage features.
Keywords: speech emotion recognition; speaker-independent; functional paralanguage; fusion algorithm; recognition accuracy
3. Speech emotion recognition with unsupervised feature learning (cited: 1)
Authors: Zheng-wei HUANG, Wen-tao XUE, Qi-rong MAO
Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2015, Issue 5, pp. 358-366 (9 pages)
Emotion-based features are critical for achieving high performance in a speech emotion recognition (SER) system. In general, it is difficult to develop these features due to the ambiguity of the ground truth. In this paper, we apply several unsupervised feature learning algorithms (including K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which have promise for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the performance of the proposed approach and present a detailed analysis of the effect of two important factors in the model setup: the content window size and the number of hidden layer nodes. Experimental results show that larger content windows and more hidden nodes contribute to higher performance. We also show that a two-layer network does not clearly improve performance compared to a single-layer network.
Keywords: speech emotion recognition; unsupervised feature learning; neural network; affective computing
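The K-means variant of this idea can be sketched in a few lines: learn centroids from unlabeled feature frames, then encode each frame by its (negated) distance to every centroid, so the number of centroids plays the role of the hidden-layer width studied in the paper. This is a toy illustration, not the paper's pipeline; the tiny pure-Python `kmeans` and the distance-based encoding are assumptions standing in for a real implementation.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(frames, k, iters=20, seed=0):
    """Learn k centroids from unlabeled frames (lists of floats)."""
    rng = random.Random(seed)
    centroids = rng.sample(frames, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in frames:
            # Assign each frame to its nearest centroid.
            j = min(range(k), key=lambda c: dist2(x, centroids[c]))
            clusters[j].append(x)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids

def encode(x, centroids):
    # Represent a frame by its negated distance to every centroid,
    # so nearer centroids produce larger feature values.
    return [-dist2(x, c) for c in centroids]
```

A downstream classifier (e.g., an SVM) would then be trained on these encodings rather than on the raw frames.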
4. A dynamic-ring-based multicast routing protocol for ad hoc networks
Authors: Yuan Zhou, Guang-Sheng Li, Yong-Zhao Zhan, Qi-rong Mao, Yi-Bin Hou
Journal of Computer Science & Technology (SCIE, EI, CSCD), 2004, Issue C00, p. 91 (1 page)
An ad hoc network is a wireless network without fixed infrastructure. Because such a network is very easy to set up, it can be widely applied in scenarios such as disaster relief, battlefield command, and temporary conferences. These applications usually share a common characteristic: one-to-many or many-to-many data transmission. Multicast routing protocols therefore play a very important role in ad hoc networks.
Keywords: ad hoc network; multicast routing protocol; wireless network; data transmission; dynamic ring; infrastructure
5. Affective rating ranking based on face images in arousal-valence dimensional space
Authors: Guo-peng XU, Hai-tang LU, Fei-fei ZHANG, Qi-rong MAO
Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2018, Issue 6, pp. 783-795 (13 pages)
In dimensional affect recognition, the machine learning methods used to model and predict affect are mostly classification and regression. However, the annotation in the dimensional affect space usually takes the form of a continuous real value, which has an ordinal property. The aforementioned methods do not focus on taking advantage of this important information. Therefore, we propose an affective rating ranking framework for affect recognition based on face images in the valence and arousal dimensional space. Our approach can appropriately use the ordinal information among affective ratings, which are generated by discretizing continuous annotations. Specifically, we first train a series of basic cost-sensitive binary classifiers, each of which uses all samples relabeled according to the comparison results between the corresponding ratings and the given rank of that binary classifier. We obtain the final affective ratings by aggregating the outputs of the binary classifiers. By comparing the experimental results with the baseline and deep learning based classification and regression methods on the benchmark database of the AVEC 2015 Challenge and a selected subset of the SEMAINE database, we find that our ordinal ranking method is effective in both the arousal and valence dimensions.
Keywords: ordinal ranking; dimensional affect recognition; valence; arousal; facial image processing
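The reduction the abstract describes can be sketched as follows: for K discrete ratings, train one binary classifier per rank threshold ("is the rating greater than k?") on relabeled samples, and recover the final rating by counting positive answers. The function names and the toy threshold classifiers below are hypothetical; the cost-sensitive learner itself is abstracted away.

```python
def make_threshold_labels(ratings, k):
    """Relabel every sample against threshold k:
    1 if its rating exceeds k, else 0."""
    return [1 if r > k else 0 for r in ratings]

def predict_rating(x, classifiers):
    """classifiers[i] answers 'rating > k_i?' for increasing thresholds;
    summing the positive answers recovers the ordinal rating."""
    return 1 + sum(clf(x) for clf in classifiers)

# Toy stand-ins for trained binary classifiers on ratings 1..5
# (thresholds k = 1..4), here answered perfectly from x itself.
classifiers = [(lambda x, k=k: 1 if x > k else 0) for k in range(1, 5)]
# predict_rating(3, classifiers) == 3
```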
6. Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder
Authors: Jing-jing CHEN, Qi-rong MAO, You-cai QIN, Shuang-qing QIAN, Zhi-shen ZHENG
Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2020, Issue 11, pp. 1639-1650 (12 pages)
Much recent progress in monaural speech separation (MSS) has been achieved through a series of deep learning architectures based on autoencoders, which use an encoder to condense the input signal into compressed features and then feed these features into a decoder to construct a specific audio source of interest. However, these approaches can neither learn generative factors of the original input for MSS nor construct each audio source in mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS, which introduces a regularization loss in the objective function to isolate one source without containing other sources. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE can learn source-specific generative factors and a set of discriminative features for each source, leading to MSS performance improvement. Experiments on benchmark datasets show that our approach outperforms the existing methods. In terms of three important metrics, WFAE has great success on a relatively challenging MSS case, i.e., speaker-independent MSS.
Keywords: speech separation; generative factors; autoencoder; deep learning
7. NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images
Authors: Luo-yang XUE, Qi-rong MAO, Xiao-hua HUANG, Jie CHEN
Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2020, Issue 9, pp. 1321-1333 (13 pages)
Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, the annotation of large-scale datasets is expensive and time-consuming. Instead, it is easy to obtain weakly labeled web images from the Internet. However, noisy labels still lead to seriously degraded performance when we use images directly from the web for training networks. To address this drawback, we propose an end-to-end weakly supervised learning network, which is robust to mislabeled web images. Specifically, the proposed attention module automatically eliminates the distraction of those samples with incorrect labels by reducing their attention scores in the training process. On the other hand, the special-class activation map module is designed to stimulate the network by focusing on the significant regions from the samples with correct labels in a weakly supervised learning approach. In addition to feature learning, regularization is applied to the classifier to minimize the distance between samples within the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well-labeled and mislabeled web image datasets demonstrate that the proposed algorithm outperforms the related methods.
Keywords: visual sentiment analysis; weakly supervised learning; mislabeled samples; significant sentiment regions