In this paper, we integrate the Node2vec and GCN methods. Initially, Node2vec is employed to obtain preliminary graph embeddings, which are then fed as input to a GCN to further update the graph embedding matrix. We evaluate on the Wikipedia dataset for node classification, comparing the performance of the methods before and after integration to validate the effectiveness of the combined approach.
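The pipeline described above, pretrained Node2vec embeddings used as the input feature matrix of a GCN layer, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the toy graph, embedding dimensions, and random weight matrix are invented for the example, and a stand-in random matrix takes the place of real Node2vec output.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 · H · W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))      # symmetric normalization
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU

# Toy graph: 3 nodes on a path 0-1-2
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
rng = np.random.default_rng(0)
node2vec_emb = rng.normal(size=(3, 8))  # stand-in for pretrained Node2vec embeddings
w = rng.normal(size=(8, 4))             # trainable projection (random here)
updated = gcn_layer(adj, node2vec_emb, w)
print(updated.shape)  # (3, 4)
```

In the integrated method, `node2vec_emb` replaces the identity or raw-attribute feature matrix a plain GCN would start from, so each node's representation is refined by aggregating its neighbors' Node2vec embeddings.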
Speech Emotion Recognition (SER) has received widespread attention as a crucial way of understanding human emotional states. However, the impact of irrelevant information in speech signals and data sparsity limit the development of SER systems. To address these issues, this paper proposes a framework that incorporates the Attentive Mask Residual Network (AM-ResNet) and the self-supervised learning model Wav2vec 2.0 to obtain AM-ResNet features and Wav2vec 2.0 features respectively, together with a cross-attention module to interact and fuse these two features. The AM-ResNet branch mainly consists of maximum amplitude difference detection, a mask residual block, and an attention mechanism. Among them, the maximum amplitude difference detection and the mask residual block act on the pre-processing and the network, respectively, to reduce the impact of silent frames, and the attention mechanism assigns different weights to unvoiced and voiced speech to reduce redundant emotional information caused by unvoiced speech. In the Wav2vec 2.0 branch, the model is introduced as a feature extractor to obtain general speech features (Wav2vec 2.0 features) through pre-training on a large amount of unlabeled speech data, which assists the SER task and copes with the data sparsity problem. In the cross-attention module, the AM-ResNet features and Wav2vec 2.0 features interact and are fused to obtain cross-fused features, which are used to predict the final emotion. Furthermore, multi-label learning is used to add ambiguous emotion utterances to deal with data limitations. Finally, experimental results illustrate the usefulness and superiority of our proposed framework over existing state-of-the-art approaches.
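The cross-attention fusion step, one feature stream attending over the other and the result concatenated into cross-fused features, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, not the paper's implementation: the frame counts, feature dimension, and random inputs stand in for real AM-ResNet and Wav2vec 2.0 outputs, and real systems would use learned query/key/value projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats):
    """Attend from one stream to another: softmax(Q K^T / sqrt(d)) V."""
    d = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d)   # (n_q, n_kv)
    return softmax(scores, axis=-1) @ kv_feats       # (n_q, d)

rng = np.random.default_rng(1)
resnet_feats = rng.normal(size=(10, 16))  # stand-in for AM-ResNet features
w2v_feats = rng.normal(size=(25, 16))     # stand-in for Wav2vec 2.0 features

# Cross-fused features: ResNet frames enriched with Wav2vec 2.0 context
fused = np.concatenate(
    [cross_attention(resnet_feats, w2v_feats), resnet_feats], axis=-1)
print(fused.shape)  # (10, 32)
```

A symmetric pass (Wav2vec 2.0 features attending over AM-ResNet features) can be fused the same way before the final emotion classifier.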
Funding: Supported by the Chongqing University of Posts and Telecommunications Ph.D. Innovative Talents Project (Grant No. BYJS202106) and the Chongqing Postgraduate Research Innovation Project (Grant No. CYB21203).