Abstract
Conventional unimodal face image recognition is easily degraded by illumination changes, occlusion, and similar factors in complex environments, and its accuracy and robustness leave room for improvement. Voiceprint, a distinctive human biometric trait, effectively complements face image features. This paper proposes a self-supervised face image recognition algorithm that fuses voiceprint features: it builds a dual-branch feature extraction network, designs a cross-modal attention fusion mechanism, and establishes a contrastive learning framework for unsupervised multimodal feature learning. Experiments show that, compared with the unimodal method, the algorithm raises recognition accuracy from 85.2% to 93.7% and still achieves 82.3% accuracy with only 10% labeled data, confirming the effectiveness of cross-modal self-supervised learning.
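The abstract names a contrastive learning framework over face and voiceprint features but does not state the loss it uses. As a rough illustration only, the sketch below assumes a symmetric InfoNCE objective, a common choice for aligning two modalities, in which the face and voice embeddings of the same person are pulled together and all other pairings in the batch are pushed apart. The function names, the temperature of 0.1, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match candidates[i] and no other."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(a * c for a, c in zip(anchor, cand)) / temperature
                  for cand in candidates]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the same-identity pair (index i) as the target.
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def cross_modal_contrastive_loss(face_emb, voice_emb, temperature=0.1):
    """Symmetric loss (assumed form): face-to-voice plus voice-to-face, averaged."""
    faces = [l2_normalize(v) for v in face_emb]
    voices = [l2_normalize(v) for v in voice_emb]
    return 0.5 * (info_nce(faces, voices, temperature)
                  + info_nce(voices, faces, temperature))


# Toy embeddings (hypothetical): row i of each list belongs to the same person.
face_emb = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
voice_emb = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 1.1]]

aligned = cross_modal_contrastive_loss(face_emb, voice_emb)
# Pairing each face with the wrong voice should give a larger loss.
shuffled = cross_modal_contrastive_loss(face_emb, voice_emb[1:] + voice_emb[:1])
print(f"aligned: {aligned:.4f}, shuffled: {shuffled:.4f}")
```

In a real system the two embedding lists would come from the face and voiceprint branches of the network, and the loss would be minimized with a deep learning framework; plain Python is used here only to keep the arithmetic visible.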
Authors
王梦仙
卢静涛
陈志泉
谢文娜
WANG Mengxian; LU Jingtao; CHEN Zhiquan; XIE Wenna (Xiamen Nanyang Vocational College, Xiamen 361000, China; University of the East, Manila 1008, Philippines)
Source
Audio Engineering (《电声技术》)
2025, No. 10, pp. 46-48 (3 pages)
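Funding
Fujian Provincial Department of Education Research Project for Young and Middle-aged Teachers (Science and Technology category), grant JAT242033.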
Keywords
face image recognition
voiceprint
self-supervised learning
cross-modal fusion
attention mechanism