Speech-driven 3D Face Animation Using a Linear Attention Mechanism

Abstract: Speech-driven 3D face animation uses input speech to drive a 3D face model so that it produces facial expression animation that visually matches the speech. Current methods commonly rely on the Transformer architecture and generate the animation autoregressively. However, the quadratic computational complexity of these methods becomes a performance bottleneck when animating long speech, and overfitting on sparse datasets limits the accuracy and generalization of the generated animation. To address these problems, this paper proposes a speech-driven 3D face animation method based on linear attention. The method adopts a new end-to-end network model: an encoder built on self-supervised speech representation learning extracts speech features, and a facial expression mapping decoder built on RWKV, a linear attention variant, generates the face animation. Experimental results show that the method outperforms current related methods in both the accuracy and the speed of facial expression generation: under normalized conditions, the mean vertex error of the 3D face mesh is 0.15 mm lower than that of the state-of-the-art (SOTA) method, and single-frame face prediction is about 4 times faster than with conventional Transformer-based methods.
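The complexity argument in the abstract can be illustrated with a small sketch. The code below is not the paper's decoder and not RWKV itself (RWKV uses its own recurrence with learned time decay, which is not shown here). It is a generic kernelized causal linear attention, with an assumed feature map `elu(x) + 1` and hypothetical function names, showing that a fixed-size running state reproduces the result of an explicit masked score matrix whose cost grows quadratically with sequence length.

```python
import math
import random

def phi(x):
    """Positive feature map elu(x) + 1, applied element-wise (an assumed choice)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def causal_linear_attention(q, k, v):
    """Recurrent form: one pass over T frames with a fixed-size state."""
    d, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d)]  # running sum of outer(phi(k_t), v_t)
    z = [0.0] * d                        # running sum of phi(k_t)
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        fq, fk = phi(q_t), phi(k_t)
        for i in range(d):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v_t[j]
        norm = sum(fq[i] * z[i] for i in range(d))
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / norm
                    for j in range(d_v)])
    return out

def causal_quadratic_reference(q, k, v):
    """Same output computed from all pairwise scores: cost grows as T^2."""
    d_v = len(v[0])
    out = []
    for t in range(len(q)):
        fq = phi(q[t])
        # Scores against every past and current frame s <= t.
        w = [sum(a * b for a, b in zip(fq, phi(k[s]))) for s in range(t + 1)]
        total = sum(w)
        out.append([sum(w[s] * v[s][j] for s in range(t + 1)) / total
                    for j in range(d_v)])
    return out

random.seed(0)
T, d = 16, 8  # toy sizes: 16 frames, 8 feature dimensions
q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
           for _ in range(3))
fast = causal_linear_attention(q, k, v)
slow = causal_quadratic_reference(q, k, v)
max_diff = max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb))
print(max_diff < 1e-9)
```

Because the recurrent form only carries `S` and `z` from one frame to the next, the work per frame does not depend on how many frames came before, which is the property that makes linear attention variants attractive for long speech inputs.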

Authors: TONG Chengkai; YE Yang (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)

Affiliation: College of Computer Science and Technology, Zhejiang University of Technology

Source: 《小型微型计算机系统》 (Journal of Chinese Computer Systems; Peking University Core Journal), 2025, No. 6, pp. 1400-1408 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (62072405).

Keywords: speech-driven; self-supervised; linear attention; face animation

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

