Speech-driven 3D Face Animation Using Linear Attention Mechanism
Abstract: Speech-driven 3D face animation drives a 3D face model from input speech to generate facial expression animation that visually matches the audio. Current methods commonly use a Transformer architecture and generate the animation autoregressively. For long speech inputs, however, the quadratic computational complexity of these methods becomes a performance bottleneck, and overfitting on sparse datasets limits the accuracy and generalization of the generated animation. To address these problems, this paper proposes a speech-driven 3D face animation method based on linear attention. The method adopts a new end-to-end network model: an encoder built on self-supervised speech representation learning extracts speech features, and a facial expression mapping decoder built on RWKV, a linear attention variant, generates the face animation. Experimental results show that the method outperforms current related methods in both the accuracy and the speed of facial expression generation. Under standardized conditions, the mean vertex error of the 3D face mesh is 0.15 mm lower than that of the state-of-the-art method, and single-frame face prediction is about 4 times faster than with the conventional Transformer-based method.
Authors: TONG Chengkai (童程凯); YE Yang (叶阳) (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), PKU Core Journal, 2025, No. 6, pp. 1400-1408 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (62072405).
Keywords: speech-driven; self-supervised; linear attention; face animation
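
The complexity argument in the abstract can be illustrated with a minimal sketch. This is not the paper's RWKV decoder: it assumes a generic kernel-feature form of causal linear attention with an elu(x) + 1 feature map, and all function names and sizes are hypothetical. It shows that a fixed-size running state can reproduce the output of the quadratic, score-every-earlier-frame form, which is why per-frame cost stays constant as the speech gets longer.

```python
import math
import random


def feature_map(x):
    """elu(x) + 1 applied elementwise; keeps every feature positive (assumed choice)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def causal_attention_quadratic(q, k, v):
    """Reference form: frame t scores every frame s <= t, so total cost grows as T^2."""
    fq = [feature_map(x) for x in q]
    fk = [feature_map(x) for x in k]
    out = []
    for t in range(len(q)):
        weights = [dot(fq[t], fk[s]) for s in range(t + 1)]
        total = sum(weights)
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) / total
                    for j in range(len(v[0]))])
    return out


def causal_attention_linear(q, k, v):
    """Same output from a fixed-size running state, so total cost grows linearly in T."""
    d_k, d_v = len(q[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer(phi(k_s), v_s)
    norm = [0.0] * d_k                         # running sum of phi(k_s)
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        fq, fk = feature_map(q_t), feature_map(k_t)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v_t[j]
        denom = dot(fq, norm)
        out.append([sum(fq[i] * state[i][j] for i in range(d_k)) / denom
                    for j in range(d_v)])
    return out


random.seed(0)
T, d_k, d_v = 40, 4, 6  # hypothetical sizes: frames, key width, value width
q = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
k = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
v = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(T)]

a = causal_attention_quadratic(q, k, v)
b = causal_attention_linear(q, k, v)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print("largest difference between the two forms:", max_diff)
```

The two functions agree up to floating-point rounding. The linear form only ever stores a d_k by d_v state plus a d_k normalizer, regardless of how many frames have been processed. RWKV uses a different, decay-weighted recurrence, so this sketch conveys only the general idea of replacing the T by T score matrix with a running state.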