Funding: Sponsored by the Collaborative Education Projects Between Industry and Academia of the Ministry of Education (Grant No. 230801065261444) and the Humanities and Social Sciences Pre-Research Fund Project of Zhejiang University of Technology (Grant No. SKY-ZX-20220207).
Abstract: Synthesizing a real-time, high-resolution, lip-synced digital human is a challenging task. Although the Wav2Lip model represents a remarkable advance in real-time lip-sync, the clarity of its output is still limited. To address this, we enhanced the Wav2Lip model in this study and trained it on a high-resolution video dataset produced in our laboratory. Experimental results indicate that the improved Wav2Lip model produces digital humans with greater clarity than the original model while maintaining its real-time performance and accurate lip-sync. We implemented the improved Wav2Lip model in a government interface application, generating a government digital human. Testing revealed that this government digital human can interact seamlessly with users in real time, delivering clear visuals and synthesized speech that closely resembles a human voice.