Funding: supported by the Natural Science Foundation of Shandong Province of China under Grant No. ZR2023MF041; the National Natural Science Foundation of China under Grant No. 62072469; the Shandong Data Open Innovative Application Laboratory; and the Spanish Ministry of Economy and Competitiveness (MINECO) and the European Regional Development Fund (ERDF) under Project No. PID2020-120611RBI00/AEI/10.13039/501100011033.
Abstract: Facial expression generation from pure textual descriptions is widely applied in human-computer interaction, computer-aided design, assisted education, etc. However, this task is challenging due to the intricate facial structure and the complex mapping between texts and images. Existing methods face limitations in generating high-resolution images or capturing diverse facial expressions. In this study, we propose a novel generation approach, named FaceCLIP, to tackle these problems. The proposed method utilizes a CLIP-based multi-stage generative adversarial model to produce vivid facial expressions at high resolution. With strong semantic priors from multi-modal textual and visual cues, the proposed method effectively disentangles facial attributes, enabling attribute editing and semantic reasoning. To facilitate text-to-expression generation, we build a new dataset, the FET dataset, which contains facial expression images and corresponding textual descriptions. Experiments on the dataset demonstrate improved image quality and semantic consistency compared with state-of-the-art methods.
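The abstract does not detail FaceCLIP's architecture, but the general idea it names, a CLIP text encoder conditioning each stage of a multi-stage GAN generator, can be sketched as follows. The stage counts, channel widths, and additive fusion scheme here are illustrative assumptions, and the per-stage discriminators and training loop are omitted.

```python
# A minimal sketch of CLIP-conditioned multi-stage generation. The abstract
# does not specify FaceCLIP's internals, so the fusion scheme, stage counts,
# and channel widths below are illustrative assumptions.
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

class StageGenerator(nn.Module):
    """One generator stage: doubles resolution and re-injects the text
    embedding so every stage stays conditioned on the description."""
    def __init__(self, in_ch, out_ch, text_dim=512):
        super().__init__()
        self.fuse = nn.Linear(text_dim, in_ch)  # project text into feature space
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.to_rgb = nn.Conv2d(out_ch, 3, 3, padding=1)  # image output for this stage

    def forward(self, feat, text_emb):
        cond = self.fuse(text_emb)[:, :, None, None]  # broadcast over the spatial grid
        feat = self.up(feat + cond)
        return feat, torch.tanh(self.to_rgb(feat))

# Encode the expression description with a frozen CLIP text encoder.
clip_model, _ = clip.load("ViT-B/32", device="cpu")
tokens = clip.tokenize(["a smiling young woman with raised eyebrows"])
with torch.no_grad():
    text_emb = clip_model.encode_text(tokens).float()  # (1, 512)

# Three stages, each doubling resolution: 16x16 seed -> 32, 64, 128 px images.
seed = torch.randn(1, 256, 16, 16)  # would come from a noise/text projection
stages = nn.ModuleList([StageGenerator(256, 128),
                        StageGenerator(128, 64),
                        StageGenerator(64, 32)])
feat, images = seed, []
for stage in stages:
    feat, img = stage(feat, text_emb)
    images.append(img)  # each stage's image is supervised by its own discriminator
print([tuple(i.shape) for i in images])  # [(1,3,32,32), (1,3,64,64), (1,3,128,128)]
```

Re-injecting the text embedding at every stage, rather than only at the seed, is a common way to keep higher-resolution stages semantically consistent with the description.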
Funding: supported by the International Cooperation Program of the Natural Science Foundation of China (Grant No. 52261135542); the National Natural Science Foundation of China (Grant No. 52305074); and the Zhejiang Provincial Natural Science Foundation (Grant No. LZYQ25E050001).
Abstract: As artificial intelligence (AI) extends humanoid robots into social domains such as education, healthcare, and the home, the need for emotional interaction is increasing. Facial expressions, which convey 55% of emotional information, are key to emotional bonding, making realistic-faced humanoid robots (face robots) increasingly essential. This article reviews AI-driven expression interaction technologies in face robots. It first examines the hardware architecture of face robots, then analyzes a "perception-reasoning-generation" framework by comparing traditional and advanced approaches. Traditional methods rely on visual and speech-based emotion recognition, along with discrete or dimensional emotion models, to drive expression generation (e.g., facial movements, eye contact, and lip synchronization) through affective computing. In contrast, advanced approaches leverage multimodal fusion, emotion reasoning based on large language models (LLMs) or multimodal large language models (MLLMs), and agent-based planning, memory, and tool use to enhance adaptability, realism, and emotional intelligence. The article also discusses potential application areas, current challenges, and future research directions for face robots. By integrating progress across hardware, algorithms, applications, and open issues, this review lays a comprehensive foundation for the development of empathetic, socially adaptive face robots suited to complex human environments.
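As a concrete illustration of the traditional "perception-reasoning-generation" pipeline contrasted in the abstract, the following sketch wires the three stages together. Every name and the emotion-to-actuator mapping are hypothetical stand-ins for the recognizers, emotion models, and motor drivers a real face robot would use.

```python
# A minimal sketch of a perception-reasoning-generation loop. All names and
# the emotion->motor mapping are hypothetical; real systems would back each
# stage with vision/speech models, an emotion model or LLM, and servo drivers.
from dataclasses import dataclass

@dataclass
class Percept:
    user_emotion: str  # output of a (hypothetical) visual/speech recognizer
    utterance: str

# Traditional discrete-emotion policy: pick a response state for each user state.
RESPONSE_POLICY = {"sad": "concerned", "happy": "happy", "angry": "calm"}

# Hypothetical mapping from target emotion to actuator commands; lip
# synchronization with speech output would be driven similarly.
MOTOR_PRESETS = {
    "happy":     {"mouth_corners": +0.8, "brows": +0.2, "eye_contact": True},
    "concerned": {"mouth_corners": -0.3, "brows": -0.5, "eye_contact": True},
    "calm":      {"mouth_corners": 0.0,  "brows": 0.0,  "eye_contact": False},
}

def perceive(frame, audio) -> Percept:
    # Stub for multimodal emotion recognition; returns a fixed example here.
    return Percept(user_emotion="sad", utterance="I failed my exam today.")

def reason(p: Percept) -> str:
    # Discrete-model lookup. An advanced system would instead prompt an
    # LLM/MLLM with p.utterance and p.user_emotion to choose the response.
    return RESPONSE_POLICY.get(p.user_emotion, "calm")

def generate(target: str) -> dict:
    # Map the chosen emotion to facial-movement and gaze commands.
    return MOTOR_PRESETS[target]

percept = perceive(frame=None, audio=None)
commands = generate(reason(percept))
print(commands)  # {'mouth_corners': -0.3, 'brows': -0.5, 'eye_contact': True}
```

The traditional/advanced split the review draws lands almost entirely in the reason step: replacing the table lookup with LLM- or agent-based reasoning leaves the perceive and generate interfaces unchanged.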