Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62506159, 62495093, U24A20324), the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20241199, BK20243039), and the AI&AI for Science Project of Nanjing University.
Abstract: Developing intelligent agents that can effectively coordinate with diverse human partners is a fundamental goal of artificial general intelligence. Previous approaches typically generate a variety of partners to cover human policies, and then either train a single universal agent or maintain multiple best-response (BR) policies for different partners. However, the first direction struggles with the stochastic and multimodal nature of human behaviors, and the second relies on costly few-shot adaptation during policy deployment, which is impractical in real-world applications such as healthcare and autonomous driving. Recognizing that human partners can easily articulate their preferences or behavioral styles through natural language (NL) and make conventions beforehand, we propose a framework for Human-AI Coordination via Policy Generation from Language-guided Diffusion (Haland). Haland first trains BR policies for various partners using reinforcement learning, and then compresses the policy parameters into a single latent diffusion model conditioned on task-relevant language derived from the partners' behaviors. Finally, task-relevant language is aligned with NL instructions to enable efficient human-AI coordination. Empirical evaluations across diverse cooperative environments demonstrate that Haland generates agents with significantly better zero-shot coordination performance using only NL instructions from various partners, outperforming existing methods by approximately 89.64%.
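The generation step described above (a language-conditioned latent diffusion model producing policy parameters) can be sketched as follows. This is a toy illustration under assumed interfaces, not the authors' implementation: `denoiser` stands in for the trained noise-prediction network, and the latent would be decoded into BR policy weights by a separate decoder.

```python
import numpy as np

def denoiser(z, t, lang_emb):
    # Stand-in for the trained noise-prediction network: here it simply
    # predicts the displacement of the latent from the conditioning
    # embedding, so denoising pulls z toward lang_emb.
    return z - lang_emb

def generate_policy_latent(lang_emb, steps=200, eta=0.1, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)      # start from Gaussian noise
    for t in range(steps, 0, -1):
        eps_hat = denoiser(z, t, lang_emb)
        z = z - eta * eps_hat         # simple Euler-style denoising update
    return z                          # decoded into policy weights downstream

# Hypothetical embedding of an NL instruction such as "always deliver soup".
lang_emb = np.full(8, 0.5)
z = generate_policy_latent(lang_emb)
```

With this toy denoiser the latent converges geometrically to the conditioning embedding, which mirrors the intended behavior (different NL instructions yield different generated policies) without any claim about the real network architecture.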
Abstract: Language-guided fashion image editing is challenging: fashion image editing is local and requires high precision, while natural language cannot provide precise visual guidance. In this paper, we propose LucIE, a novel unsupervised language-guided local image editing method for fashion images. LucIE adopts and modifies a recent text-to-image synthesis network, DF-GAN, as its backbone. However, the synthesis backbone often changes the global structure of the input image, making local image editing impractical. To increase structural consistency between input and edited images, we propose a Content-Preserving Fusion Module (CPFM). Unlike existing fusion modules, CPFM avoids iterative refinement of visual feature maps and instead accumulates additive modifications on RGB maps. LucIE performs local image editing explicitly through language-guided image segmentation and mask-guided image blending, using only image-text pairs. Results on the DeepFashion dataset show that LucIE achieves state-of-the-art performance. Compared with previous methods, images generated by LucIE also exhibit fewer artifacts. We provide visualizations and ablation studies to validate LucIE and the CPFM, and we demonstrate and analyze the limitations of LucIE to provide a better understanding of the method.
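The core CPFM idea above (accumulating additive modifications on RGB maps, confined by a mask, rather than iteratively refining feature maps) can be sketched as follows. Shapes, names, and the residual values are hypothetical; this only illustrates why the scheme preserves global structure outside the edit region.

```python
import numpy as np

def cpfm_edit(image, residuals, mask):
    """Accumulate additive, mask-confined modifications on an RGB map.

    image:     HxWx3 RGB array in [0, 1]
    residuals: list of HxWx3 per-stage additive modifications
    mask:      HxW array, 1 inside the edit region, 0 elsewhere
    """
    out = image.copy()
    for r in residuals:
        out = out + r * mask[..., None]   # pixels outside the mask are untouched
    return np.clip(out, 0.0, 1.0)

h, w = 4, 4
image = np.full((h, w, 3), 0.5)           # uniform gray input
mask = np.zeros((h, w))
mask[1:3, 1:3] = 1.0                      # edit only the center patch
residuals = [np.full((h, w, 3), 0.2),     # two toy stages of additive edits
             np.full((h, w, 3), 0.1)]
edited = cpfm_edit(image, residuals, mask)
```

Because every stage only adds a masked residual to the same RGB map, the unmasked background is bit-identical to the input, which is the structural-consistency property the module targets; iterative feature-map refinement offers no such guarantee.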