Funding: Supported by the National Natural Science Foundation of China (No. 62441222) and the Information Technology Center and State Key Lab of CAD&CG, Zhejiang University.
Abstract: In this paper, we present TexPro, a novel method for high-fidelity material generation for input 3D meshes given text prompts. Unlike existing text-conditioned texture generation methods that typically generate RGB textures with baked lighting, TexPro produces diverse texture maps via procedural material modeling, which enables physically-based rendering, relighting, and other benefits inherent to procedural materials. Specifically, we first generate multi-view reference images from the input textual prompt using a state-of-the-art text-to-image model. We then derive texture maps through rendering-based optimization with recent differentiable procedural materials. To this end, we design several techniques to handle the misalignment between the generated multi-view images and the 3D mesh, and introduce a novel material agent that enhances material classification and matching by exploring both part-level understanding and object-aware material reasoning. Experiments demonstrate the superiority of the proposed method over existing state-of-the-art approaches, as well as its capability of relighting.
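The abstract gives no implementation details, but the rendering-based optimization it describes can be illustrated with a minimal PyTorch-style sketch: procedural material parameters are pushed through a differentiable renderer and fitted to the generated multi-view reference images. All names here (optimize_material, proc_material, renderer, init_params) are hypothetical placeholders, not the authors' API, and the plain L1 image loss is only the simplest possible objective; the paper additionally handles view/mesh misalignment and material selection via its material agent.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of rendering-based optimization over differentiable
# procedural material parameters. Function and object names are placeholders.

def optimize_material(proc_material, renderer, mesh, ref_views, steps=500, lr=1e-2):
    """Fit procedural material parameters so renders match reference images.

    proc_material: callable mapping a parameter tensor to PBR texture maps
                   (albedo, roughness, metallic, normal); assumed differentiable.
    renderer:      differentiable renderer taking (mesh, maps, camera) -> image.
    ref_views:     list of (camera, reference_image) pairs, e.g. produced by
                   a text-to-image model for several viewpoints.
    """
    params = torch.nn.Parameter(proc_material.init_params())
    opt = torch.optim.Adam([params], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        maps = proc_material(params)              # parameters -> texture maps
        loss = 0.0
        for camera, ref in ref_views:
            render = renderer(mesh, maps, camera) # differentiable rendering
            loss = loss + F.l1_loss(render, ref)  # match each reference view
        loss.backward()                           # gradients flow through renderer
        opt.step()

    return proc_material(params)                  # final PBR maps for relighting
```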
Funding: Supported by the National Natural Science Foundation of China (Nos. 6247075018 and 62322210), the Innovation Funding of ICT, CAS (No. E461020), the Beijing Municipal Natural Science Foundation for Distinguished Young Scholars (No. JQ21013), and the Beijing Municipal Science and Technology Commission (No. Z231100005923031).
Abstract: Recent advancements in the field have resulted in significant progress toward realistic head reconstruction and manipulation using neural radiance fields (NeRF). Despite these advances, capturing intricate facial details remains a persistent challenge. Moreover, casually captured input, involving both head poses and camera movements, introduces additional difficulties for existing head avatar reconstruction methods. To address the challenge posed by video data captured with camera motion, we propose AvatarWild, a novel method for reconstructing head avatars from monocular videos taken with consumer devices. Notably, our approach decouples the camera pose from the head pose, allowing reconstructed avatars to be visualized with different poses and expressions from novel viewpoints. To enhance the visual quality of the reconstructed facial avatar, we introduce a view-dependent detail enhancement module designed to augment local facial details without compromising viewpoint consistency. Our method demonstrates superior performance compared with existing approaches, as evidenced by reconstruction and animation results on both multi-view and single-view datasets. Remarkably, our approach relies exclusively on video data captured by portable devices such as smartphones. This not only underscores the practicality of our method but also extends its applicability to real-world scenarios where accessibility and ease of data capture are crucial.
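As a rough illustration of the camera/head pose decoupling described above, the following Python sketch applies two independent rigid transforms: one posing canonical head points, the other defining the viewing camera, so the camera can be swapped without touching the reconstructed head pose or expression. The function names and point representation are assumptions for illustration; the paper's NeRF-based reconstruction and detail enhancement module are not reproduced here.

```python
import torch

# Hypothetical illustration of decoupled head pose and camera pose.
# A canonical-space point is first posed by the head transform, then
# viewed through an independent camera extrinsic transform.

def world_from_canonical(x_canonical, head_rot, head_trans):
    """Apply the per-frame head pose (rotation + translation) to canonical points."""
    return x_canonical @ head_rot.T + head_trans

def camera_from_world(x_world, cam_rot, cam_trans):
    """Apply an independent camera extrinsic transform."""
    return x_world @ cam_rot.T + cam_trans

# Because the two transforms are separate, the same reconstructed head
# (head_R, head_t, expression) can be rendered from a novel viewpoint
# simply by swapping (cam_R, cam_t).
x_canon = torch.randn(1024, 3)                            # sampled canonical points
head_R, head_t = torch.eye(3), torch.zeros(3)             # estimated head pose
cam_R, cam_t = torch.eye(3), torch.tensor([0., 0., 2.])   # any desired camera

x_cam = camera_from_world(world_from_canonical(x_canon, head_R, head_t), cam_R, cam_t)
```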