Funding: This work was supported by the National Key R&D Program of China (No. 2022YFF0902302) and the National Natural Science Foundation of China (Grant Nos. 62172357 and 62322209).
Abstract: We present an animatable 3D Gaussian representation for synthesizing high-fidelity human videos under novel views and poses in real time. Given multi-view videos of a human subject, we learn a collection of 3D Gaussians in the canonical space of the rest pose. Each Gaussian is associated with a few basic properties (i.e., position, opacity, scale, rotation, and spherical harmonics coefficients) representing the average human appearance across all video frames, as well as a latent code and a set of blend weights for dynamic appearance correction and pose transformation. The latent code is fed, together with a target pose, into a multi-layer perceptron (MLP) that corrects the Gaussians in the canonical space to capture appearance changes under the target pose. The corrected Gaussians are then transformed to the target pose using linear blend skinning (LBS) with their blend weights. High-fidelity human images under novel views and poses can be rendered in real time through Gaussian splatting. Compared to state-of-the-art NeRF-based methods, our animatable Gaussian representation produces more compelling results with well-captured details and achieves superior rendering performance.
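To make the pipeline in the abstract concrete, below is a minimal PyTorch sketch of its three stages: a per-Gaussian parameter set in canonical space, an MLP that corrects Gaussians from (latent code, target pose), and an LBS step that skins the corrected positions. All class and function names, tensor shapes, the latent-code size, and the 24-joint/72-dimensional axis-angle pose convention are illustrative assumptions, not the authors' implementation; the real method also corrects properties beyond position and feeds the result to a Gaussian splatting rasterizer, which is omitted here.

```python
# Hypothetical sketch of the animatable-Gaussian pipeline; names and
# shapes are assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class CanonicalGaussians(nn.Module):
    """Per-Gaussian properties learned in the canonical (rest-pose) space."""
    def __init__(self, num_gaussians: int, num_joints: int = 24,
                 sh_degree: int = 3, latent_dim: int = 32):
        super().__init__()
        n, c = num_gaussians, (sh_degree + 1) ** 2
        self.position = nn.Parameter(torch.zeros(n, 3))            # mean
        self.opacity  = nn.Parameter(torch.zeros(n, 1))
        self.scale    = nn.Parameter(torch.zeros(n, 3))
        self.rotation = nn.Parameter(torch.zeros(n, 4))            # quaternion
        self.sh       = nn.Parameter(torch.zeros(n, c, 3))         # SH coefficients
        self.latent   = nn.Parameter(torch.zeros(n, latent_dim))   # appearance code
        self.blend_w  = nn.Parameter(torch.zeros(n, num_joints))   # LBS weights

class PoseCorrectionMLP(nn.Module):
    """Maps (latent code, target pose) to per-Gaussian corrections;
    for brevity this sketch outputs only an additive position offset."""
    def __init__(self, latent_dim: int = 32, pose_dim: int = 72):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + pose_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def forward(self, latent, pose):
        pose = pose.expand(latent.shape[0], -1)  # broadcast pose to all Gaussians
        return self.net(torch.cat([latent, pose], dim=-1))

def lbs_transform(x_canonical, blend_w, joint_transforms):
    """Linear blend skinning: blend per-joint rigid transforms (J, 4, 4)
    with normalized weights, then apply to canonical positions (N, 3)."""
    w = torch.softmax(blend_w, dim=-1)                      # (N, J)
    T = torch.einsum('nj,jab->nab', w, joint_transforms)    # (N, 4, 4)
    x_h = torch.cat([x_canonical,
                     torch.ones_like(x_canonical[:, :1])], dim=-1)
    return torch.einsum('nab,nb->na', T, x_h)[:, :3]

# Usage: correct canonical Gaussians for a target pose, then skin them.
gaussians = CanonicalGaussians(num_gaussians=100_000)
mlp = PoseCorrectionMLP()
pose = torch.zeros(1, 72)                   # e.g. SMPL-style axis-angle pose
joints = torch.eye(4).expand(24, 4, 4)      # per-joint rigid transforms
x_corrected = gaussians.position + mlp(gaussians.latent, pose)
x_posed = lbs_transform(x_corrected, gaussians.blend_w, joints)
```

Under this reading, the MLP handles pose-dependent non-rigid appearance changes while LBS handles the rigid articulation, and the skinned Gaussians can then be rendered in real time by a standard splatting rasterizer.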