Funding: Supported by the National Key Research and Development Program of China (Grant No. 2024YFB3409800), the Postdoctoral Fellowship Program of CPSF of China (Grant No. GZB20240940), and the China Postdoctoral Science Foundation (Grant Nos. 2025T181108 and 2024M764127).
Abstract: Reconstructing three-dimensional (3D) shapes from a single image remains a significant challenge in computer vision due to the inherent ambiguity caused by missing or occluded shape information. Previous studies have predominantly focused on mesh models supervised by multi-view silhouettes; however, such methods are limited in reconstructing fine details. In this study, a 3D mesh model is predicted from a single image by leveraging depth consistency, without requiring viewpoint pose annotations. The model learns strong shape priors that preserve fine structures and accurately predicts view poses from "correlation-supervised" viewpoints. Additionally, standard deviation and Laplacian losses are employed to regulate the mesh edge distribution, yielding more precise reconstructions. Differentiable rendering functions are derived to generate depth maps from the 3D mesh. Compared with conventional approaches, the proposed method better represents subtle structures, and on both synthetic and real-world datasets it outperforms existing methods in view-based 3D reconstruction tasks.
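The abstract does not include implementation details, but the two mesh regularizers it names are standard constructions. As an illustrative sketch only (function names, tensor conventions, and the choice of a uniform Laplacian are assumptions, not the authors' code), they could be written in PyTorch as follows:

```python
import torch

def edge_std_loss(verts: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    """Standard deviation of edge lengths; zero when all edges have equal length.

    verts: (V, 3) vertex positions; edges: (E, 2) vertex-index pairs.
    """
    v0, v1 = verts[edges[:, 0]], verts[edges[:, 1]]
    lengths = (v0 - v1).norm(dim=1)
    return lengths.std()

def uniform_laplacian_loss(verts: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    """Mean distance of each vertex from the centroid of its neighbors (smoothness)."""
    num_verts = verts.shape[0]
    neighbor_sum = torch.zeros_like(verts)
    degree = torch.zeros(num_verts, device=verts.device)
    # Accumulate neighbor sums and degrees over both edge directions.
    for a, b in ((0, 1), (1, 0)):
        neighbor_sum.index_add_(0, edges[:, a], verts[edges[:, b]])
        degree.index_add_(0, edges[:, a],
                          torch.ones(edges.shape[0], device=verts.device))
    centroid = neighbor_sum / degree.clamp(min=1).unsqueeze(1)
    return (verts - centroid).norm(dim=1).mean()
```

Likewise, a depth map can be rendered differentiably from a mesh with an off-the-shelf rasterizer such as PyTorch3D. This is a generic stand-in for the renderer the abstract mentions, not the paper's own implementation; the camera pose inputs `R`, `T` and the image size are assumed conventions:

```python
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (FoVPerspectiveCameras, MeshRasterizer,
                                RasterizationSettings)

def render_depth(verts, faces, R, T, image_size=128):
    """Rasterize a mesh and return its (H, W) z-buffer depth map.

    Background pixels take the value -1 in PyTorch3D's z-buffer.
    """
    cameras = FoVPerspectiveCameras(R=R, T=T, device=verts.device)
    settings = RasterizationSettings(image_size=image_size, faces_per_pixel=1)
    rasterizer = MeshRasterizer(cameras=cameras, raster_settings=settings)
    fragments = rasterizer(Meshes(verts=[verts], faces=[faces]))
    return fragments.zbuf[0, ..., 0]  # depth of the nearest face per pixel
```

Because the rasterization is differentiable, a depth-consistency loss between rendered and reference depth maps can backpropagate to the predicted vertices alongside the two regularizers above.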