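Simultaneous localization and mapping (SLAM) refers to localizing an autonomous mobile robot while simultaneously building a map of its surroundings in an unknown environment, and it is of great value in fields such as robotics and autonomous driving. This paper first reviews the development of SLAM technology, from early handcrafted feature extraction methods to modern deep-learning-driven solutions. Among these, SLAM methods based on neural radiance fields (NeRF) use neural networks for scene representation, further improving the visual quality of the resulting maps. However, such methods still face challenges in rendering speed, which limits their real-time applicability. In contrast, SLAM methods based on Gaussian splatting (GS), with real-time rendering speed and photorealistic scene rendering, have opened up new research hotspots and opportunities in the SLAM field. Next, GS-based SLAM methods are categorized and summarized according to three application types (RGB/RGBD, multimodal data, and semantic information), and the advantages and limitations of the corresponding methods are discussed for each case. Finally, the problems currently facing GS-based SLAM methods, namely real-time performance, benchmark consistency, scalability to large scenes, and catastrophic forgetting, are analyzed, and future research directions are outlined. These discussions and analyses aim to give SLAM researchers and engineers a comprehensive perspective, help them analyze and understand the key problems facing current SLAM systems, and promote technical progress and broader application in the field.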
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62325221, 62132021, and 62372457); the Young Elite Scientists Sponsorship Program by CAST, China (Grant No. 2023QNRC001); the Natural Science Foundation of Hunan Province, China (Grant Nos. 2021RC3071 and 2022RC1104); and the National University of Defense Technology Research Grants, China (Grant No. ZK22-52).
Abstract: Reconstruction from multi-view images is a longstanding problem in 3D vision, where neural radiance fields (NeRFs) have shown great potential and produce realistic renderings of novel views. Currently, most NeRF methods require accurate camera poses, a large number of input images, or both. Reconstructing a NeRF from few-view images without poses is challenging and highly ill-posed. To address this problem, we propose CAD-NeRF, a method that reconstructs from fewer than 10 images without any known poses. Specifically, we build a mini library of several CAD models from ShapeNet and render them from many random views. Given sparse-view input images, we run model and pose retrieval against the library to obtain a model with a similar shape, which serves as density supervision and provides pose initializations. Here we propose a multi-view pose retrieval method to avoid pose conflicts among views, a new problem not previously encountered in uncalibrated NeRF methods. The geometry of the object is then trained under CAD guidance, with the deformation of the density field and the camera poses optimized jointly. Texture and density are then trained and fine-tuned as well. All training phases are self-supervised. Comprehensive evaluations on synthetic and real images show that CAD-NeRF successfully learns accurate densities with large deformations from the retrieved CAD models, demonstrating its generalization ability.
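The abstract does not specify how the multi-view retrieval is carried out, so the following is only a toy sketch of one possible reading. It assumes that each rendered library view is summarized by a small descriptor vector, that all input images must share a single CAD model, and that a "pose conflict" means two input images being matched to the same rendered pose. The function names, the library layout, and the brute-force search are all hypothetical and are not taken from CAD-NeRF.

```python
from itertools import permutations


def l2(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(library, queries):
    """Pick one CAD model for all views, with a distinct pose per view.

    library: {model_name: {pose_id: descriptor}}  (hypothetical layout)
    queries: list of descriptors, one per input image
    Returns (model_name, [pose_id per query], total_cost), or None if no
    model has at least as many rendered poses as there are queries.
    """
    best = None
    for name, views in library.items():
        # Brute force over assignments of distinct poses to the queries.
        # This is only practical for a toy library with few poses.
        for combo in permutations(list(views), len(queries)):
            cost = sum(l2(q, views[p]) for q, p in zip(queries, combo))
            if best is None or cost < best[2]:
                best = (name, list(combo), cost)
    return best


# Made-up 2-D descriptors for two library models and two input images.
library = {
    "chair": {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]},
    "table": {0: [5.0, 5.0], 1: [6.0, 5.0]},
}
queries = [[0.1, 0.0], [0.9, 0.1]]
model, poses, cost = retrieve(library, queries)
```

Scoring all views against one shared model, rather than retrieving per image, is what keeps the retrieved poses mutually consistent in this sketch; a realistic library would need a cheaper assignment step than enumerating permutations.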
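Abstract: Objective: Point-cloud-based neural rendering methods are sensitive to point cloud quality and feature extraction, which tends to degrade the rendering quality of novel view synthesis. To address this, a novel view synthesis method that fuses local spatial information is proposed. Methods: To address poor point cloud quality and insufficient extracted features, first, a neural point cloud feature alignment module is designed that aligns the features of matching regions between the point cloud and the images and fuses them to form a neural point cloud, improving the local expressiveness of its features. Second, a neural point cloud Transformer module is proposed to fuse contextual information from local neural point clouds, so that reliable local spatial information can still be extracted when point cloud quality is poor, effectively improving the synthesis quality of point-cloud neural rendering. Results: Experiments on real-scene datasets show that, on Tanks and Temples, whose scenes each contain a single object, the proposed method improves the peak signal-to-noise ratio (PSNR) by 19.2% over the neural radiance field (NeRF) method, and by 6.4% and 3.8% over the point-cloud-input methods Tetra-NeRF and Point-NeRF, respectively. Even on the more complex scenes of the ScanNet dataset, it improves on NeRF and Point-NeRF by 34.6% and 2.1%, respectively. Conclusion: The proposed method makes better use of the local spatial information of point clouds and effectively mitigates the loss of rendering quality caused by point cloud quality and feature extraction under sparse-view image input. The experimental results verify its effectiveness.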
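For readers unfamiliar with the metric, the sketch below shows how PSNR is conventionally computed and how a percentage improvement over a baseline could be derived from two PSNR values. It assumes pixel values in [0, 1] stored as flat lists, and it assumes the reported percentages are relative gains in the dB values, which the abstract does not state. The function names are illustrative.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """PSNR in dB between two equal-length flat pixel sequences."""
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. 20 dB.
value = psnr([0.0, 0.0], [0.1, 0.1])
gain = relative_gain(30.0, 25.0)  # 20.0 (percent)
```

Because PSNR is logarithmic, a relative gain in dB is not the same as a relative reduction in mean squared error, so percentages of this kind are best compared only across methods evaluated on the same dataset.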
Funding: Supported by the Fundamental Research Funds for the Central Universities (2024300443) and the National Natural Science Foundation of China (NSFC) Young Scientists Fund (62405131).
Abstract: This article proposes a three-dimensional light field reconstruction method based on the neural radiance field (NeRF), called Infrared NeRF, for low-resolution thermal infrared scenes. Based on the characteristics of low-resolution thermal infrared imaging, several optimizations are made to improve the speed and accuracy of thermal infrared 3D reconstruction. First, inspired by Boltzmann's law of thermal radiation, distance is incorporated into the NeRF model for the first time, resulting in nonlinear propagation along a single ray and a more accurate description of the physical property that infrared radiation intensity decreases with increasing distance. Second, to improve inference speed, and based on the differing high- and low-frequency distributions of foreground and background in infrared images, a multi-ray non-uniform light synthesis strategy is proposed that makes the model pay more attention to foreground objects in the scene, reduces the rays allocated to the background, and significantly reduces training time without reducing accuracy. In addition, compared with visible-light scenes, infrared images have only a single channel, so fewer network parameters are required. Experiments using the same training data and data filtering method show that, compared with the original NeRF, the improved network achieves average improvements of 13.8% and 4.62% in PSNR and SSIM, respectively, and an average decrease of 46% in LPIPS. Thanks to the optimization of the network layers and data filtering methods, training takes only about 25% of the original method's time to converge. Finally, for scenes with weak backgrounds, this article improves the inference speed of the model by a factor of 4 to 6 compared with the original NeRF by limiting the query interval of the model.
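The abstract does not describe how rays are distributed between foreground and background. As a minimal sketch of the general idea of non-uniform ray sampling, the code below draws training pixels with a higher probability on foreground pixels. The binary foreground mask, the weights, and the function name are assumptions made for illustration and are not the Infrared NeRF implementation.

```python
import random


def sample_ray_pixels(mask, n_rays, fg_weight=4.0, bg_weight=1.0, seed=None):
    """Draw pixel indices for ray casting, favouring foreground pixels.

    mask: flat list of booleans, True where a pixel is foreground
          (how such a mask is obtained is outside this sketch).
    Returns a list of n_rays indices into mask, sampled with replacement.
    """
    rng = random.Random(seed)
    weights = [fg_weight if is_fg else bg_weight for is_fg in mask]
    return rng.choices(range(len(mask)), weights=weights, k=n_rays)


# 10 foreground pixels and 90 background pixels.
mask = [True] * 10 + [False] * 90
picks = sample_ray_pixels(mask, 2000, seed=0)
fg_fraction = sum(mask[i] for i in picks) / len(picks)
```

With these weights the expected share of foreground rays is 40 / 130, roughly 31%, compared with 10% under uniform sampling, so the same ray budget covers the foreground about three times as densely.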