3D scene understanding and reconstruction aims to obtain a concise scene representation from images and reconstruct the complete scene, including the scene layout, object bounding boxes, and shapes. Existing holistic scene understanding methods primarily recover scenes from single images, with a focus on indoor scenes. Due to the complexity of the real world, the information provided by a single image is limited, resulting in issues such as object occlusion and omission. Furthermore, data captured from outdoor scenes is sparse, strongly time-dependent, and lacking in annotations. Consequently, understanding and reconstructing outdoor scenes is highly challenging. The authors propose a sparse multi-view image-based 3D scene reconstruction framework (SMSR). It divides the scene reconstruction task into three stages: initial prediction, refinement, and fusion. The first two stages extract 3D scene representations from each viewpoint, while the final stage selects, calibrates, and fuses object positions and orientations across viewpoints. SMSR effectively addresses object omission by utilizing small-scale sequential scene information. Experimental results on the general outdoor scene dataset UrbanScene3D-Art Sci and our proprietary Software College Aerial Time-series Images dataset demonstrate that SMSR achieves superior performance in scene understanding and reconstruction.
Simultaneous localization and mapping (SLAM) plays a crucial role in VR/AR applications, autonomous robot navigation, UAV remote control, etc. Traditional SLAM handles data acquired by a fast-moving or severely jittering camera poorly, and its efficiency needs to be improved. This paper proposes an improved SLAM algorithm that mainly improves the real-time performance of the classical SLAM algorithm: it applies a KD-tree to organize feature points efficiently and accelerates feature-point correspondence building. Moreover, the background map reconstruction thread is optimized, increasing the parallel computation ability of SLAM. Experiments on color images demonstrate that the improved SLAM algorithm achieves better real-time performance than the classical SLAM.
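The abstract above does not give implementation details, but the KD-tree idea can be illustrated with a minimal sketch: build the tree once over the previous frame's feature descriptors, then answer each nearest-neighbour correspondence query in logarithmic rather than linear time. The descriptor dimensions, counts, and noise level below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Hypothetical descriptors: 500 features per frame, 32-dimensional.
desc_prev = rng.standard_normal((500, 32))
# Simulate the next frame: the same features, slightly perturbed, shuffled.
perm = rng.permutation(500)
desc_curr = desc_prev[perm] + 0.01 * rng.standard_normal((500, 32))

# Build a KD-tree once over the previous frame's descriptors; each
# nearest-neighbour query then avoids a brute-force scan of all 500.
tree = cKDTree(desc_prev)
dist, idx = tree.query(desc_curr, k=1)

# idx[i] is the index in desc_prev matched to feature i of the current frame.
```

With clean synthetic descriptors the recovered correspondence is exactly the shuffle that generated the second frame; real descriptors would additionally need a ratio test or distance threshold to reject outliers.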
Research on neural radiance fields (NeRF) for novel view synthesis has grown explosively with the development of new models and extensions, and NeRF variants suited to underwater scenes or scattering media are also evolving. Existing underwater 3D reconstruction systems still face challenges such as long training times and low rendering efficiency. This paper proposes an improved underwater 3D reconstruction system that achieves rapid, high-quality 3D reconstruction. First, we enhance underwater videos captured by a monocular camera to correct the image quality degradation caused by the physical properties of the water medium and to ensure consistency of the enhancement across frames. Then, we perform keyframe selection to optimize resource usage and reduce the impact of dynamic objects on the reconstruction results. After pose estimation with COLMAP, the selected keyframes undergo 3D reconstruction using a NeRF based on multi-resolution hash encoding for model construction and rendering. For image enhancement, our method is effective in the tested scenarios and yields better continuity between consecutive frames of the same data. For 3D reconstruction, our method achieves a peak signal-to-noise ratio (PSNR) of 18.40 dB and a structural similarity (SSIM) of 0.6677, indicating a good balance between operational efficiency and reconstruction quality.
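The PSNR figure reported above follows the standard definition, 10 log10(MAX² / MSE). A small self-contained sketch (the images here are synthetic stand-ins, not data from the paper):

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
noisy = np.clip(ref + 0.1 * rng.standard_normal((64, 64)), 0.0, 1.0)
score = psnr(ref, noisy)   # noise of std 0.1 lands near 20 dB
```

As a sanity check, a uniform error of 0.5 on a [0, 1] image gives MSE = 0.25 and therefore PSNR = 10 log10(4) ≈ 6.02 dB.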
This paper explores the key techniques and challenges in dynamic scene reconstruction with neural radiance fields (NeRF). As an emerging computer vision method, NeRF has wide application potential and excels particularly at 3D reconstruction. We first introduce the basic principles and working mechanisms of NeRFs, followed by an in-depth discussion of the technical challenges faced by 3D reconstruction in dynamic scenes, including viewpoint and illumination changes of moving objects, recognition and modeling of dynamic objects, real-time requirements, data acquisition and calibration, motion estimation, and evaluation mechanisms. We also summarize current state-of-the-art approaches to these challenges, as well as future research trends. The goal is to provide researchers with an in-depth understanding of the application of NeRFs in dynamic scene reconstruction, as well as insights into the key issues faced and future directions.
Reconstructing dynamic scenes with commodity depth cameras has many applications in computer graphics, computer vision, and robotics. However, due to the presence of noise and erroneous observations from data capturing devices and the inherently ill-posed nature of non-rigid registration with insufficient information, traditional approaches often produce low-quality geometry with holes, bumps, and misalignments. We propose a novel 3D dynamic reconstruction system, named HDR-Net-Fusion, which learns to simultaneously reconstruct and refine the geometry on the fly with a sparse embedded deformation graph of surfels, using a hierarchical deep reinforcement (HDR) network. The latter comprises two parts: a global HDR-Net which rapidly detects local regions with large geometric errors, and a local HDR-Net serving as a local patch refinement operator to promptly complete and enhance such regions. Training the global HDR-Net is formulated as a novel reinforcement learning problem to implicitly learn the region selection strategy with the goal of improving the overall reconstruction quality. The applicability and efficiency of our approach are demonstrated using a large-scale dynamic reconstruction dataset. Our method can reconstruct geometry with higher quality than traditional methods.
Real-time dense reconstruction of indoor scenes is of great research value for the application and development of service robots, augmented reality, cultural relics conservation, and other fields. ORB-SLAM2 is one of the excellent open-source visual SLAM systems and is often used in indoor scene reconstruction. However, it is time-consuming and can only build a sparse scene map, because it uses ORB features to solve the camera pose. To address these shortcomings, this article proposes an improved ORB-SLAM2 solution that uses a direct method based on light intensity to solve the camera pose. This greatly reduces the amount of computation; speed improves by about five times compared with the ORB feature method. A parallel map reconstruction thread is added using a surfel model, and depth maps and RGB maps are fused to build a dense map. A RealSense D415 sensor is used as the RGB-D camera to obtain three-dimensional (3D) point clouds of an indoor environment. After calibration and alignment, the sensor is applied in an indoor scene reconstruction experiment with the improved ORB-SLAM2 method. Results show that the improved ORB-SLAM2 algorithm greatly improves both processing speed and the density of the reconstructed scenes.
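The core contrast in the abstract above is that a direct method scores candidate camera motions by raw intensity differences instead of extracting and matching ORB features. A toy one-dimensional illustration of that idea, not the paper's actual implementation: recover a small horizontal camera shift purely from photometric error.

```python
import numpy as np

rng = np.random.default_rng(2)

# Reference intensity image and the same scene shifted 3 pixels right,
# standing in for a small camera motion between consecutive frames.
ref = rng.random((60, 80))
true_shift = 3
cur = np.roll(ref, true_shift, axis=1)

# Direct method in miniature: score each candidate motion by the
# photometric error (sum of squared intensity differences) directly,
# with no feature extraction or descriptor matching at all.
def photometric_error(shift):
    return np.sum((np.roll(ref, shift, axis=1) - cur) ** 2)

best = min(range(-5, 6), key=photometric_error)
```

A real direct SLAM system minimizes this error over a continuous 6-DoF pose with gradient-based optimization rather than exhaustive integer search, but the objective is the same.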
A new large-scale three-dimensional (3D) reconstruction technology based on integral imaging with color-position characteristics is presented. The color of an object point is similar to those of its corresponding points. The corresponding point coordinates form arithmetic progressions because integral imaging captures information with a sensor array that has similar pitches in the x and y directions. This regular relationship is used to determine the corresponding-point parameters for reconstructing 3D information from elemental images divided by color, each of which contains several corresponding points. The feasibility of the proposed method is demonstrated through an optical indoor experiment. A large-scale application of the proposed method is illustrated by an experiment with a corner of our school as its object.
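The arithmetic-progression property above can be checked directly: if the lens pitch is uniform, the x-coordinates of one object point across consecutive elemental images satisfy x_k = x_0 + k·d, and the common difference d is the per-image disparity that encodes depth. The coordinates below are hypothetical numbers for illustration only.

```python
import numpy as np

# Hypothetical x-coordinates of one object point's corresponding points
# in five consecutive elemental images.
xs = np.array([12.0, 17.5, 23.0, 28.5, 34.0])

diffs = np.diff(xs)
is_arithmetic = np.allclose(diffs, diffs[0])   # equal pitch => equal steps
d = diffs[0]                                   # common difference (disparity)
```

In the paper's setting this regularity is what lets candidate corresponding points, pre-filtered by color similarity, be accepted or rejected as belonging to one object point.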
The 3D reconstruction pipeline uses the Bundle Adjustment algorithm to refine the camera and point parameters. Bundle Adjustment is a compute-intensive algorithm, and many researchers have improved its performance by implementing it on GPUs. In the previous research work, “Improving Accuracy and Computational Burden of Bundle Adjustment Algorithm using GPUs,” the authors first demonstrated algorithmic improvement of Bundle Adjustment by reducing the mean square error, using an additional radial distortion parameter and explicitly computed analytical derivatives, and then reduced the computational burden of the algorithm using GPUs. With the naïve CUDA implementation, a speedup of 10× was achieved for the largest dataset of 13,678 cameras, 4,455,747 points, and 28,975,571 projections. In this paper, we present the optimization of the Bundle Adjustment CUDA code on GPUs to achieve higher speedup. We propose a new data memory layout for the parameters in the Bundle Adjustment algorithm, resulting in contiguous memory access. We demonstrate that it improves memory throughput on the GPUs, thereby improving overall performance. We also increase the computational throughput of the algorithm by optimizing the CUDA kernels to utilize the GPU resources effectively. A comparative performance study of explicitly computing an algorithm parameter versus using the Jacobians instead is presented. In the previous work, the Bundle Adjustment algorithm failed to converge for certain datasets because several block matrices of the cameras in the augmented normal equation were rank-deficient. In this work, we identify the cameras that cause rank-deficient matrices and preprocess the datasets to ensure the convergence of the BA algorithm. Our optimized CUDA implementation achieves convergence of the Bundle Adjustment algorithm in around 22 seconds for the largest dataset, compared to 654 seconds for the sequential implementation, a speedup of 30×. The optimized CUDA implementation presented in this paper achieves a 3× speedup for the largest dataset compared to the previous naïve CUDA implementation.
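The contiguous-memory-layout idea above is the classic array-of-structures versus structure-of-arrays trade-off. A minimal NumPy sketch, assuming BAL-style cameras with 9 parameters each (3 rotation, 3 translation, focal length, 2 radial distortion); the paper's actual layout may differ:

```python
import numpy as np

n_cams = 4
# Array-of-structures: each row holds one camera's 9 parameters.
aos = np.arange(n_cams * 9, dtype=np.float64).reshape(n_cams, 9)

# Reading "focal length of every camera" from AoS strides through memory:
focals_aos = aos[:, 6]          # non-contiguous view, stride of 9 doubles

# Structure-of-arrays: transpose so each parameter type sits contiguously,
# which is what coalesced GPU memory loads want.
soa = np.ascontiguousarray(aos.T)
focals_soa = soa[6]             # contiguous slice, same values
```

On a GPU, threads in a warp each processing one camera then read neighbouring addresses instead of addresses 72 bytes apart, which is the mechanism behind the memory-throughput improvement the abstract claims.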
As a frontier technology, holography has important research value in fields such as bio-micrographic imaging, light field modulation, and data storage. However, real-time acquisition of 3D scenes and high-fidelity reconstruction technology have not yet seen a breakthrough, which has seriously hindered the development of holography. Here, a novel holographic camera is proposed to solve these inherent problems completely. The proposed holographic camera consists of an acquisition end and a calculation end. At the acquisition end, specially configured liquid materials and a liquid lens structure driven by a voice-coil motor are used to build a liquid camera that can quickly capture the focus stack of a real 3D scene within 15 ms. At the calculation end, a new structured focus stack network (FS-Net) is designed for hologram calculation. After training FS-Net with the focus stack renderer and a learnable Zernike phase, it enables hologram calculation within 13 ms. As the first device to achieve real-time incoherent acquisition and high-fidelity holographic reconstruction of a real 3D scene, our holographic camera breaks the technical bottlenecks of difficulty in acquiring the real 3D scene, low quality of the holographic reconstructed image, and incorrect defocus blur. The experimental results demonstrate the effectiveness of our holographic camera in acquiring focal-plane information and calculating holograms of real 3D scenes. The proposed holographic camera opens up a new way for the application of holography in fields such as 3D display, light field modulation, and 3D measurement.
Bundle adjustment is a camera and point refinement technique in a 3D scene reconstruction pipeline. The camera parameters and the 3D points are refined by minimizing the difference between the computed and observed projections of the image points, formulated as a non-linear least-squares problem. The Levenberg-Marquardt method is used to solve this problem, which is computationally expensive in proportion to the number of cameras, points, and projections. In this paper, we implement the Bundle Adjustment (BA) algorithm and analyze techniques to improve algorithmic performance by reducing the mean square error. We investigate using an additional radial distortion camera parameter in the BA algorithm and demonstrate better convergence of the mean square error. We also demonstrate the use of explicitly computed analytical derivatives. In addition, we implement the BA algorithm on GPUs using the CUDA parallel programming model to reduce its computational time burden. CUDA streams, atomic operations, and the cuBLAS library are proposed, implemented, and demonstrated to improve the performance of the BA algorithm. Our implementation demonstrates better convergence of the BA algorithm and achieves a speedup of up to 16× on various datasets.
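The reprojection-error minimization described above can be sketched at toy scale with SciPy's Levenberg-Marquardt solver. This is a deliberately reduced problem, refining only a camera translation under a unit-focal pinhole model with synthetic points, not the full multi-camera BA of the paper:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(3)

# Known 3D points in front of the camera, and an unknown true translation.
pts3d = rng.uniform(1.0, 2.0, size=(20, 3)) + np.array([0.0, 0.0, 4.0])
t_true = np.array([0.3, -0.2, 0.1])

def project(t):
    p = pts3d + t
    return p[:, :2] / p[:, 2:3]          # pinhole perspective divide

observed = project(t_true)               # noiseless observed projections

def residuals(t):
    # Difference between computed and observed projections, flattened.
    return (project(t) - observed).ravel()

# method='lm' selects MINPACK's Levenberg-Marquardt implementation.
sol = least_squares(residuals, x0=np.zeros(3), method='lm')
```

Full bundle adjustment stacks such residuals over all cameras and points and exploits the sparse block structure of the normal equations, which is exactly where the GPU work in this paper applies.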
Learning-based multi-view stereo (MVS) algorithms have demonstrated great potential for depth estimation in recent years. However, they still struggle to estimate accurate depth in texture-less planar regions, which limits their reconstruction performance in man-made scenes. In this paper, we propose PlaneStereo, a new framework that utilizes a planar prior to facilitate depth estimation. Our key intuition is that pixels inside a plane share the same set of plane parameters, which can be estimated collectively using information from the whole plane. Specifically, our method first segments planes in the reference image, and then fits 3D plane parameters for each segmented plane by solving a linear system using high-confidence depth predictions inside the plane. This allows us to recover the plane parameters accurately, which can be converted to accurate depth values for each point in the plane, improving the depth prediction for low-textured local regions. This process is fully differentiable and can be integrated into existing learning-based MVS algorithms. Experiments show that our method consistently improves the performance of existing stereo matching and MVS algorithms on the DeMoN and ScanNet datasets, achieving state-of-the-art performance.
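The plane-fitting step above amounts to an overdetermined linear least-squares problem. A minimal sketch with a synthetic plane z = a·x + b·y + c and noisy "high-confidence" depth samples (the parameterization and noise level are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)

# Ground-truth plane z = a*x + b*y + c and sampled in-plane pixels.
a_true, b_true, c_true = 0.1, -0.2, 5.0
xy = rng.uniform(-10, 10, size=(200, 2))
z = a_true * xy[:, 0] + b_true * xy[:, 1] + c_true
z_noisy = z + 0.01 * rng.standard_normal(200)   # high-confidence depths

# Solve the linear system [x y 1] @ [a b c]^T = z in the least-squares
# sense; pooling 200 samples averages out per-pixel depth noise.
A = np.column_stack([xy, np.ones(len(xy))])
params, *_ = np.linalg.lstsq(A, z_noisy, rcond=None)
```

Because the solve pools every confident pixel in the segment, the recovered parameters are far more accurate than any single depth prediction, and evaluating the plane equation then yields depth for the texture-less pixels the network got wrong.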
Existing depth completion methods are often targeted at a specific sparse depth type and generalize poorly across task domains. We present a method to complete sparse/semi-dense, noisy, and potentially low-resolution depth maps obtained by various range sensors, including those in modern mobile phones, or by multi-view reconstruction algorithms. Our method leverages a data-driven prior in the form of a single-image depth prediction network trained on large-scale datasets, the output of which is used as an input to our model. We propose an effective training scheme in which we simulate various sparsity patterns in typical task domains. In addition, we design two new benchmarks to evaluate the generalizability and robustness of depth completion methods. Our simple method shows superior cross-domain generalization ability against state-of-the-art depth completion methods, introducing a practical solution to high-quality depth capture on a mobile device.
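The training scheme above simulates sparsity patterns from dense ground truth. A minimal sketch of what such simulation could look like; the coverage fractions, noise levels, and random masking strategy here are illustrative assumptions, since the paper does not specify them in the abstract:

```python
import numpy as np

rng = np.random.default_rng(5)
dense = rng.uniform(0.5, 5.0, size=(48, 64))   # stand-in dense depth map

def sparsify(depth, keep_frac, noise_std=0.0):
    """Simulate a sensor-style sparse depth input from dense ground truth.

    Pixels outside the random mask are zeroed (i.e. 'no measurement');
    surviving pixels optionally get Gaussian measurement noise.
    """
    mask = rng.random(depth.shape) < keep_frac
    noisy = depth + noise_std * rng.standard_normal(depth.shape)
    return np.where(mask, noisy, 0.0), mask

# e.g. LiDAR-like ~5% coverage vs. phone ToF-like ~30% noisy coverage
lidar_like, m1 = sparsify(dense, 0.05)
tof_like, m2 = sparsify(dense, 0.30, noise_std=0.05)
```

Training one network on many such randomized patterns is what lets a single model generalize across the sensor types the abstract lists, instead of overfitting one sparsity regime.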
Mixed reality technologies provide real-time and immersive experiences, which bring tremendous opportunities in entertainment, education, and enriched experiences that are not directly accessible owing to safety or cost. Research in this field has been in the spotlight in the last few years as the metaverse went viral. The recently emerging omnidirectional video streams, i.e., 360° videos, provide an affordable way to capture and present dynamic real-world scenes. In the last decade, fueled by the rapid development of artificial intelligence and computational photography technologies, research interest in mixed reality systems using 360° videos with richer and more realistic experiences has increased dramatically to unlock the true potential of the metaverse. In this survey, we cover recent research aimed at addressing the above issues in 360° image and video processing technologies and applications for mixed reality. The survey summarizes the contributions of recent research and describes potential future research directions for 360° media in the field of mixed reality.
A background removal method based on two-dimensional notch filtering in the frequency domain for polarization interference imaging spectrometers (PIISs) is implemented. According to the relationship between the spatial domain and the frequency domain, the notch filter is designed with several parameters of PIISs, and the interferogram without a background is obtained. Both the simulated and the experimental results demonstrate that the background removal method is feasible and robust with a high processing speed. In addition, this method can reduce the noise level of the reconstructed spectrum, and it is insusceptible to a complicated background, compared with the polynomial fitting and empirical mode decomposition (EMD) methods.
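The notch-filtering principle above can be shown on a synthetic interferogram: a useful low-frequency fringe plus a strong periodic background at a known spatial frequency, removed by zeroing the background's bins in the 2-D FFT. The frequencies and amplitudes below are illustrative, not the instrument parameters from the paper.

```python
import numpy as np

n = 128
x = np.tile(np.arange(n), (n, 1))        # column index of each pixel

# Synthetic frame: 3-cycle signal fringe plus a 20-cycle background.
signal = np.cos(2 * np.pi * 3 * x / n)
background = 2.0 * np.cos(2 * np.pi * 20 * x / n)
frame = signal + background

# Notch filter: zero out the background's frequency bins in the 2-D FFT.
F = np.fft.fft2(frame)
notch = np.ones((n, n))
notch[0, 20] = 0.0       # +20 cycles/frame along x
notch[0, n - 20] = 0.0   # conjugate bin
cleaned = np.real(np.fft.ifft2(F * notch))
```

Because the background occupies isolated frequency bins, the notch removes it essentially exactly while leaving the signal fringe untouched; the real instrument design is about predicting which bins the PIIS background occupies from the instrument parameters.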
Funding (SMSR 3D scene reconstruction): National Key R&D Program of China (Grant No. 2021YFC3300203); TaiShan Scholars Program (Grant No. tsqn202211289); Overseas Innovation Team Project of the "20 Regulations for New Universities" funding program of Jinan (Grant No. 2021GXRC073); Excellent Youth Scholars Program of Shandong Province (Grant No. 2022HWYQ-048).
Funding (improved SLAM algorithm): National Natural Science Foundation of China (Grant No. 61672279); Jiangsu "Six Talent Peaks" Project (2012-WLW-023); Open Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Nanjing Hydraulic Research Institute, China (2016491411).
Funding (underwater 3D reconstruction system): Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014); National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024); Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012); Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021); Youth Foundation Project of Hainan Natural Science Foundation (621QN211).
Funding (dynamic scene reconstruction with NeRF survey): ZTE Industry-University-Institute Cooperation Funds (Grant No. 2023ZTE03-04).
Funding (HDR-Net-Fusion): National Natural Science Foundation of China (Grant Nos. 61902210 and 61521002).
Funding (improved ORB-SLAM2 dense reconstruction): Henan Province Science and Technology Project (Grant No. 182102210065).
Funding (integral imaging 3D reconstruction): National Natural Science Foundation of China (Grant No. 11474169).
Funding: supported by the National Natural Science Foundation of China (U22A2079, 62275009, 62175006, and U21B2034).
Abstract: As a frontier technology, holography has important research value in fields such as bio-micrographic imaging, light field modulation, and data storage. However, real-time acquisition of 3D scenes and high-fidelity reconstruction technology have not yet achieved a breakthrough, which has seriously hindered the development of holography. Here, a novel holographic camera is proposed to solve these inherent problems. The proposed holographic camera consists of an acquisition end and a calculation end. At the acquisition end, specially configured liquid materials and a liquid lens structure driven by a voice-coil motor are used to build a liquid camera that can rapidly capture the focus stack of a real 3D scene within 15 ms. At the calculation end, a newly structured focus stack network (FS-Net) is designed for hologram calculation. After training FS-Net with the focus stack renderer and a learnable Zernike phase, holograms can be calculated within 13 ms. As the first device to achieve real-time incoherent acquisition and high-fidelity holographic reconstruction of a real 3D scene, the proposed holographic camera overcomes the technical bottlenecks of difficulty in acquiring real 3D scenes, low quality of holographically reconstructed images, and incorrect defocus blur. Experimental results demonstrate the effectiveness of the holographic camera in acquiring focal plane information and calculating holograms of real 3D scenes. The proposed holographic camera opens up a new way for the application of holography in fields such as 3D display, light field modulation, and 3D measurement.
Abstract: Bundle adjustment is a camera and point refinement technique in a 3D scene reconstruction pipeline. The camera parameters and the 3D points are refined by minimizing the difference between the computed and observed projections of the image points, formulated as a non-linear least-squares problem that is solved with the Levenberg-Marquardt method. Solving this problem is computationally expensive, with cost proportional to the number of cameras, points, and projections. In this paper, we implement the Bundle Adjustment (BA) algorithm and analyze techniques to improve its accuracy by reducing the mean square error. We investigate adding a radial distortion camera parameter to the BA algorithm and demonstrate better convergence of the mean square error. We also demonstrate the use of explicitly computed analytical derivatives. In addition, we implement the BA algorithm on GPUs using the CUDA parallel programming model to reduce its computational burden. CUDA streams, atomic operations, and the cuBLAS library are proposed, implemented, and demonstrated to improve the performance of the BA algorithm. Our implementation demonstrates better convergence of the BA algorithm and achieves a speedup of up to 16× on various datasets.
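The reprojection residual that bundle adjustment minimizes can be sketched for a simplified pinhole camera with one radial distortion term, playing the role of the additional radial distortion parameter discussed above. All names and the single-coefficient distortion model here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def project(X, R, t, f, k1):
    """Project 3D point X: transform into the camera frame, perspective-divide,
    apply one radial distortion term, and scale by the focal length."""
    Xc = R @ X + t
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]
    d = 1.0 + k1 * (x * x + y * y)   # radial distortion factor, r^2 = x^2 + y^2
    return f * d * np.array([x, y])

def residual(X, R, t, f, k1, observed_uv):
    """Reprojection residual for one observation. Bundle adjustment minimizes
    the sum of squared residuals over all projections, typically with
    Levenberg-Marquardt on the stacked camera and point parameters."""
    return project(X, R, t, f, k1) - observed_uv
```

For a point on the optical axis (e.g. X at the origin, identity rotation, t = [0, 0, 5]) the projection is [0, 0] and the residual against an observation at the principal point vanishes, as expected.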
Abstract: Learning-based multi-view stereo (MVS) algorithms have demonstrated great potential for depth estimation in recent years. However, they still struggle to estimate accurate depth in texture-less planar regions, which limits their reconstruction performance in man-made scenes. In this paper, we propose PlaneStereo, a new framework that utilizes a planar prior to facilitate depth estimation. Our key intuition is that pixels inside a plane share the same set of plane parameters, which can be estimated collectively using information from the whole plane. Specifically, our method first segments planes in the reference image, and then fits 3D plane parameters for each segmented plane by solving a linear system built from high-confidence depth predictions inside the plane. This allows us to recover the plane parameters accurately, which can be converted into accurate depth values for each point in the plane, improving the depth prediction in low-textured local regions. This process is fully differentiable and can be integrated into existing learning-based MVS algorithms. Experiments show that our method consistently improves the performance of existing stereo matching and MVS algorithms on the DeMoN and ScanNet datasets, achieving state-of-the-art performance.
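The linear system mentioned above has a clean form worth sketching: for a plane written as n·X = 1, the camera ray through pixel p gives 1/z = n·(K⁻¹p), so inverse depth is linear in the plane parameters n, and high-confidence depths inside a segment yield a least-squares problem for n. The intrinsics, pixels, and plane values below are synthetic illustrations, not from the paper:

```python
import numpy as np

# Assumed intrinsics and an illustrative ground-truth plane n (n . X = 1).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
n_true = np.array([0.01, 0.02, 0.05])

# "High-confidence" pixels (homogeneous) and their predicted inverse depths.
pix = np.array([[100.0, 120.0, 1.0],
                [300.0, 200.0, 1.0],
                [400.0,  50.0, 1.0],
                [250.0, 300.0, 1.0]])
rays = (np.linalg.inv(K) @ pix.T).T    # rows are K^{-1} p
inv_depth = rays @ n_true              # 1/z implied by the true plane

# Fit the plane parameters by least squares, then convert back to depth
# for every pixel in the segment -- this is the step that densifies the
# prediction inside low-textured planar regions.
n_fit, *_ = np.linalg.lstsq(rays, inv_depth, rcond=None)
depth_fit = 1.0 / (rays @ n_fit)
```

Because both the least-squares solve and the conversion back to depth are differentiable, this step can sit inside a learning-based pipeline, which matches the abstract's "fully differentiable" claim.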
Abstract: Existing depth completion methods are often targeted at a specific sparse depth type and generalize poorly across task domains. We present a method to complete sparse/semi-dense, noisy, and potentially low-resolution depth maps obtained by various range sensors, including those in modern mobile phones, or by multi-view reconstruction algorithms. Our method leverages a data-driven prior in the form of a single-image depth prediction network trained on large-scale datasets, whose output is used as an input to our model. We propose an effective training scheme in which we simulate the various sparsity patterns of typical task domains. In addition, we design two new benchmarks to evaluate the generalizability and robustness of depth completion methods. Our simple method shows superior cross-domain generalization ability compared with state-of-the-art depth completion methods, introducing a practical solution to high-quality depth capture on a mobile device.
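Simulating sparsity patterns from dense depth, as in the training scheme above, can be sketched with two toy generators: random point sparsity resembling multi-view reconstruction output, and scanline sparsity resembling a sparse LiDAR. The patterns, rates, and the zero-means-missing convention are illustrative assumptions, not the paper's recipe:

```python
import numpy as np

rng = np.random.default_rng(42)
dense = rng.uniform(0.5, 10.0, size=(48, 64))   # fake dense depth map (metres)

def sfm_pattern(depth, keep_ratio=0.02, rng=rng):
    """Keep a small random subset of pixels, as a sparse multi-view
    reconstruction would; 0 marks missing depth."""
    mask = rng.random(depth.shape) < keep_ratio
    return np.where(mask, depth, 0.0)

def lidar_pattern(depth, every_nth_row=8):
    """Keep only every n-th row, mimicking sparse LiDAR scanlines."""
    sparse = np.zeros_like(depth)
    sparse[::every_nth_row] = depth[::every_nth_row]
    return sparse
```

During training, one pattern would be sampled per example so the completion model sees the sparsity characteristics of every target domain instead of overfitting to a single sensor.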
Funding: supported by the Marsden Fund Council managed by the Royal Society of New Zealand under Grant Nos. MFP-20-VUW-180 and UOO1724, by the Zhejiang Province Public Welfare Technology Application Research under Grant No. LGG22F020009, and by the Key Lab of Film and TV Media Technology of Zhejiang Province of China under Grant No. 2020E10015.
Abstract: Mixed reality technologies provide real-time and immersive experiences, which bring tremendous opportunities in entertainment, education, and enriched experiences that are not directly accessible owing to safety or cost. Research in this field has been in the spotlight in the last few years as the metaverse went viral. The recently emerging omnidirectional video streams, i.e., 360° videos, provide an affordable way to capture and present dynamic real-world scenes. In the last decade, fueled by the rapid development of artificial intelligence and computational photography technologies, research interest in mixed reality systems that use 360° videos for richer and more realistic experiences has increased dramatically to unlock the true potential of the metaverse. In this survey, we cover recent research aimed at addressing the above issues in 360° image and video processing technologies and their applications to mixed reality. The survey summarizes the contributions of recent research and describes potential future research directions for 360° media in the field of mixed reality.
Funding: supported by the Major Program of the National Natural Science Foundation of China (No. 41530422), the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 32-Y30B08-9001-13/15), the National Natural Science Foundation of China (Nos. 61275184, 61540018, 61405153, and 60278019), and the National High Technology Research and Development Program of China (No. 2012AA121101).
Abstract: A background removal method based on two-dimensional notch filtering in the frequency domain for polarization interference imaging spectrometers (PIISs) is implemented. Based on the relationship between the spatial domain and the frequency domain, the notch filter is designed from several parameters of the PIIS, and an interferogram without a background is obtained. Both simulated and experimental results demonstrate that the background removal method is feasible and robust, with a high processing speed. In addition, the method reduces the noise level of the reconstructed spectrum, and it is insensitive to complicated backgrounds compared with the polynomial fitting and empirical mode decomposition (EMD) methods.
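The core operation, suppressing a slowly varying background while leaving the interference fringes untouched, can be sketched with a synthetic interferogram. The fringe frequency, notch radius, and background shape below are illustrative assumptions rather than the PIIS design parameters used in the paper:

```python
import numpy as np

# Synthetic interferogram: fringes at 12 cycles across the frame plus a
# slowly varying background (offset + ramp).
N = 128
y, x = np.mgrid[0:N, 0:N]
fringe = np.cos(2 * np.pi * 12 * x / N)
background = 3.0 + 0.5 * (x + y) / N
interferogram = background + fringe

# Transform to the frequency domain with DC shifted to the centre.
F = np.fft.fftshift(np.fft.fft2(interferogram))

# Notch filter: zero a small disc around DC, where the background energy
# concentrates; the fringe peaks at +/-12 bins lie well outside the notch.
cy = cx = N // 2
radius = 4
notch = (y - cy) ** 2 + (x - cx) ** 2 > radius ** 2
filtered = np.fft.ifft2(np.fft.ifftshift(F * notch)).real
```

After filtering, the mean (DC) is exactly removed and the surviving signal is essentially the fringe pattern, which is what the spectrum reconstruction consumes.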