Abstract: This paper proposes a simple geometric ray-based approach to solving the stereo correspondence problem for the single-lens bi-prism stereovision system. Each image captured with this system can be divided into left and right sub-images, which are generated by two virtual cameras produced by the bi-prism. The system is therefore equivalent to a conventional two-camera system, and the disparities between the two sub-images can be used to reconstruct the three-dimensional (3D) scene. The stereo correspondence problem is solved geometrically by applying the epipolar geometry constraint to the generated virtual cameras instead of the real CCD camera. Experiments are conducted to validate the proposed method, and the results are compared with those of a calibration-based approach to confirm its accuracy and effectiveness.
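Because the bi-prism turns the single real camera into two virtual cameras, the standard two-view relation applies to those virtual cameras directly. The sketch below assumes both virtual projection matrices are already known. It builds the fundamental matrix F = [e2]_x P2 P1^+ from them and returns the epipolar line in the right sub-image along which the match of a left sub-image point must be searched. The baseline `b`, yaw `phi`, intrinsics `K` and all helper names are hypothetical illustration values, not the paper's ray model of the prism. The pixel offset between the two sub-images on the shared sensor is also ignored.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_y(phi):
    """Rotation by phi radians about the camera y axis."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def virtual_camera(K, R, centre):
    """P = K [R | -R c] for a virtual camera with rotation R and optical centre c."""
    return K @ np.hstack([R, (-R @ centre)[:, None]])

def fundamental_from_projections(P1, P2):
    """F = [e2]_x P2 P1^+, with e2 the image of P1's centre in view 2."""
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]                  # right null vector of P1 = homogeneous camera centre
    e2 = P2 @ C1                 # epipole in the second virtual view
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

def epipolar_line(F, x_left):
    """Line (a, b, c) in the right sub-image on which the match of x_left must lie."""
    return F @ np.array([x_left[0], x_left[1], 1.0])

# Hypothetical bi-prism set-up: two virtual cameras displaced by +/- b along x
# and yawed by +/- phi, both sharing the real camera's intrinsics K.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
b, phi = 0.02, np.deg2rad(3.0)
P_left = virtual_camera(K, rot_y(phi), np.array([-b, 0.0, 0.0]))
P_right = virtual_camera(K, rot_y(-phi), np.array([b, 0.0, 0.0]))
F = fundamental_from_projections(P_left, P_right)
```

The correspondence search then reduces to a 1D search along `epipolar_line(F, x)` instead of a 2D search over the right sub-image.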
Funding: Funded by the Center for Unmanned Aircraft Systems (C-UAS), a National Science Foundation Industry/University Cooperative Research Center (I/UCRC), under NSF award numbers IIP-1161036 and CNS-1650547, along with significant contributions from C-UAS industry members.
Abstract: This paper introduces a new algorithm for estimating the relative pose of a moving camera from consecutive frames of a video sequence. State-of-the-art algorithms for calculating the relative pose between two images use matched features to estimate the essential matrix, which is then decomposed into the relative rotation and normalized translation between frames. To be robust to noise and feature-match outliers, these methods generate a large number of essential matrix hypotheses from randomly selected minimal subsets of feature pairs and then score each hypothesis on all feature pairs. In contrast, the algorithm introduced in this paper calculates relative pose hypotheses by directly optimizing the rotation and normalized translation between frames, rather than calculating the essential matrix and then decomposing it. This improves computation time by an order of magnitude. If an inertial measurement unit (IMU) is available, it is used to seed the optimizer; in addition, we reuse the best hypothesis at each iteration to seed the optimizer, which reduces the number of relative pose hypotheses that must be generated and scored. Together, these advantages greatly speed up the algorithm and enable it to run in real time on low-cost embedded hardware. We show an application of our algorithm to visual multi-target tracking (MTT) in the presence of parallax and demonstrate its real-time performance on a 640 × 480 video sequence captured on a UAV. Video results are available at https://youtu.be/HhK-p2hXNnU.
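The abstract does not give the cost function or the optimizer, so the following is only a sketch of the idea of optimizing rotation and normalized translation directly. Rotation is an axis-angle vector and the unit translation is two spherical angles, for 5 degrees of freedom in total. A plain Levenberg-Marquardt loop with a finite-difference Jacobian minimizes the algebraic epipolar residual x2^T [t]_x R x1. The `seed` argument stands in for the IMU prior or the previous best hypothesis. The hypothesis generation and scoring loop is not shown, and all names are hypothetical.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rodrigues(w):
    """Rotation matrix from an axis-angle vector w (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)           # first-order approximation near zero
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

def unit_from_angles(a, b):
    """Unit translation direction from polar angle a and azimuth b (2 DOF)."""
    return np.array([np.sin(a) * np.cos(b), np.sin(a) * np.sin(b), np.cos(a)])

def residuals(p, x1, x2):
    """Algebraic epipolar residuals x2^T [t]_x R x1 for normalized points of shape (N, 3)."""
    R = rodrigues(p[:3])
    t = unit_from_angles(p[3], p[4])
    E = skew(t) @ R
    return np.einsum("ij,jk,ik->i", x2, E, x1)

def refine_pose(x1, x2, seed, iters=50, lam=1e-3, h=1e-7):
    """Levenberg-Marquardt over (rotation vector, translation angles) from a seed pose."""
    p = np.asarray(seed, dtype=float).copy()
    for _ in range(iters):
        r = residuals(p, x1, x2)
        J = np.empty((r.size, p.size))
        for k in range(p.size):              # forward-difference Jacobian
            dp = np.zeros_like(p)
            dp[k] = h
            J[:, k] = (residuals(p + dp, x1, x2) - r) / h
        A = J.T @ J
        step = np.linalg.solve(A + lam * np.eye(p.size), -J.T @ r)
        r_new = residuals(p + step, x1, x2)
        if r_new @ r_new < r @ r:            # accept: move towards Gauss-Newton
            p, lam = p + step, lam * 0.5
        else:                                # reject: move towards gradient descent
            lam *= 10.0
    return p
```

Each refinement touches only 5 parameters, which is why a good seed (IMU or previous best pose) can cut down the number of hypotheses. With noisy real matches one would likely swap the algebraic residual for the Sampson error or a robust loss.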
Funding: Supported by the National Science Foundation (69275004) and the France-China Advanced Research Program.
Abstract: This paper combines the least-squares method with an iterative method to estimate the fundamental matrix and develops a new evaluation function based on epipolar geometry. During the iteration, the evaluation function is used as the measure: points that introduce larger noise are deleted and points with smaller noise are retained, which increases the precision of the method. The experimental results indicate that the new method is precise in calculation, stable in performance, and robust to noise.
Funding: This work was supported by the Sichuan Science and Technology Program (2023YFG0262).
Abstract: Transformer-based stereo image super-resolution reconstruction (Stereo SR) methods have significantly improved image quality. However, existing methods pay insufficient attention to detailed features and do not consider the offset of pixels along the epipolar lines in complementary views when integrating stereo information. To address these challenges, this paper introduces a novel epipolar line window attention stereo image super-resolution network (EWASSR). For detail feature restoration, we design a feature extractor based on a Transformer and a convolutional neural network (CNN), which consists of (shifted) window-based self-attention ((S)W-MSA) and feature distillation and enhancement blocks (FDEB). This combination handles both global image perception and local feature attention, and captures more discriminative high-frequency features of the image. Furthermore, to address the offset of complementary pixels in stereo images, we propose an epipolar line window attention (EWA) mechanism, which divides windows along the epipolar direction to promote efficient matching of shifted pixels, even in smooth regions. More accurate pixel matching can be achieved by using adjacent pixels in the window as a reference. Extensive experiments demonstrate that EWASSR can reconstruct more realistic detailed features. Quantitative comparisons show that, for 2× SR on the Middlebury and Flickr1024 datasets, EWASSR improves the peak signal-to-noise ratio (PSNR) by 0.37 dB and 0.34 dB, respectively, compared with a recent network.
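For a rectified stereo pair the epipolar lines are image rows. Windows can therefore be cut along each row, so that every left-view pixel attends only to right-view pixels in the same row window. The sketch below assumes rectified inputs, non-overlapping windows, and no learned query/key/value projections. It illustrates window attention along epipolar lines; it is not the EWASSR module itself, which the abstract only outlines.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def epipolar_window_attention(feat_l, feat_r, window=8):
    """Cross-view attention restricted to windows along rectified epipolar lines (rows).

    feat_l, feat_r: (H, W, C) feature maps of the left and right views.
    Returns (H, W, C): right-view features aggregated for every left-view pixel.
    """
    H, W, C = feat_l.shape
    if W % window:
        raise ValueError("width must be a multiple of the window size")
    n = W // window
    q = feat_l.reshape(H, n, window, C)        # queries: left pixels grouped per row window
    k = feat_r.reshape(H, n, window, C)        # keys/values: right pixels in the same window
    scores = np.einsum("hnic,hnjc->hnij", q, k) / np.sqrt(C)
    weights = softmax(scores, axis=-1)         # each left pixel's weights over its window
    out = np.einsum("hnij,hnjc->hnic", weights, k)
    return out.reshape(H, W, C)
```

Because a left pixel's candidates are its row neighbours in the other view, a horizontally shifted match stays inside the window as long as the disparity is smaller than the window width.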
Abstract: Detecting three-dimensional objects using only RGB images is a considerable challenge in computer vision. The core issue lies in accurately performing epipolar geometry matching between multiple views to obtain latent geometric priors. Existing methods establish correspondences along epipolar line features in voxel space through several convolution layers. However, this step usually occurs in the later stages of the network, which limits overall performance. To address this challenge, we introduce a novel framework, ImVoxelENet, that integrates a geometric epipolar constraint. Starting from the back-projection of pixel-wise features, we design an attention mechanism that captures the relationship between forward and backward features along the ray for multiple views. This approach enables the early establishment of geometric correspondences and structural connections between epipolar lines. Using ScanNetV2 as a benchmark, extensive comparative and ablation experiments demonstrate that the proposed network achieves a 1.1% improvement in mAP, highlighting its effectiveness in enhancing 3D object detection performance. Our code is available at https://github.com/xug-coder/ImVoxelENet.
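The back-projection step that ImVoxelENet starts from can be sketched as follows. Every voxel centre is projected into each view, the pixel feature it lands on is gathered, and the features from all views that see the voxel are combined. Plain averaging across views is used here only as a placeholder for the paper's ray attention. Nearest-pixel sampling, the (3, 4) projection-matrix convention and all names are assumptions.

```python
import numpy as np

def backproject_features(feats, projs, centers):
    """Gather per-view pixel features at the projections of voxel centres.

    feats: (V, H, W, C) feature maps; projs: (V, 3, 4) camera matrices;
    centers: (N, 3) voxel centres in world coordinates.
    Returns (N, C) features averaged over the views that see each voxel, and (N,) view counts.
    """
    V, H, W, C = feats.shape
    Xh = np.column_stack([centers, np.ones(len(centers))])    # homogeneous voxel centres
    acc = np.zeros((len(centers), C))
    count = np.zeros(len(centers))
    for v in range(V):
        x = Xh @ projs[v].T                                   # (N, 3) homogeneous pixels
        z = x[:, 2]
        front = z > 1e-6                                      # ignore voxels behind the camera
        u = np.full(len(centers), -1)
        r = np.full(len(centers), -1)
        u[front] = np.round(x[front, 0] / z[front]).astype(int)   # nearest-pixel column
        r[front] = np.round(x[front, 1] / z[front]).astype(int)   # nearest-pixel row
        valid = front & (u >= 0) & (u < W) & (r >= 0) & (r < H)
        acc[valid] += feats[v, r[valid], u[valid]]
        count[valid] += 1
    # Placeholder aggregation: mean over the views that see the voxel.
    mean = acc / np.maximum(count, 1.0)[:, None]
    return np.where(count[:, None] > 0, mean, 0.0), count
```

Every voxel on the same camera ray receives the same pixel feature from that view. This is the ambiguity that attending along the ray across multiple views is meant to resolve.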
Funding: Supported by the National Natural Science Foundation of China (No. 62271410) and the Fundamental Research Funds for the Central Universities.
Abstract: Multi-View Stereo (MVS) is a pivotal technique in computer vision for reconstructing 3D models from multiple images by estimating depth maps. However, reconstruction performance is hindered by visibility challenges such as occlusions and non-overlapping regions. In this paper, we propose a visibility-aware framework to address these issues. Central to our method is an Epipolar Line-based Transformer (ELT) module, which exploits epipolar line correspondences and candidate matching features between images to enhance feature representation and the robustness of feature correlation. Furthermore, we propose a novel Supervised Visibility Estimation (SVE) module that estimates high-precision visibility maps, overcoming the limitations of previous methods that rely on indirect supervision. By integrating these modules, our method achieves state-of-the-art results on the evaluated benchmarks and performs high-quality reconstruction even in challenging regions. The code will be released at https://github.com/npucvr/ETV-MVS.
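The epipolar correspondence that the ELT module builds on can be illustrated with plane-sweep style sampling. A reference pixel is lifted to several depth hypotheses along its viewing ray and projected into a source view, which places every candidate match on that pixel's epipolar line. The `fuse_costs` helper shows one hypothetical way a per-view visibility map, such as the one SVE estimates, could weight matching costs. Neither function reproduces the paper's modules, and all names are illustrative.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental(K_ref, K_src, R, t):
    """F mapping reference pixels to epipolar lines in the source view, x_src ~ K_src (R X + t)."""
    return np.linalg.inv(K_src).T @ skew(t) @ R @ np.linalg.inv(K_ref)

def epipolar_candidates(x_ref, K_ref, K_src, R, t, depths):
    """Project the reference pixel, back-projected at each depth hypothesis, into the source view."""
    ray = np.linalg.solve(K_ref, np.array([x_ref[0], x_ref[1], 1.0]))  # z = 1 for a standard K
    pts = depths[:, None] * ray[None, :]        # 3D points on the reference viewing ray
    proj = (pts @ R.T + t) @ K_src.T            # homogeneous source-view pixels
    return proj[:, :2] / proj[:, 2:3]

def fuse_costs(costs, visibility, eps=1e-8):
    """Visibility-weighted mean over source views; costs and visibility have shape (S, D)."""
    return (visibility * costs).sum(axis=0) / (visibility.sum(axis=0) + eps)
```

A source view with zero visibility at a pixel (for example, an occluded one) then contributes nothing to the fused cost for that pixel.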
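Abstract: Visual Simultaneous Localization and Mapping (SLAM) systems are often affected by dynamic objects, which degrade localization accuracy and robustness. To address this problem, the lightweight network architecture MobileNetV3 is introduced into the system design to rebuild the backbone network of YOLOv5s. While maintaining detection accuracy, this reduces the number of model parameters by 67.9% and increases per-frame running speed by 89%, meeting the efficiency requirements of practical applications. To handle interference from potentially dynamic features and missed detections, optical flow motion estimation and epipolar geometry constraints are introduced. A motion consistency check selects the set of static feature points, ensuring the geometric reliability of pose estimation, and a dense point cloud map of the environment is constructed. Quantitative experimental evaluation shows that, in test scenes with a high proportion of dynamic objects, the improved algorithm reduces localization error by 90.2% compared with ORB-SLAM3, effectively improving the accuracy and robustness of pose estimation.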
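The motion consistency check can be sketched as an epipolar distance test. Given a fundamental matrix between consecutive frames and the point tracks produced by optical flow, a tracked point whose current position lies far from the epipolar line of its previous position is flagged as dynamic. This sketch assumes F is already known; in practice it would be estimated robustly from the tracks. The 1-pixel threshold and all names are hypothetical.

```python
import numpy as np

def fundamental_from_motion(K, R, t):
    """F for x_curr ~ K (R X + t) and x_prev ~ K X (same intrinsics in both frames)."""
    tx = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])
    K_inv = np.linalg.inv(K)
    return K_inv.T @ tx @ R @ K_inv

def epipolar_distances(F, p_prev, p_curr):
    """Pixel distance from each current point to the epipolar line of its previous position."""
    x1 = np.column_stack([p_prev, np.ones(len(p_prev))])
    x2 = np.column_stack([p_curr, np.ones(len(p_curr))])
    lines = x1 @ F.T                            # epipolar lines F x_prev in the current frame
    return np.abs(np.sum(x2 * lines, axis=1)) / np.hypot(lines[:, 0], lines[:, 1])

def split_static_dynamic(F, p_prev, p_curr, thresh_px=1.0):
    """Boolean mask of tracks consistent with the camera motion, plus the distances."""
    d = epipolar_distances(F, p_prev, p_curr)
    return d < thresh_px, d
```

Only the points in the returned static mask would then be passed to pose estimation. A point moving exactly along its own epipolar line passes this test, which is one reason to combine it with the detector output.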