Although deep learning methods have been widely applied in SLAM visual odometry (VO) over the past decade with impressive improvements, their accuracy remains limited in complex dynamic environments. In this paper, a composite mask-based generative adversarial network (CMGAN) is introduced to predict camera motion and binocular depth maps. Specifically, a perceptual generator is constructed to obtain the corresponding parallax map and optical flow between two neighboring frames. Then, an iterative pose improvement strategy is proposed to improve the accuracy of pose estimation. Finally, a composite mask is embedded in the discriminator to sense structural deformation in the synthesized virtual image, thereby strengthening the overall structural constraints of the network model, improving the accuracy of camera pose estimation, and reducing drift in the VO. Detailed quantitative and qualitative evaluations on the KITTI dataset show that the proposed framework outperforms existing conventional, supervised learning, and unsupervised depth VO methods, providing better results in both pose estimation and depth estimation.
Funding: Supported by the Program of Graduate Education and Teaching Reform in Tianjin University of Technology (Nos. YBXM2204 and ZDXM2202) and the National Natural Science Foundation of China (Nos. 62203331 and 62103299).