Abstract
To address the heavy ground-truth pose annotation burden of supervised visual odometry (VO) and the large trajectory drift caused by VO's limited generalization ability, a self-supervised monocular VO network based on an encoder-decoder architecture is proposed. The encoder, MPViT, performs multi-level, multi-scale embedding of image features, and the decoder, U-Net, progressively fuses low- and high-dimensional features, enabling end-to-end learning of the 6-degree-of-freedom pose that represents translation and rotation. As pose-related geometric constraints, the transitivity and reversibility constraints of pose transformations are integrated into the loss function, which helps suppress trajectory drift of VO positioning within local regions. Experiments on the KITTI benchmark dataset and self-recorded outdoor navigation video sequences show that the proposed VO network performs best on 9 KITTI sequences, reducing the absolute trajectory error by an average of 25.80% compared with the second-best method, DPVO. In real-world scenes, it copes with sparse environmental features, high-speed robot motion, and severe illumination changes, exhibiting better robustness and generalization.
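The transitivity constraint (composing the poses of frames 0→1 and 1→2 should match the directly predicted pose 0→2) and the reversibility constraint (the backward pose should equal the inverse of the forward pose) can be illustrated with a minimal numpy sketch. This is an illustration only, assuming 4×4 homogeneous SE(3) transforms and a Frobenius-norm residual; the function names and the exact norm used in the paper's loss are assumptions, not taken from the source.

```python
import numpy as np

def se3_inverse(T):
    """Invert a 4x4 homogeneous transform T = [R t; 0 1] analytically."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T          # inverse rotation is the transpose
    Ti[:3, 3] = -R.T @ t      # inverse translation
    return Ti

def transitivity_loss(T_01, T_12, T_02):
    """Residual between the composed pose (0->1 then 1->2) and the
    directly predicted pose 0->2."""
    return np.linalg.norm(T_12 @ T_01 - T_02)

def reversibility_loss(T_01, T_10):
    """Residual between the backward prediction 1->0 and the analytic
    inverse of the forward prediction 0->1."""
    return np.linalg.norm(T_10 - se3_inverse(T_01))
```

For perfectly consistent pose predictions both residuals vanish; during training, nonzero residuals penalize the network and locally discourage drift accumulation.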
Authors
XIA Linlin; ZHANG Zunzheng; LIU Xianlin; WANG Kai; RUAN Heng (School of Automation Engineering, Northeast Electric Power University, Jilin 132012, China)
Source
Journal of Chinese Inertial Technology (《中国惯性技术学报》), a Peking University Core journal
2025, No. 8, pp. 761-769 (9 pages)
Funding
Industrial Technology Research and Development Project of the Jilin Provincial Development and Reform Commission (2024C007-3).