Funding: Sponsored by the National Natural Science Foundation of China (10625208), the Basic Research Foundation of Beijing Institute of Technology (20061242005), and the Foundation of the State Key Laboratory of Explosion Science and Technology (ZDKT08-02).
Abstract: This paper analyzes the advantages and disadvantages of two existing methods for explosive-field visualization and proposes a new method, based on image fusion, that integrates their complementary strengths. In this method, two source images built by equal mapping and modulus mapping are each decomposed into a Gaussian-Laplacian pyramid sequence. The two sequences are then merged into a composite sequence by the fusion process, and a new image is finally reconstructed from the composite sequence. Experimental results show that the new images integrate the advantages of both sources, effectively improve visualization, and reveal more information about the explosive field.
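The abstract specifies pyramid decomposition, per-level fusion, and reconstruction but leaves the fusion rule unstated. The sketch below is a minimal Python/OpenCV version of that pipeline under two stated assumptions: the two source images (built elsewhere by the equal and modulus mappings, which are domain-specific) are same-sized grayscale arrays, and fusion keeps the larger-magnitude coefficient at each pyramid level, which may differ from the paper's actual rule.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    """Build a Laplacian pyramid from the Gaussian pyramid of `img`."""
    gauss = [img.astype(np.float32)]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    lap = []
    for i in range(levels):
        up = cv2.pyrUp(gauss[i + 1], dstsize=gauss[i].shape[1::-1])
        lap.append(gauss[i] - up)   # band-pass detail at level i
    lap.append(gauss[-1])           # coarsest Gaussian level closes the pyramid
    return lap

def fuse_pyramids(lap_a, lap_b):
    """Per-level fusion: keep the coefficient with the larger magnitude."""
    return [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(lap_a, lap_b)]

def reconstruct(lap):
    """Collapse a Laplacian pyramid back into a displayable image."""
    img = lap[-1]
    for level in reversed(lap[:-1]):
        img = cv2.pyrUp(img, dstsize=level.shape[1::-1]) + level
    return np.clip(img, 0, 255).astype(np.uint8)

# Usage (hypothetical file names for the two mapped source images):
# a = cv2.imread("equal_map.png", cv2.IMREAD_GRAYSCALE)
# b = cv2.imread("modulus_map.png", cv2.IMREAD_GRAYSCALE)
# fused = reconstruct(fuse_pyramids(laplacian_pyramid(a), laplacian_pyramid(b)))
```

Keeping the larger-magnitude Laplacian coefficient tends to preserve the sharper detail from whichever source is more informative at that scale, which matches the stated goal of integrating complementary advantages.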
Abstract: Visual navigation is the core technical foundation of autonomous mobile robot operation; its performance directly determines a robot's environmental perception accuracy, the reliability of its localization and mapping, and the soundness of its path planning. This article systematically reviews research progress in visual navigation for mobile robots, organizing the analysis around three core components: visual sensors, simultaneous localization and mapping (SLAM), and path planning. At the sensor level, it examines the technical characteristics and suitable scenarios of single-modality sensors, multi-modal fused visual sensors, and emerging visual sensors; at the SLAM level, it summarizes the technical evolution and performance advantages of traditional geometric SLAM, multi-modal fusion SLAM, and neural implicit SLAM; at the path-planning level, it highlights the characteristics and applicable scenarios of traditional and bio-inspired algorithms. Finally, it summarizes the challenges facing current techniques and looks ahead to future research directions, providing a reference for the further development of visual navigation technology.
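The survey names traditional path-planning algorithms only in general terms; as one concrete member of that family, here is a minimal grid-based A* sketch. The 4-connected occupancy grid, unit step cost, and Manhattan heuristic are illustrative assumptions, not taken from any surveyed system.

```python
import heapq, itertools

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle)."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance
    tie = itertools.count()        # tie-breaker so the heap never compares nodes
    open_set = [(h(start), next(tie), start)]
    parent, g_cost, closed = {start: None}, {start: 0}, set()
    while open_set:
        _, _, node = heapq.heappop(open_set)
        if node == goal:           # reconstruct the path by walking parents back
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in closed:
            continue               # stale heap entry from a worse earlier path
        closed.add(node)
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc]:
                continue
            ng = g_cost[node] + 1  # unit cost per move
            if ng < g_cost.get(nb, float("inf")):
                g_cost[nb] = ng
                parent[nb] = node
                heapq.heappush(open_set, (ng + h(nb), next(tie), nb))
    return None                    # goal unreachable

# Toy map (illustrative): the only shortest route detours around the wall.
grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```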
Abstract: The conventional open-source LIO-SLAM, which couples lidar-inertial odometry (LIO) with simultaneous localization and mapping (SLAM), suffers degraded localization accuracy in complex indoor environments affected by sparse laser features and dynamic occlusion. To address this, an improved method that fuses visual odometry is proposed. While keeping LIO-SLAM's tightly coupled lidar-inertial framework, ORB-SLAM, a 3D localization and mapping algorithm based on ORB features, is introduced as an independent visual odometry module that provides the system with high-frequency, texture-rich visual constraints. An adaptive-weight fusion strategy jointly optimizes the laser, inertial, and visual observations, improving robustness in environments with weak geometric constraints and rich texture but complex structure. Experiments were conducted in several typical indoor scenes (corridors, open halls, and environments with dynamic crowds). The results show that, compared with the original LIO-SLAM, the overall trajectory error drops to 70% of the original system's. The study validates the feasibility and effectiveness of visual-lidar-inertial multi-modal fusion in complex indoor environments and offers a new approach to high-accuracy indoor autonomous localization and mapping.
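The abstract does not detail the adaptive-weight fusion strategy, so the following is only a plausible sketch: each front end's per-frame translation increment is weighted by an inverse-variance term scaled by the number of features it tracked, so the visual estimate dominates where laser geometry is weak. The function name, noise constants, and feature-count confidence proxy are all assumptions.

```python
import numpy as np

def fuse_translations(t_lidar, t_visual, n_lidar_feats, n_orb_feats,
                      sigma_lidar=0.02, sigma_visual=0.05):
    """
    Fuse per-frame translation increments from the lidar-inertial odometry
    and the ORB visual odometry with adaptive inverse-variance weights.
    Confidence grows with the number of features each front end tracked.
    """
    # Inverse-variance weights, scaled by feature counts as a confidence proxy.
    w_l = n_lidar_feats / (sigma_lidar ** 2)
    w_v = n_orb_feats / (sigma_visual ** 2)
    return (w_l * np.asarray(t_lidar) + w_v * np.asarray(t_visual)) / (w_l + w_v)

# Example: a corridor frame where laser features are sparse (50 planar points)
# but ORB tracks many texture features (600 keypoints) -- the visual estimate
# then dominates the fused increment.
t = fuse_translations([0.10, 0.00, 0.0], [0.12, 0.01, 0.0], 50, 600)
print(t)
```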
Funding: Supported by the National Key R&D Program of China (No. 2020AAA0108904) and the Science and Technology Plan of Shenzhen (No. JCYJ20200109140410340).
Abstract: Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information from lip motion patterns to supplement acoustic speech and improve overall detection performance. However, most audio-visual wake word spotting models are suited only to simple single-speaker scenarios and have high computational complexity, so further development is hindered by complex multi-person scenarios and the computational limits of mobile environments. In this paper, a novel audio-visual model is proposed for on-device multi-person wake word spotting. First, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix over audio and visual representations to derive the active speaker representation. Second, knowledge distillation is used to transfer knowledge from a large model to the on-device model, keeping the model size under control. Moreover, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word spotting. Experimental results on the PKU-KWS dataset show that this approach outperforms previous state-of-the-art methods.
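As a rough illustration of the attention-based audio-visual voice activity detection module, the PyTorch sketch below computes an attention score matrix between audio frames and per-speaker visual streams, then pools an active-speaker representation from it. The embedding sizes, linear projections, and softmax-over-speakers pooling are assumptions rather than the paper's exact architecture, and the knowledge-distillation stage is omitted.

```python
import torch
import torch.nn as nn

class AVAttentionVAD(nn.Module):
    """
    Minimal sketch: audio frames attend over candidate speakers' lip-motion
    streams, and the attention scores pool an active-speaker representation.
    """
    def __init__(self, d_audio=256, d_visual=256, d_attn=128):
        super().__init__()
        self.q = nn.Linear(d_audio, d_attn)   # audio frames form the queries
        self.k = nn.Linear(d_visual, d_attn)  # one key per face track per frame
        self.v = nn.Linear(d_visual, d_attn)

    def forward(self, audio, visual):
        # audio:  (batch, T, d_audio)      acoustic frame embeddings
        # visual: (batch, S, T, d_visual)  S candidate speakers' lip features
        q = self.q(audio)                     # (batch, T, d_attn)
        k = self.k(visual)                    # (batch, S, T, d_attn)
        v = self.v(visual)
        # Attention score matrix between each audio frame and each speaker.
        scores = torch.einsum("btd,bstd->bst", q, k) / k.shape[-1] ** 0.5
        weights = scores.softmax(dim=1)       # normalize over the S speakers
        # Active-speaker representation: attention-weighted visual features.
        active = torch.einsum("bst,bstd->btd", weights, v)
        return active, weights

model = AVAttentionVAD()
audio = torch.randn(2, 40, 256)       # 2 clips, 40 acoustic frames each
visual = torch.randn(2, 3, 40, 256)   # 3 candidate faces per clip
active, w = model(audio, visual)
print(active.shape, w.shape)          # (2, 40, 128) and (2, 3, 40)
```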