Journal Articles
3 articles found
1. Rapid and robust initialization for monocular visual inertial navigation within multi-state Kalman filter (cited by: 10)
Authors: Wei FANG, Lianyu ZHENG. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2018, Issue 1, pp. 148-160 (13 pages).
Abstract: Sensor-fusion based navigation attracts significant attention for its robustness and accuracy in various applications. To achieve versatile and efficient state estimation both indoors and outdoors, this paper presents an improved monocular visual inertial navigation architecture within the Multi-State Constraint Kalman Filter (MSCKF). In addition, to relax the initialization requirement of accumulating enough stable poses in MSCKF, a rapid and robust Initialization MSCKF (I-MSCKF) navigation method is proposed. Based on the trifocal tensor and a sigma-point filter, initialization of the integrated navigation can be accomplished within three consecutive visual frames, so the proposed I-MSCKF method improves navigation performance when subjected to shocks at the initial stage. Moreover, the sigma-point filter applied at the initial stage improves the accuracy of state estimation. The state vector generated at the initial stage is consistent with MSCKF, so a seamless transition is achieved between initialization and the subsequent navigation in I-MSCKF. Finally, the experimental results show that the proposed I-MSCKF method improves the robustness and accuracy of monocular visual inertial navigation.
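The trifocal-tensor step lends itself to a short illustration. Below is a minimal sketch of classic point-line-point transfer, one standard way a trifocal tensor relates a feature across three consecutive frames; the tensor T, the homogeneous point x1, and the line l2 through the view-2 match are all assumed inputs for illustration, not details taken from the paper.

```python
import numpy as np

def trifocal_point_transfer(T, x1, l2):
    """Point-line-point transfer: map homogeneous point x1 (view 1)
    into view 3 via trifocal tensor T (3x3x3) and a line l2 passing
    through the corresponding point in view 2:
        x3^k = sum_{i,j} x1[i] * l2[j] * T[i, j, k]
    """
    x3 = np.einsum("i,j,ijk->k", x1, l2, T)
    return x3 / x3[2]  # normalize back to inhomogeneous scale
```

In practice l2 is commonly chosen perpendicular to the epipolar line through the view-2 match, which avoids the degenerate choice of the epipolar line itself.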
Keywords: Estimator initialization; Navigation; Kalman filter; Pose estimation; Visual inertial fusion
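The sigma-point filtering used at the initial stage is built on the unscented transform. A minimal, generic sketch follows, with the propagation function f and the scaling parameters as assumptions rather than the paper's settings:

```python
import numpy as np

def unscented_transform(x, P, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate mean x and covariance P through a nonlinear f via
    sigma points (the core operation of a sigma-point filter)."""
    n = x.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)      # matrix square root
    # 2n+1 sigma points: the mean plus symmetric spreads along S's columns
    sigmas = np.vstack([x, x + S.T, x - S.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    ys = np.array([f(s) for s in sigmas])      # push each point through f
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d            # weighted outer products
    return y_mean, y_cov
```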
2. On-device audio-visual multi-person wake word spotting
Authors: Yidi Li, Guoquan Wang, Zhan Chen, Hao Tang, Hong Liu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, Issue 4, pp. 1578-1589 (12 pages).
Abstract: Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information of lip motion patterns to supplement acoustic speech and improve overall detection performance. However, most audio-visual wake word spotting models suit only simple single-speaker scenarios and carry high computational complexity; further development is hindered by complex multi-person scenarios and the computational limits of mobile environments. In this paper, a novel audio-visual model is proposed for on-device multi-person wake word spotting. First, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive the active speaker representation. Second, knowledge distillation is introduced to transfer knowledge from the large model to the on-device model and keep the model size small. Moreover, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word spotting. Experimental results on the PKU-KWS dataset show that this approach outperforms previous state-of-the-art methods.
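As a rough picture of the attention-based module, the score matrix between audio and visual representations can be sketched as scaled dot-product cross-attention; the shapes, names, and residual fusion below are illustrative assumptions, with the visual axis indexing candidate speakers:

```python
import torch

def av_attention(audio, visual):
    """audio: (B, Ta, D) acoustic frames; visual: (B, S, D) per-speaker
    lip-motion features. Returns audio frames fused with the visual
    stream they attend to, plus the (B, Ta, S) score matrix."""
    scores = torch.einsum("bad,bsd->bas", audio, visual)
    scores = scores / audio.shape[-1] ** 0.5        # scaled dot product
    attn = scores.softmax(dim=-1)                   # weight candidate speakers
    fused = audio + torch.einsum("bas,bsd->bad", attn, visual)
    return fused, attn
```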
Keywords: audio-visual fusion; human-computer interfacing; speech processing
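The distillation used to shrink the on-device model is, in outline, standard soft-target knowledge distillation; a minimal sketch, with illustrative temperature and mixing weight rather than the paper's values:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a temperature-softened KL term against the teacher's
    outputs with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                        # undo the 1/T^2 gradient scaling
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```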
3. Robust Local Light Field Synthesis via Occlusion-aware Sampling and Deep Visual Feature Fusion
Authors: Wenpeng Xing, Jie Chen, Yike Guo. Machine Intelligence Research (EI, CSCD), 2023, Issue 3, pp. 408-420 (13 pages).
Abstract: Novel view synthesis has recently attracted tremendous research attention for its applications in virtual reality and immersive telepresence. Rendering a locally immersive light field (LF) from arbitrary large-baseline RGB references is a challenging problem that lacks efficient solutions among existing novel view synthesis techniques. In this work, we aim to faithfully render local immersive novel views/LF images based on large-baseline LF captures and a single RGB image in the target view. To fully exploit the precious information in the source LF captures, we propose a novel occlusion-aware source sampler (OSS) module that efficiently transfers pixels of the source views into the target view's frustum in an occlusion-aware manner. An attention-based deep visual fusion module is proposed to fuse the revealed occluded background content with a preliminary LF into a final refined LF. The proposed source sampling and fusion mechanism not only provides information for occluded regions from varying observation angles but also effectively enhances visual rendering quality. Experimental results show that our method renders high-quality LF images/novel views with sparse RGB references and outperforms state-of-the-art LF rendering and novel view synthesis methods.
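The OSS module itself is learned end-to-end, but its geometric core, forward-warping source pixels into the target frustum while rejecting occluded content, reduces to depth-based reprojection with a z-buffer. A minimal sketch, assuming a pinhole model with shared intrinsics K and a source-to-target pose (R, t), none of which are spelled out in the abstract:

```python
import numpy as np

def warp_source_to_target(src_img, src_depth, K, R, t):
    """Forward-warp source pixels into the target view with a z-buffer,
    so nearer surfaces win and occluded source content is rejected."""
    h, w = src_depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], -1).reshape(-1, 3).astype(float)
    # Back-project to 3D in the source camera frame, then move to target frame.
    pts = (np.linalg.inv(K) @ pix.T).T * src_depth.reshape(-1, 1)
    pts = pts @ R.T + t
    keep = pts[:, 2] > 1e-6                       # in front of the target camera
    pts, colors = pts[keep], src_img.reshape(-1, 3)[keep]
    proj = pts @ K.T
    uv = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
    inb = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    out = np.zeros_like(src_img)
    zbuf = np.full((h, w), np.inf)
    for (u, v), z, c in zip(uv[inb], pts[inb, 2], colors[inb]):
        if z < zbuf[v, u]:                        # occlusion test: keep nearest
            zbuf[v, u] = z
            out[v, u] = c
    return out, zbuf
```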
Keywords: Novel view synthesis; light field (LF) imaging; multi-view stereo; occlusion sampling; deep visual feature (DVF) fusion