Fund: Sponsored by the National Natural Science Foundation of China (10625208), the Basic Research Foundation of Beijing Institute of Technology (20061242005), and the Foundation of the State Key Laboratory of Explosion Science and Technology (ZDKT08-02)
Abstract: This paper analyzes the advantages and disadvantages of two existing methods for explosive-field visualization and proposes a new method based on image fusion that integrates their complementary strengths. With this method, two source images, built by equal mapping and modulus mapping, are each decomposed into Gauss-Laplacian pyramid sequences. The two sequences are then combined into a composite sequence according to the fusion process, and a new image is reconstructed from the composite sequence. Experimental results show that the new images integrate the advantages of both sources, effectively improve the visualization, and disclose more information about the explosive field.
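The pyramid decomposition, fusion, and reconstruction pipeline described above can be sketched as follows. This is a minimal illustration, assuming grayscale float images, a 1-2-1 binomial blur, and a max-absolute-value selection rule for the detail levels; the paper's actual fusion rule and pyramid filters are not specified here.

```python
import numpy as np

def blur(x):
    """Separable 1-2-1 binomial smoothing with edge padding."""
    k = np.array([0.25, 0.5, 0.25])
    p = np.pad(x, 1, mode="edge")
    x = k[0] * p[:-2, 1:-1] + k[1] * p[1:-1, 1:-1] + k[2] * p[2:, 1:-1]
    p = np.pad(x, 1, mode="edge")
    return k[0] * p[1:-1, :-2] + k[1] * p[1:-1, 1:-1] + k[2] * p[1:-1, 2:]

def laplacian_pyramid(img, levels=3):
    """Decompose an image into Laplacian detail levels plus a low-pass base."""
    gauss, lap = img.astype(float), []
    for _ in range(levels):
        down = blur(gauss)[::2, ::2]
        up = np.repeat(np.repeat(down, 2, 0), 2, 1)[:gauss.shape[0], :gauss.shape[1]]
        lap.append(gauss - blur(up))   # detail lost by downsampling
        gauss = down
    lap.append(gauss)                  # residual low-pass base
    return lap

def fuse(pyr_a, pyr_b):
    """Keep the stronger detail coefficient per pixel; average the base."""
    fused = [np.where(np.abs(a) >= np.abs(b), a, b)
             for a, b in zip(pyr_a[:-1], pyr_b[:-1])]
    fused.append(0.5 * (pyr_a[-1] + pyr_b[-1]))
    return fused

def reconstruct(pyr):
    """Invert the pyramid: upsample the base and add back each detail level."""
    img = pyr[-1]
    for lap in reversed(pyr[:-1]):
        up = np.repeat(np.repeat(img, 2, 0), 2, 1)[:lap.shape[0], :lap.shape[1]]
        img = blur(up) + lap
    return img
```

By construction, reconstructing an unfused pyramid recovers the source image exactly, which makes the decomposition lossless and confines all information loss to the fusion rule itself.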
Fund: Supported by the National Key R&D Program of China (No. 2020AAA0108904) and the Science and Technology Plan of Shenzhen (No. JCYJ20200109140410340)
Abstract: Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information from lip motion patterns to supplement acoustic speech and improve overall detection performance. However, most audio-visual wake word spotting models are only suitable for simple single-speaker scenarios and incur high computational cost; further development is hindered by complex multi-person scenarios and the computational limits of mobile environments. This paper proposes a novel audio-visual model for on-device multi-person wake word spotting. First, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix over audio and visual representations to derive an active-speaker representation. Second, knowledge distillation is introduced to transfer knowledge from a large model to the on-device model, keeping its size under control. Moreover, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word spotting. Experimental results on the PKU-KWS dataset show that this approach outperforms previous state-of-the-art methods.
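One way to read "an attention score matrix of audio and visual representations to derive the active speaker representation" is as cross-modal dot-product attention between acoustic frames and each candidate face track. The sketch below is purely illustrative and not the paper's architecture: the shapes, the scaled dot-product scoring, and the `active_speaker_repr` helper are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def active_speaker_repr(audio, faces):
    """Pick the face track whose lip motion best matches the audio.

    audio: (T, d) acoustic frame embeddings.
    faces: (N, T, d) lip-motion embeddings for N candidate speakers.
    Returns the pooled representation of the best-matching speaker
    and the per-speaker match scores.
    """
    d = audio.shape[-1]
    # Attention score matrix: per-speaker, per-frame audio/visual agreement.
    scores = np.einsum('td,ntd->nt', audio, faces) / np.sqrt(d)
    weights = softmax(scores, axis=1)                  # attend over time
    per_speaker = np.einsum('nt,ntd->nd', weights, faces)
    match = per_speaker @ audio.mean(0)                # speaker-level score
    return per_speaker[match.argmax()], match
```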
Fund: Supported by the National Natural Science Foundation of China (60905006) and the NSFC-Guangdong Joint Fund (U1035004)
Abstract: Emotion recognition has become an important task in modern human-computer interaction. This paper presents a multilayer boosted HMM (MBHMM) classifier for automatic audio-visual emotion recognition. A modified Baum-Welch algorithm is proposed for component-HMM learning, and adaptive boosting (AdaBoost) is used to train ensemble classifiers for the different layers (cues). Except in the first layer, the initial weights of the training samples in the current layer are determined by the recognition results of the ensemble classifier in the layer above, so training on the current cue can focus on the samples the previous cue found difficult. The MBHMM classifier is composed of these ensemble classifiers and exploits the complementary information from multiple cues and modalities. Experimental results on audio-visual emotion data collected in Wizard-of-Oz scenarios and labeled under two types of emotion category sets demonstrate that the approach is effective and promising.
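The layer-to-layer weight hand-off above rests on the standard AdaBoost reweighting step: misclassified samples get heavier weights for the next round (or, here, the next cue). A minimal sketch of that single step, with the usual exponential update:

```python
import numpy as np

def adaboost_reweight(weights, correct, eps=1e-12):
    """One AdaBoost round: raise weights of misclassified samples.

    weights: (N,) normalized sample weights.
    correct: (N,) boolean mask of samples the weak learner got right.
    Returns the renormalized weights and the learner's vote alpha.
    """
    err = np.clip(weights[~correct].sum(), eps, 1 - eps)   # weighted error
    alpha = 0.5 * np.log((1 - err) / err)                  # learner confidence
    w = weights * np.exp(np.where(correct, -alpha, alpha)) # up-weight mistakes
    return w / w.sum(), alpha
```

In the MBHMM setting, the weights emerging from one cue's ensemble would seed the initial sample weights for the next layer, which is what lets each cue concentrate on the previous cue's hard cases.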
Abstract: BACKGROUND: Treatment strategies for lumbar spondylolisthesis are diversifying, but existing studies mostly focus on single techniques or short-term outcomes and lack a systematic synthesis of global research trends and core hotspots. OBJECTIVE: To use bibliometric and visualization tools to analyze the global research status, knowledge structure, core hotspots, and future directions of lumbar spondylolisthesis treatment. METHODS: Literature on lumbar spondylolisthesis treatment published between 2010 and 2025 was retrieved from the Web of Science Core Collection and analyzed with CiteSpace, VOSviewer, and Excel across multiple dimensions, including annual publication volume, national contributions, institutional collaboration, author influence, journal distribution, reference co-citation, and keyword co-occurrence and burst analysis. RESULTS AND CONCLUSION: A total of 367 publications on lumbar spondylolisthesis treatment were included. Output rose overall from 2010 to 2024, averaging 26.5 papers per year, with 5 new studies already published in early 2025. China led in publication count with 130 papers, but the United States dominated academic influence and the international collaboration network with 3,072 total citations and an H-index of 32. The University of California, San Francisco and the Mayo Clinic were the core research institutions. World Neurosurgery had the highest publication count and citation rate among journals, while Journal of Neurosurgery: Spine was the most frequently cited. The US scholar Mummaneni, PV (H-index 60) and the Chinese scholar Tian Wei (7 papers) were representative prolific authors. High-frequency keywords included "spondylolisthesis" (128 occurrences), "surgery" (104), "fusion" (75), and "minimally invasive surgery"; burst terms indicate that research is shifting toward minimally invasive techniques, complication management, and multidisciplinary integration.
Fund: Supported by the Beijing Key Laboratory of Digital Design & Manufacture, the Academic Excellence Foundation of Beihang University for Ph.D. Students, and the MIIT (Ministry of Industry and Information Technology) Key Laboratory of Smart Manufacturing for High-end Aerospace Products
Abstract: Sensor-fusion-based navigation attracts significant attention for its robustness and accuracy in various applications. To achieve versatile and efficient state estimation both indoors and outdoors, this paper presents an improved monocular visual-inertial navigation architecture within the Multi-State Constraint Kalman Filter (MSCKF). In addition, to relax MSCKF's initialization demand of accumulating enough stable poses, a rapid and robust Initialization MSCKF (I-MSCKF) navigation method is proposed. Based on the trifocal tensor and a sigma-point filter, initialization of the integrated navigation can be accomplished within three consecutive visual frames, so the proposed I-MSCKF method improves navigation performance under shocks at the initial stage. Moreover, the sigma-point filter applied at the initial stage improves the accuracy of state estimation. The state vector generated at the initial stage is consistent with MSCKF, enabling a seamless transition from initialization to subsequent navigation in I-MSCKF. Finally, experimental results show that the proposed I-MSCKF method improves the robustness and accuracy of monocular visual-inertial navigation.
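The sigma-point filter mentioned above is built on the unscented transform: a small, deterministically chosen set of sigma points whose weighted sample mean and covariance exactly reproduce the state's mean and covariance. A standard sketch of the point and weight generation (the paper's specific parameterization is not given; `alpha`, `beta`, and `kappa` below are the usual scaling defaults):

```python
import numpy as np

def sigma_points(mean, cov, alpha=1e-3, beta=2.0, kappa=0.0):
    """Generate 2n+1 unscented-transform sigma points and their weights."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)        # scaled matrix square root
    pts = np.vstack([mean, mean + S.T, mean - S.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))  # mean weights
    wc = wm.copy()                                  # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha**2 + beta)
    return pts, wm, wc
```

Propagating these points through a nonlinear motion or measurement model and re-averaging gives the filter its accuracy advantage over first-order linearization during the shaky initial phase.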
Fund: Supported by the National Natural Science Foundation of China (61472324, 61671383) and the Shaanxi Key Industry Innovation Chain Project (2018ZDCXL-G-12-2, 2019ZDLGY14-02-02)
Abstract: In the last few years, guided image fusion algorithms have become increasingly popular; however, current algorithms cannot eliminate halo artifacts. We propose an image fusion algorithm based on a fast weighted guided filter. First, the source images are separated into a series of high- and low-frequency components. Second, three visual features of the source image are extracted to construct a decision-graph model. Third, a fast weighted guided filter is proposed to optimize the result of the previous step and to reduce time complexity by exploiting the correlation among neighboring pixels. Finally, the filtered result is combined with the weight map to realize the image fusion. The proposed algorithm is applied to multi-focus, visible-infrared, and multi-modal images; the results show that it effectively removes the halo artifacts of the merged images with higher efficiency, and that it outperforms traditional methods in both subjective visual quality and objective evaluation.
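For context, the baseline the abstract builds on is He et al.'s (unweighted) guided filter, which fits a local linear model q = a*I + b of the output on the guide image. A compact sketch with a cumulative-sum box filter follows; the weighted, fast variant proposed in the paper is not reproduced here.

```python
import numpy as np

def box(x, r):
    """Mean over a (2r+1)x(2r+1) window via cumulative sums, edge-padded."""
    n = 2 * r + 1
    p = np.pad(x, r, mode="edge")
    c = np.cumsum(p, axis=0)
    c = np.vstack([np.zeros((1, c.shape[1])), c])
    x = c[n:] - c[:-n]                       # sliding sums down the rows
    c = np.cumsum(x, axis=1)
    c = np.hstack([np.zeros((c.shape[0], 1)), c])
    return (c[:, n:] - c[:, :-n]) / (n * n)  # sliding sums across, then mean

def guided_filter(guide, src, r=4, eps=1e-3):
    """Edge-preserving smoothing of src using guide (He et al.'s filter)."""
    mI, mp = box(guide, r), box(src, r)
    var = box(guide * guide, r) - mI * mI    # local variance of the guide
    a = (box(guide * src, r) - mI * mp) / (var + eps)
    b = mp - a * mI
    return box(a, r) * guide + box(b, r)     # q = mean(a)*I + mean(b)
```

Near strong edges the local variance dominates `eps`, so `a` approaches 1 and the edge passes through unsmoothed; the halo problem the paper targets arises from how `a` and `b` are averaged across windows that straddle such edges.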
Fund: Supported by the National Key Research and Development Plan (2016YFB1001200), the Natural Science Foundation of China (U1435220, 61232013), and the Natural Science Research Projects of Universities in Jiangsu Province (16KJA520003)
Abstract: Biography videos, built from the life performances of prominent historical figures, aim to depict their subjects' lives. This paper proposes a novel interactive video summarization for biography videos based on multimodal fusion, a new approach that visualizes features specific to biography video and lets users interact with the video content by taking advantage of multiple modalities. In general, a movie's story progresses through character dialogue, and the subtitles produced from that dialogue contain all the information related to the movie. JGibbsLDA is therefore applied to extract keywords from the subtitles, since a biography video covers many different aspects of a character's whole life. To fuse keywords and key-frames, affinity propagation is adopted to compute the similarity between each key-frame cluster and the keywords. Through this method, a video summarization based on multimodal fusion is presented that describes video content more completely. To reduce the time spent searching for interesting video content and to reveal the relationships between main characters, a map visualization is adopted to present the video content and interact with the summarization. An evaluation experiment demonstrates that the system facilitates exploration of video content, improves interaction, and helps users find events of interest efficiently.
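The keyword-to-key-frame-cluster matching step reduces to scoring similarity between term vectors. The sketch below uses plain cosine similarity over hypothetical bag-of-words vectors; it stands in for, and is much simpler than, the affinity-propagation pairing the paper actually uses.

```python
import numpy as np

def cosine_matrix(A, B, eps=1e-12):
    """Pairwise cosine similarity between rows of A and rows of B."""
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + eps)
    return A @ B.T

def assign_keywords(keyword_vecs, cluster_vecs):
    """Assign each subtitle keyword to its most similar key-frame cluster.

    keyword_vecs: (K, V) term vectors for K keywords.
    cluster_vecs: (C, V) aggregated term vectors for C key-frame clusters.
    """
    sim = cosine_matrix(keyword_vecs, cluster_vecs)
    return sim.argmax(axis=1), sim
```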
Fund: Supported by the National Natural Science Foundation of China (40871157)
Abstract: It is of great significance to rapidly detect targets in large-field remote sensing images with limited computational resources. Drawing on findings about visual attention in perceptual psychology, this paper proposes a hierarchical attention-based model for target detection. Specifically, at the pre-attention stage, a fast computational approach builds a saliency map, from which the focus of attention (FOA) is quickly obtained to indicate the salient objects. Then, at the attention stage, under FOA guidance, the high-level visual features of the region of interest are extracted in parallel. Finally, at the post-attention stage, these parallel, independent visual attributes are integrated by a decision-template-based classifier fusion strategy that discriminates the task-related targets from the other extracted salient objects. For comparison, experiments on ship detection validate the effectiveness and feasibility of the proposed model.
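Decision-template fusion, named in the post-attention stage above, works by averaging the classifiers' decision profiles per class at training time and then classifying a new profile by its nearest template. A minimal sketch, assuming soft outputs in [0, 1] and squared Euclidean distance as the matching measure:

```python
import numpy as np

def decision_templates(profiles, labels, n_classes):
    """Build one template per class.

    profiles: (N, L, C) decision profiles: N samples, L classifiers,
              each emitting C class supports.
    Returns (n_classes, L, C) templates, the per-class mean profiles.
    """
    return np.stack([profiles[labels == c].mean(0) for c in range(n_classes)])

def dt_classify(profile, templates):
    """Label a new (L, C) decision profile by its nearest template."""
    d = ((templates - profile) ** 2).sum(axis=(1, 2))
    return int(d.argmin())
```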
Abstract: Saliency detection models, which are used to extract salient regions in visual scenes, are widely used in various multimedia processing applications and have attracted much attention in computer vision over the past decades. Since most images and videos on the Internet are stored in compressed domains (images in JPEG format; videos in MPEG-2, H.264, and MPEG-4 Visual formats), many saliency detection models have recently been proposed in the compressed domain. This paper reviews our work on saliency detection models in the compressed domain. In addition, we introduce some commonly used fusion strategies for combining the spatial and temporal saliency maps into the final video saliency map.
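The commonly used spatial/temporal fusion strategies the review mentions typically include weighted averaging, pointwise maximum, and multiplicative combination. A small sketch of these three, assuming both maps are already normalized to [0, 1]:

```python
import numpy as np

def fuse_saliency(spatial, temporal, mode="linear", w=0.5):
    """Combine spatial and temporal saliency maps (values in [0, 1])."""
    if mode == "linear":          # weighted average of the two maps
        fused = w * spatial + (1 - w) * temporal
    elif mode == "max":           # keep the stronger response per pixel
        fused = np.maximum(spatial, temporal)
    elif mode == "product":       # multiplicative: emphasizes agreement
        fused = spatial * temporal
    else:
        raise ValueError(f"unknown mode: {mode}")
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused
```

The linear rule is the most forgiving of a noisy modality, while the product rule suppresses regions that only one map considers salient, which is why the choice of strategy matters per application.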
Abstract: To improve the road-environment perception of autonomous vehicles driving and operating across multiple lanes, a lane-level LiDAR-visual fusion method for autonomous driving, LLV-SLAM (lane-level LiDAR-visual fusion SLAM), is proposed, together with a simultaneous localization and mapping (SLAM) algorithm suited to LiDAR-visual fusion. First, histogram equalization is introduced on top of visual feature-point extraction, and LiDAR is used to obtain depth information for the feature points; visual feature tracking then improves the robustness of the SLAM system. Second, visual keyframe information is used to correct motion distortion in the LiDAR point cloud, and LeGO-LOAM (lightweight and ground-optimized lidar odometry and mapping) is integrated into visual ORB-SLAM2 (oriented FAST and rotated BRIEF SLAM2) to strengthen loop-closure detection and correction and to reduce accumulated system error. Finally, the poses obtained from the visual images are coordinate-transformed and used as initial pose values for the LiDAR odometry, assisting LiDAR SLAM in 3D scene reconstruction. Experimental results show that, compared with traditional SLAM methods, the fused LLV-SLAM method reduces average localization latency by 41.61%; average localization errors in the x, y, and z directions drop by 34.63%, 38.16%, and 24.09%, respectively; and average rotation errors in roll, pitch, and yaw drop by 40.8%, 37.52%, and 39.5%. The LLV-SLAM algorithm effectively suppresses the scale drift of LeGO-LOAM, markedly improves real-time performance and robustness, and meets the perception needs of autonomous vehicles on multi-lane roads.
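The histogram equalization applied before feature extraction above is the standard global technique: map intensities through the normalized cumulative distribution so they spread over the full 8-bit range. A minimal sketch for a grayscale `uint8` image (the paper does not specify its exact variant):

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]            # first occupied gray level
    if cdf[-1] == cdf_min:               # constant image: nothing to spread
        return img.copy()
    # Stretch the CDF to [0, 255] and use it as a lookup table.
    lut = np.clip(np.round((cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min)),
                  0, 255).astype(np.uint8)
    return lut[img]
```

Equalization boosts contrast in dim or washed-out frames, which increases the number of stable corners a detector such as FAST can find and thus helps the visual front end stay locked on features.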