The process of human natural scene categorization consists of two correlated stages: visual perception and visual cognition of natural scenes. Inspired by this fact, we propose a biologically plausible approach to natural scene image classification. The approach consists of one visual perception model and two visual cognition models. The visual perception model, composed of two steps, extracts discriminative features from natural scene images. In the first step, we mimic the oriented and bandpass properties of the human primary visual cortex with a special complex wavelet transform, which decomposes a natural scene image into a series of 2D spatial structure signals. In the second step, a hybrid statistical feature extraction method generates gist features from those 2D spatial structure signals. We then design a cognitive feedback model to adaptively optimize the visual perception model. Finally, we build a multiple-semantics-based cognition model to imitate the human cognitive mode in rapid natural scene categorization. Experiments on natural scene datasets show that the proposed method achieves high efficiency and accuracy for natural scene classification.
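The second perception step can be illustrated with a small sketch. The abstract does not specify the hybrid statistical method, so this example substitutes a common gist-style stand-in: the mean and standard deviation of sub-band magnitudes pooled over a coarse spatial grid. The function name `gist_features`, the `grid` parameter, and the choice of statistics are assumptions for illustration, and the sub-bands are assumed to have been produced already by some oriented bandpass transform.

```python
import numpy as np

def gist_features(subbands, grid=4):
    """Pool each 2D sub-band over a grid x grid layout of blocks.

    Assumption: block mean and standard deviation of the coefficient
    magnitudes stand in for the unspecified hybrid statistics.
    """
    feats = []
    for band in subbands:
        mag = np.abs(np.asarray(band))  # magnitude of (possibly complex) coefficients
        h, w = mag.shape
        for rows in np.array_split(np.arange(h), grid):
            for cols in np.array_split(np.arange(w), grid):
                block = mag[np.ix_(rows, cols)]
                feats.extend([block.mean(), block.std()])
    return np.array(feats)
```

Each sub-band contributes `2 * grid * grid` values, so the feature length grows linearly with the number of orientations and scales in the decomposition.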
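To address the insufficient real-time performance of deep learning segmentation networks in visual SLAM (Simultaneous Localization and Mapping) systems in dynamic scenes, as well as the pose estimation deviation caused by unintended camera motion, a visual SLAM algorithm based on cross-domain mask segmentation is proposed. The algorithm combines the lightweight YOLO-fastest network with background subtraction to detect moving objects, builds a cross-domain mask segmentation mechanism from the depth map and depth-threshold segmentation, and designs a geometric correction strategy for camera motion to compensate for coordinate errors in the detection boxes, so that moving objects are segmented while processing speed improves. To make better use of feature points, pyramidal optical flow tracks and updates dynamic feature points continuously across frames, while ensuring that only static feature points take part in pose estimation. In a systematic evaluation on the TUM dataset, the algorithm reduces the absolute pose error by 97.1% on average compared with ORB-SLAM3, and its per-frame tracking time is substantially lower than that of DynaSLAM and DS-SLAM, dynamic SLAM algorithms that use deep learning segmentation networks, achieving a better balance between accuracy and efficiency.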
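The idea of segmenting a moving object by depth thresholding inside a detection box, then excluding feature points on that object from pose estimation, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the median-depth reference, the tolerance `tol`, and the function names are assumptions, and the detection box is assumed to come from an external detector as `(x0, y0, x1, y1)` pixel coordinates.

```python
import numpy as np

def depth_threshold_mask(depth, box, tol=0.3):
    """Boolean mask of pixels inside `box` whose depth is near the box's median depth.

    Assumption: the detected object dominates the box, so the median of the
    valid depths approximates the object's depth.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros(depth.shape, dtype=bool)
    roi = depth[y0:y1, x0:x1]
    valid = roi > 0  # zero depth is treated as a missing measurement
    if not valid.any():
        return mask
    ref = np.median(roi[valid])
    mask[y0:y1, x0:x1] = valid & (np.abs(roi - ref) < tol)
    return mask

def keep_static_points(points, mask):
    """Drop feature points (x, y) that fall on the dynamic-object mask."""
    pts = np.asarray(points, dtype=int)
    on_mask = mask[pts[:, 1], pts[:, 0]]
    return pts[~on_mask]
```

Thresholding depth inside a box is far cheaper than running a pixel-wise segmentation network, which is consistent with the speed motivation stated in the abstract.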
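Existing 3D visual grounding methods rely on expensive sensor equipment, incur high system cost, and lack accuracy and robustness in complex multi-target localization. To address these problems, a multi-target 3D visual grounding method based on monocular images is proposed. The method combines natural language descriptions to identify multiple 3D targets in a single RGB image. To this end, a multi-target visual grounding dataset, Mmo3DRefer, is constructed and a cross-modal matching network, TextVizNet, is designed. TextVizNet generates 3D bounding boxes of targets with a pretrained monocular detector and uses an information fusion module and an information alignment module to deeply integrate visual and language information, thereby achieving text-guided multi-target 3D detection. In experiments comparing five methods, including CORE-3DVG (Contextual Objects and RElations for 3D Visual Grounding), 3DVG-Transformer and Multi3DRefer (Multiple 3D object Referencing dataset and task), TextVizNet improves the F1-score, precision and recall on the Mmo3DRefer dataset by 8.92%, 8.39% and 9.57%, respectively, over the second-best method, Multi3DRefer. This markedly improves text-based multi-target localization accuracy in complex scenes and provides effective support for practical applications such as autonomous driving and intelligent robots.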
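For reference, the three reported metrics can be computed per query as in the sketch below. The abstract does not say how predicted boxes are matched to ground-truth targets, so this example assumes that matching (for instance by an IoU threshold) has already reduced each side to a set of object IDs; the function name `prf1` is hypothetical.

```python
def prf1(predicted, ground_truth):
    """Precision, recall and F1 for one query, given sets of matched object IDs."""
    pred, gt = set(predicted), set(ground_truth)
    tp = len(pred & gt)  # correctly retrieved targets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With three predictions of which two are correct and four ground-truth targets, precision is 2/3, recall is 1/2 and F1 is 4/7.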
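To improve coal yard management and ensure coal inventory accuracy under different environments, the key technologies of Unity3D in 3D visualization of industrial coal yards are studied. The data acquisition module collects point cloud data of the industrial coal yard with a laser scanner and reconstructs the data with the patch-based multi-view stereo (PMVS) algorithm. The data are then fed into the 3D scene construction module, which uses Dynamo for Revit to generate a 3D scene model of the coal yard. The scene rendering and visualization module renders the model with Unity3D and displays it visually. The model and the point cloud data are combined to calculate the volume of the coal piles in each area of the yard, completing the coal inventory. Test results show that the generated 3D scene model fully preserves the morphological details of the coal piles and reliably computes the volume of coal piles of different heights.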
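The abstract does not describe how pile volume is derived from the point cloud. One common approach, shown here only as an assumed illustration, rasterizes the points onto a horizontal grid, keeps the highest point per cell, and sums the resulting column volumes. The function name `pile_volume`, the cell size, and the flat ground level `ground_z` are all assumptions.

```python
import numpy as np

def pile_volume(points, cell=1.0, ground_z=0.0):
    """Approximate volume under a point cloud with a height-grid (column) method.

    points: iterable of (x, y, z); cell: grid spacing in the same units.
    Assumption: the ground is a flat plane at height `ground_z`.
    """
    pts = np.asarray(points, dtype=float)
    ij = np.floor(pts[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)  # shift indices so the grid starts at (0, 0)
    heights = np.zeros(tuple(ij.max(axis=0) + 1))
    # Keep the highest point in each cell; cells with no points stay at 0.
    np.maximum.at(heights, (ij[:, 0], ij[:, 1]), pts[:, 2] - ground_z)
    return float(heights.sum() * cell * cell)
```

A smaller `cell` follows the pile surface more closely but leaves more empty cells when the scan is sparse, so the grid spacing should be chosen relative to the point density.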
In the era of information and communication technology (ICT) and big data, the map is gradually acquiring a new qualitative feature, “spatiotemporal ubiquitous”, as its object space, expression space and information sources expand, which challenges the theory of cartographic visualization. This paper discusses ubiquitous map visualization in terms of object content and expression form. Oriented to the ternary space, it divides the object dimensions of ubiquitous map visualization and analyzes its expression characteristics. On that basis, it constructs the variable system, symbol system and method system of ubiquitous map visualization. Three cases, the metro roadmap, the tag map, and the three-dimensional (3D) city map, are used to explain the application of the proposed content and illustrate its effectiveness. The research is expected to further enrich the theoretical basis of cartographic visualization and provide theoretical support for the expression and application of ubiquitous map visualization.