Funding: Sponsored by the Jiangsu Prospective Joint Research Project (Grant No. BY2016022-28).
Abstract: Human action recognition by computer vision has become an active research area in recent years. Because actions vary in speed and are often highly similar to one another, current algorithms do not achieve a high recognition rate. A new human action recognition method is proposed that combines multi-scale directed depth motion maps (MsdDMMs) with Log-Gabor filters. MsdDMMs are constructed under an energy framework to capture differences in the speed and temporal order of an action. Log-Gabor filters are then used to describe the texture details of the MsdDMMs as motion characteristics; they characterize texture well and are consistent with the visual properties of the human eye. Finally, collaborative representation is employed as the classifier for action recognition. Experimental results show that the proposed algorithm achieves accuracies of 95.79% and 96.43% on the MSRAction3D and MSRGesture3D datasets, respectively, which is higher than existing algorithms such as super normal vector (SNV) and hierarchical recurrent neural network (Hierarchical RNN).
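The classification step named in the abstract above can be illustrated with a minimal sketch of collaborative representation classification. It assumes an l2-regularised least-squares coding over all training samples followed by a class-wise reconstruction residual; the function name `crc_classify`, the regularisation weight `lam`, and the use of the plain (unnormalised) residual are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def crc_classify(train_feats, train_labels, query, lam=1e-3):
    """Collaborative representation classification (illustrative sketch).

    train_feats: (n_samples, n_dims) array, one feature vector per row.
    train_labels: (n_samples,) array of class ids.
    query: (n_dims,) feature vector to classify.
    lam: regularisation weight (assumed value, not from the paper).
    """
    X = np.asarray(train_feats, dtype=float).T   # columns are training samples
    y = np.asarray(query, dtype=float)
    labels = np.asarray(train_labels)
    # Closed-form ridge solution: alpha = (X^T X + lam * I)^-1 X^T y
    gram = X.T @ X + lam * np.eye(X.shape[1])
    alpha = np.linalg.solve(gram, X.T @ y)
    # Pick the class whose own samples reconstruct the query best.
    best_label, best_residual = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(y - X[:, mask] @ alpha[mask])
        if residual < best_residual:
            best_label, best_residual = c, residual
    return best_label
```

In the setting described by the abstract, each row of `train_feats` would be a Log-Gabor descriptor computed from the MsdDMMs of one training sequence; how those descriptors are built is not reproduced here.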
Abstract: Generating a depth map from 2D video is the core of DIBR (Depth Image Based Rendering) in 2D-to-3D video conversion systems. However, it consumes most of the system resources, and in typical motion-estimation-based depth-map generation algorithms the motion search module accounts for 90% of the running time. To reduce the computational complexity, this paper develops a new fast depth-map generation algorithm based on motion search. In the motion search module, a fast diamond search algorithm is adopted and the Sobel operator is used to decide whether a 16x16 or a 4x4 block size is used, yielding a sub-depth-map. This sub-depth-map is then fused with the sub-depth-maps obtained from the depth-from-color-component-Cr and depth-from-linear-perspective modules to compensate and refine the details, producing a better final depth map. Simulation results demonstrate that the new approach reduces computational complexity by more than 50% compared with other existing methods.
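The Sobel-based block-size decision described above might look roughly like the sketch below. The abstract does not give the decision rule, so this assumes that macroblocks with a high mean Sobel gradient magnitude (edge-rich regions) get 4x4 blocks and smooth macroblocks keep 16x16; the function names and the `edge_threshold` value are hypothetical.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude from 3x3 Sobel kernels, with edge-replicated borders."""
    g = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    # Horizontal response: right column minus left column of each 3x3 window.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    # Vertical response: bottom row minus top row of each 3x3 window.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[:-2, 1:-1] - g[:-2, 2:])
    return np.hypot(gx, gy)

def choose_block_sizes(gray, edge_threshold=50.0):
    """Map each full 16x16 macroblock origin (row, col) to a block size of 16 or 4.

    edge_threshold is an assumed tuning value, not taken from the paper.
    """
    mag = sobel_magnitude(gray)
    h, w = mag.shape
    sizes = {}
    for r in range(0, h - 15, 16):
        for c in range(0, w - 15, 16):
            mean_edge = mag[r:r + 16, c:c + 16].mean()
            # Edge-rich macroblocks are searched with finer 4x4 blocks.
            sizes[(r, c)] = 4 if mean_edge > edge_threshold else 16
    return sizes
```

The returned map would then tell the diamond search which block size to use per macroblock; the search itself and the fusion with the Cr and linear-perspective sub-depth-maps are outside this sketch.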
Funding: Supported by the Program of Graduate Education and Teaching Reform in Tianjin University of Technology (Nos. YBXM2204 and ZDXM2202) and the National Natural Science Foundation of China (Nos. 62203331 and 62103299).
Abstract: Although deep learning methods have been widely applied to SLAM visual odometry (VO) over the past decade with impressive improvements, accuracy remains limited in complex dynamic environments. In this paper, a composite mask-based generative adversarial network (CMGAN) is introduced to predict camera motion and binocular depth maps. Specifically, a perceptual generator is constructed to obtain the parallax map and the optical flow between two neighboring frames. Then, an iterative pose improvement strategy is proposed to improve the accuracy of pose estimation. Finally, a composite mask is embedded in the discriminator to sense structural deformation in the synthesized virtual image, thereby strengthening the overall structural constraints of the network model, improving the accuracy of camera pose estimation, and reducing drift in the VO. Detailed quantitative and qualitative evaluations on the KITTI dataset show that the proposed framework outperforms existing conventional, supervised learning, and unsupervised depth VO methods, providing better results in both pose estimation and depth estimation.
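As a small aside on how a parallax (disparity) map relates to a binocular depth map, the standard rectified-stereo relation Z = f * B / d can be sketched as follows. This is general stereo geometry, not the CMGAN network itself; the function name and the `min_disp` guard are assumptions for illustration.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a disparity map (pixels) to depth (metres) for a rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: camera baseline in metres.
    Pixels with disparity at or below min_disp are treated as infinitely far.
    """
    disp = np.asarray(disparity, dtype=float)
    depth = np.full(disp.shape, np.inf)
    valid = disp > min_disp
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth
```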
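Abstract: To address the inaccurate pose estimation of most simultaneous localization and mapping (SLAM) systems in dynamic scenes, this paper proposes a motion consistency detection algorithm based on weighted epipolar and depth constraints with a semantic prior, and uses it to build a visual SLAM system for indoor dynamic scenes. The system first performs semantic segmentation on the input image to obtain the set of potentially moving feature points. It then extracts feature points from the regions of the image that are not potentially moving to obtain an initial estimate of the inter-frame transformation, uses the weighted epipolar constraint and the depth constraint to make a second judgment on potential outliers (such as moving feature points), and removes the outliers to update the set of static feature points. Finally, the static feature point set is used to solve accurately for the camera pose, which is passed to the back end as the initial value for pose optimization. Repeated comparative tests were carried out on 9 dynamic scene sequences from the TUM (Technical University of Munich) dataset and 3 image sequences from the Bonn complex dynamic environment dataset. Compared with the existing advanced dynamic SLAM system DS-SLAM, the root mean square error (RMSE) of the absolute trajectory error (ATE) is reduced by 10.53% to 93.75%, and for the translational and rotational relative pose error (RPE), the RMSE is reduced by up to 73.44% and 68.73%, respectively. The results show that the improved method can significantly reduce pose estimation error in dynamic environments.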
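The epipolar part of the motion consistency check can be sketched as below: given a fundamental matrix estimated from the presumed-static points, a matched feature is kept as static only if it lies close to its epipolar line. This is a plain, unweighted version; the semantic weighting and the depth constraint are not reproduced because the abstract does not specify them, and the function names and the `epi_thresh` value are hypothetical.

```python
import numpy as np

def epipolar_distances(F, pts_prev, pts_curr):
    """Pixel distance from each current-frame point to its epipolar line.

    F: (3, 3) fundamental matrix mapping previous-frame points to lines
       in the current frame.
    pts_prev, pts_curr: (n, 2) arrays of matched pixel coordinates.
    """
    pts_prev = np.asarray(pts_prev, dtype=float)
    pts_curr = np.asarray(pts_curr, dtype=float)
    p1 = np.hstack([pts_prev, np.ones((len(pts_prev), 1))])
    p2 = np.hstack([pts_curr, np.ones((len(pts_curr), 1))])
    lines = p1 @ np.asarray(F, dtype=float).T   # row i is F @ p1_i = (a, b, c)
    num = np.abs(np.sum(lines * p2, axis=1))    # |a*x + b*y + c|
    den = np.maximum(np.hypot(lines[:, 0], lines[:, 1]), 1e-12)
    return num / den

def static_mask(F, pts_prev, pts_curr, epi_thresh=1.0):
    """True where a match is consistent with the static-scene epipolar geometry."""
    return epipolar_distances(F, pts_prev, pts_curr) <= epi_thresh
```

Matches rejected by `static_mask` would be treated as outliers (for example, points on moving objects) and dropped before the pose is solved.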