Abstract
Two-stream convolutional neural networks usually extract the appearance and motion information of a video from stacked RGB frames and optical flow images, respectively, which leads to information redundancy and high computational complexity. To address these problems, an action recognition algorithm that combines optical flow images, differential images and a parallel convolutional neural network is proposed on the basis of the temporal segment network. First, a key frame selection algorithm based on image feature quantities is designed by analyzing the motion blur present in action videos. An improved temporal segment network containing an appearance information stream and a motion information stream is then constructed, in which the RGB images of key frames and the optical flow images and differential images of non-key frames are fed in parallel into the feature extraction network to compute classification scores. Finally, the action category scores of key frames and non-key frames are fused by averaging and passed to the SoftMax layer to obtain the video category probabilities. To further reduce the number of parameters and the computational complexity, a lightweight convolutional neural network is designed as the feature extraction network. The proposed algorithm achieves recognition accuracies of 94.7% on the UCF101 dataset and 69.3% on the HMDB51 dataset, and its inference speed is 45.3% faster than that of the temporal segment network. The experimental results indicate that the algorithm makes efficient use of the appearance and motion information of the video and achieves high action recognition accuracy.
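The final fusion step described in the abstract (averaging the class scores of key frames and non-key frames, then applying SoftMax) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the equal weighting of the two groups, with scores averaged within each group first, is an assumption, since the abstract does not specify the weighting.

```python
import math


def average_scores(score_lists):
    """Element-wise mean of several per-class score vectors."""
    if not score_lists:
        raise ValueError("need at least one score vector")
    n_classes = len(score_lists[0])
    if any(len(s) != n_classes for s in score_lists):
        raise ValueError("all score vectors must have the same length")
    return [sum(col) / len(score_lists) for col in zip(*score_lists)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(key_frame_scores, non_key_frame_scores):
    """Average each group's scores, average the two groups, apply softmax.

    Assumption: both groups are weighted equally; the abstract only says
    the scores are "averaged and fused".
    """
    key_mean = average_scores(key_frame_scores)
    non_key_mean = average_scores(non_key_frame_scores)
    fused = average_scores([key_mean, non_key_mean])
    return softmax(fused)


# Hypothetical scores for a 3-class problem.
probs = fuse(
    key_frame_scores=[[2.0, 1.0, 0.0]],
    non_key_frame_scores=[[1.0, 3.0, 0.0], [3.0, 1.0, 0.0]],
)
predicted_class = probs.index(max(probs))
```

Averaging before the SoftMax, rather than after it, keeps the fusion linear in the raw scores, which matches the order of operations stated in the abstract.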
Authors
周育新
白宏阳
李伟
郭宏伟
徐啸康
Zhou Yuxin; Bai Hongyang; Li Wei; Guo Hongwei; Xu Xiaokang (College of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; 96037 Troop, People's Liberation Army of China, Baoji 721000, China)
Source
《仪器仪表学报》
EI
CAS
CSCD
PKU Core Journal
2020, No. 7, pp. 196-204 (9 pages)
Chinese Journal of Scientific Instrument
Funding
Supported by the National Natural Science Foundation of China (61603189)
Keywords
convolutional neural network
action recognition
key frame
lightweight