We present a novel and efficient method for real-time multiple facial poses estimation and tracking in a single frame or video.First,we combine two standard convolutional neural network models for face detection and m...We present a novel and efficient method for real-time multiple facial poses estimation and tracking in a single frame or video.First,we combine two standard convolutional neural network models for face detection and mean shape learning to generate initial estimations of alignment and pose.Then,we design a bi-objective optimization strategy to iteratively refine the obtained estimations.This strategy achieves faster speed and more accurate outputs.Finally,we further apply algebraic filtering processing,including Gaussian filter for background removal and extended Kalman filter for target prediction,to maintain real-time tracking superiority.Only general RGB photos or videos are required,which are captured by a commodity monocular camera without any priori or label.We demonstrate the advantages of our approach by comparing it with the most recent work in terms of performance and accuracy.展开更多
基金supported by the National Natural Science Foundation of China(Nos.61872354,61772523,61620106003,and 61802406)the National Key R&D Program of China(No.2019YFB2204104)+2 种基金the Beijing Natural Science Foundation(Nos.L182059 and Z190004)the Intelligent Science and Technology Advanced Subject Project of University of Chinese Academy of Sciences(No.115200S001)the Alibaba Group through Alibaba Innovative Research Program。
文摘We present a novel and efficient method for real-time multiple facial poses estimation and tracking in a single frame or video.First,we combine two standard convolutional neural network models for face detection and mean shape learning to generate initial estimations of alignment and pose.Then,we design a bi-objective optimization strategy to iteratively refine the obtained estimations.This strategy achieves faster speed and more accurate outputs.Finally,we further apply algebraic filtering processing,including Gaussian filter for background removal and extended Kalman filter for target prediction,to maintain real-time tracking superiority.Only general RGB photos or videos are required,which are captured by a commodity monocular camera without any priori or label.We demonstrate the advantages of our approach by comparing it with the most recent work in terms of performance and accuracy.