Funding: the Science and Technology Innovation Project of the Ministry of Culture of China (No. 2014KJCXXM08); the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2012BAH37F02); the National High Technology Research and Development Program (863) of China (No. 2011AA01A107).
Abstract: This paper proposes an approach to the problem of consistency in depth-map estimation from binocular stereo video sequences. The method enforces both temporal and spatial consistency to eliminate flickering artifacts and inaccurate smoothing in depth recovery. To this end, an improved global stereo matching method based on graph cuts and energy optimization is implemented. In the temporal domain, a penalty function with a coherence factor is introduced for temporal consistency, and the factor is determined by the Lucas-Kanade optical flow weighted histogram similarity constraint (LKWHSC). In the spatial domain, the joint bilateral truncated absolute difference (JBTAD) is proposed for segmentation smoothing. The method smooths naturally and uniformly in low-gradient regions while avoiding over-smoothing and preserving edge sharpness at high-gradient discontinuities, thereby achieving spatial consistency. The experimental results show that the algorithm obtains depth maps with better spatial and temporal consistency than existing algorithms.
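The abstract does not give the JBTAD formula, so the following is only a minimal sketch of what a joint-bilateral-weighted truncated absolute difference could look like as a pairwise smoothness cost. The function name `jbtad_cost`, the Gaussian form of the weights, and the parameters `tau`, `sigma_s`, and `sigma_r` are all assumptions made for illustration and are not taken from the paper.

```python
import math

def jbtad_cost(d_p, d_q, i_p, i_q, dist=1.0, tau=2.0, sigma_s=1.0, sigma_r=10.0):
    """Hypothetical smoothness cost between neighbouring pixels p and q.

    d_p, d_q : disparity labels of the two pixels
    i_p, i_q : intensities of the two pixels in the guide image
    dist     : spatial distance between p and q in pixels
    tau      : truncation threshold (assumed), caps the penalty at depth edges
    """
    # Spatial weight: nearby pixels influence each other more strongly.
    spatial = math.exp(-(dist ** 2) / (2.0 * sigma_s ** 2))
    # Range weight: a large intensity difference (a likely edge) lowers the weight.
    intensity = math.exp(-((i_p - i_q) ** 2) / (2.0 * sigma_r ** 2))
    # Truncated absolute difference: the penalty stops growing beyond tau.
    truncated = min(abs(d_p - d_q), tau)
    return spatial * intensity * truncated
```

Under these assumptions the cost is zero for equal disparities, is capped for large disparity jumps, and is reduced across strong intensity edges, which matches the behaviour the abstract describes for low-gradient regions and high-gradient discontinuities.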
Funding: Supported by the Future Network Scientific Research Fund Project of Jiangsu Province (No. FNSRFP2021YB26), the Jiangsu Key R&D Fund on Social Development (No. BE2022789), and the Science Foundation of Nanjing Institute of Technology (No. ZKJ202003).
Abstract: Facial expression recognition (FER) in video has attracted increasing interest, and many approaches have been proposed. The crucial problem in classifying a given video sequence into one of several basic emotions is how to fuse the facial features of individual frames. In this paper, a frame-level attention module is integrated into an improved VGG-based framework, and a lightweight facial expression recognition method is proposed. The proposed network takes a sub-video cut from an experimental video sequence as its input and generates a fixed-dimension representation. The VGG-based network with an enhanced branch embeds face images into feature vectors. The frame-level attention module learns weights that are used to adaptively aggregate the feature vectors into a single discriminative video representation. Finally, a regression module outputs the classification results. Experimental results on the CK+ and AFEW databases show that the proposed method achieves state-of-the-art recognition rates.
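The aggregation step described in this abstract can be sketched as a weighted sum of per-frame feature vectors. The abstract does not say how the learned weights are normalised, so the softmax used here is an assumption, as are the function name `aggregate_frames` and the idea of passing per-frame scores in directly (in the paper they would come from the trained attention module).

```python
import math

def aggregate_frames(features, scores):
    """Fuse per-frame feature vectors into one video-level vector.

    features : list of equal-length feature vectors, one per frame
    scores   : one attention score per frame (assumed to come from a
               learned module; supplied by hand in this sketch)
    """
    # Softmax over frames, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum across frames gives a fixed-dimension representation.
    dim = len(features[0])
    video = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return video, weights
```

With equal scores this reduces to plain averaging of the frames; a higher score for one frame pulls the video representation towards that frame's features, whatever the number of frames in the sub-video.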
Abstract: Segmentation of semantic Video Object Planes (VOPs) from video sequences is key to content-based video coding in the MPEG-4 standard. In this paper, an approach for automatic Segmentation of VOPs Based on Spatio-Temporal Information (SBSTI) is proposed. The results demonstrate the good performance of the algorithm.