Funding: Supported by the National Natural Science Foundation of China (No. 62001099).
Abstract: In the Internet of Video Things (IoVT), an encoder needs to collect a large number of signal samples to improve reconstruction quality, which is challenging when the encoder's resources are extremely limited. Distributed video compressive sensing (DVCS) can save considerable resources at the encoder. For skip-block coding at such an encoder, this paper proposes a motion-adaptive adjacent-reference skipping (MAS) algorithm for DVCS with general decoders. The proposed algorithm makes full use of the spatial-temporal correlation between consecutive frames, which significantly improves reconstruction quality. Moreover, the skipping ratio of the non-keyframes adapts to differences in their motion speeds. Because the algorithm requires no change to the decoder, it can be readily applied to general decoders. Simulation results show that, under different skipping ratios, the proposed algorithm achieves better reconstruction quality than existing algorithms and thus improves the energy efficiency of the encoder.
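The abstract does not give the skipping rule itself, so the following is only a rough sketch of what motion-adaptive skip-block selection against an adjacent reference frame might look like. The mean-absolute-difference measure, the `adaptive_skip_ratio` formula, and the parameters `max_ratio` and `scale` are all assumptions made for illustration, not the published MAS algorithm.

```python
# Illustrative sketch only: the difference measure and the ratio rule are
# assumptions, not the MAS algorithm as published.

def block_diffs(frame, ref, block):
    """Mean absolute difference of each block against the co-located block in ref."""
    height, width = len(frame), len(frame[0])
    diffs = {}
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total, count = 0, 0
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    total += abs(frame[y][x] - ref[y][x])
                    count += 1
            diffs[(by, bx)] = total / count
    return diffs


def adaptive_skip_ratio(diffs, max_ratio=0.8, scale=16.0):
    """Hypothetical rule: the slower the frame's motion, the more blocks are skipped."""
    motion = sum(diffs.values()) / len(diffs)
    return max_ratio / (1.0 + motion / scale)


def choose_skipped_blocks(frame, ref, block=2):
    """Return the block positions to skip in a non-keyframe and the ratio used."""
    diffs = block_diffs(frame, ref, block)
    ratio = adaptive_skip_ratio(diffs)
    n_skip = int(ratio * len(diffs))
    # Rank blocks from most static to most changed; skip the most static ones.
    ranked = sorted(diffs, key=lambda pos: diffs[pos])
    return set(ranked[:n_skip]), ratio
```

In this sketch a skipped block would simply be copied from the adjacent reference frame at the decoder, which is why no decoder change is needed; only the blocks that changed most are actually sampled.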
Funding: Supported by the National Key R&D Program of China (2020AAA0107901).
Abstract: In speech synthesis, latent information is difficult to obtain from text alone. Studies show that features extracted from speech carry additional information that can help text encoding. Work on speech encoding has followed two directions: encoding speech frame by frame, and encoding the whole utterance into a single vector. In both cases the scale is fixed, so encoding speech at an adjustable scale to capture more latent information is worth investigating. However, current alignment approaches support only frame-by-frame encoding and speech-to-vector encoding, and designing a new alignment approach that supports adjustable-scale speech encoding remains a challenge. This paper presents a dynamic speech encoder with a new alignment approach that works in conjunction with frame-by-frame encoding and speech-to-vector encoding. The speech feature from our model serves three functions. First, it can reconstruct the original speech while its length equals the text length. Second, our model can obtain a text embedding from speech, and the encoded speech feature is similar to the text embedding result. Finally, it can transfer the style of synthesized speech, making it more similar to the given reference speech.
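To make the idea of an adjustable encoding scale concrete, here is a small sketch that pools frame-level features into segments. It does not reproduce the paper's alignment approach, which the abstract does not describe; the `durations` argument (how many frames belong to each segment) is a hypothetical input that a real system would have to obtain from some alignment, and average pooling is an assumed choice.

```python
# Illustrative sketch only: `durations` is a hypothetical alignment input and
# average pooling is an assumed choice, not the paper's method.

def pool_by_durations(frames, durations):
    """Average consecutive frame features into one vector per segment."""
    if any(d <= 0 for d in durations):
        raise ValueError("every duration must be a positive frame count")
    if sum(durations) != len(frames):
        raise ValueError("durations must cover every frame exactly once")
    pooled, start = [], 0
    for d in durations:
        segment = frames[start:start + d]
        # Column-wise mean over the frames in this segment.
        pooled.append([sum(column) / d for column in zip(*segment)])
        start += d
    return pooled
```

With one frame per segment the output is the frame-by-frame encoding, with a single segment it collapses to one vector for the whole utterance, and with one segment per text token the output length equals the text length.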
Funding: Supported by the National Key R&D Program of China (2018YFB1404100) and the Fundamental Research Funds for the Central Universities (CUC18A002-2).
Abstract: The Internet hosts a huge number of short videos, but most of them are unlabeled. This paper proposes a coarse short-video labelling method based on an image classification neural network. A convolutional auto-encoder is trained on unlabeled video frames to obtain features at a specific level. Using these features, the video key-frames are extracted by feature clustering. The key-frames, which represent the video content, are then fed into an image classification network to obtain labels for every video clip. In addition, different convolutional auto-encoder architectures are evaluated, and the better-performing architecture is selected based on the experimental results. In the final experiment, the video frame features from the convolutional auto-encoder are compared with those from other extraction methods, and the proposed method shows remarkable results.
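The key-frame step can be sketched roughly as follows. The abstract says only that key-frames are extracted by feature clustering, so the use of k-means, the fixed seed, and the rule of picking the frame nearest each centroid are assumptions. The per-frame feature vectors are taken as given here; in the paper they would come from the convolutional auto-encoder.

```python
# Illustrative sketch only: k-means and the nearest-to-centroid rule are
# assumptions; the abstract does not name the clustering method.
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Column-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: sq_dist(f, centroids[c]))
            clusters[nearest].append(f)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def key_frames(features, k):
    """Indices of the frames closest to each cluster centroid, in time order."""
    centroids = kmeans(features, k)
    picks = {
        min(range(len(features)), key=lambda i: sq_dist(features[i], c))
        for c in centroids
    }
    return sorted(picks)
```

The returned indices identify the frames that would be passed to the image classification network; the number of clusters `k` is a free parameter in this sketch.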