Abstract: Video classification typically requires large labeled datasets, which are costly and time-consuming to obtain. This paper proposes a novel Active Learning (AL) framework that improves video classification performance while minimizing human annotation effort. Unlike passive learning methods that select samples for labeling at random, our approach actively identifies the most informative unlabeled instances to annotate. Specifically, we develop batch-mode AL techniques that select useful videos based on uncertainty and diversity sampling. The algorithm then extracts a diverse set of representative keyframes from the queried videos, so human annotators need only label these keyframes instead of watching the full videos. We implement this approach by leveraging recent advances in deep neural networks for visual feature extraction and sequence modeling. Experiments on benchmark datasets demonstrate that our method achieves significant improvements in video classification accuracy with less training data. This enables more efficient video dataset construction and could make large-scale video annotation more feasible. In this way, our AL framework minimizes the human effort needed to train accurate video classifiers.
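The batch-mode selection described above, combining uncertainty and diversity sampling, can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact method: it assumes entropy as the uncertainty measure and greedy max-min distance in feature space as the diversity criterion, with hypothetical inputs `probs` (per-video class probabilities) and `features` (per-video feature vectors).

```python
import numpy as np

def select_batch(probs, features, batch_size, candidate_pool=100):
    """Batch-mode AL sketch: rank unlabeled videos by predictive entropy
    (uncertainty), then greedily pick a diverse subset of the top
    candidates via max-min distance in feature space (assumed criteria)."""
    # Uncertainty: entropy of each video's predicted class distribution.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    pool = list(np.argsort(entropy)[::-1][:candidate_pool])

    # Diversity: greedily add the candidate farthest from those already chosen.
    selected = [pool[0]]
    for _ in range(batch_size - 1):
        remaining = [i for i in pool if i not in selected]
        dists = [min(np.linalg.norm(features[i] - features[j])
                     for j in selected) for i in remaining]
        selected.append(remaining[int(np.argmax(dists))])
    return selected
```

Restricting the diversity step to a pool of the most uncertain candidates keeps the greedy search cheap while still ensuring the labeled batch covers distinct regions of feature space rather than near-duplicate videos.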