Funding: Supported by UKRI (EP/Z000025/1) and the Horizon Europe Programme under the MSCA grant for the ACMod project (101130271).
Abstract: The exponential growth of video content has driven significant advances in video summarization techniques in recent years. Breakthroughs in deep learning have been particularly transformative, enabling more effective detection of key information and creating new possibilities for video synopsis. To summarize recent progress and accelerate research in this field, this paper provides a comprehensive review of deep learning-based video summarization methods developed over the past decade. We begin by examining the research landscape of video abstraction technologies and identifying core challenges in video summarization. Subsequently, we systematically analyze prevailing deep learning frameworks and methodologies employed in current video summarization systems, offering researchers a clear roadmap of the field's evolution. Unlike previous reviews, we first classify research papers by the structural hierarchy of the video (from frame level to shot level to video level), then further categorize them according to the summarization backbone model (feature extraction and spatiotemporal modeling). This approach provides a more systematic and hierarchical organization of the literature. Following this review, we summarize the benchmark datasets and evaluation metrics commonly employed in the field. Finally, we analyze persistent challenges and propose directions for future research, providing a forward-looking perspective on video summarization technologies. This systematic literature review offers a useful reference for new researchers exploring deep learning and video summarization.
Abstract: We analyze the suitability of existing pre-trained transformer-based language models (PLMs) for abstractive text summarization on German technical healthcare texts. The study focuses on the multilingual capabilities of these models and their ability to perform abstractive text summarization in the healthcare field. The research hypothesis was that large language models could perform high-quality abstractive text summarization on German technical healthcare texts, even if the model is not specifically trained in that language. Through experiments, the research questions explore the performance of transformer language models in dealing with complex syntax constructs, the difference in performance between models trained in English and German, and the impact of translating the source text to English before summarization. We evaluated four PLM approaches (GPT-3, a translation-based approach also utilizing GPT-3, a German language model, and a domain-specific biomedical model). The evaluation considered informativeness, using three types of metrics based on Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and the quality of results, which was manually evaluated on five aspects. The results show that text summarization models could be used in the German healthcare domain and that domain-independent language models achieved the best results. The study proves that text summarization models can simplify the search for pre-existing German knowledge in various domains.
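The abstract does not say which three ROUGE variants were used, so the following is only a minimal sketch of one common variant, ROUGE-N recall, to illustrate how such an informativeness score is typically computed. The function names, the whitespace tokenization, and the lowercasing are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count.

    Assumption: simple lowercase whitespace tokenization; real evaluations
    usually apply language-specific tokenization and sometimes stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the patient received the medication"
candidate = "the patient took medication"
unigram_recall = rouge_n_recall(reference, candidate, n=1)
bigram_recall = rouge_n_recall(reference, candidate, n=2)
```

Because the score is recall-oriented, a summary is rewarded for covering the reference's n-grams; extra words in the candidate are not penalized unless a precision or F-measure variant is used as well.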
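Abstract: Existing video summarization algorithms and summary evaluation methods do not fully account for the characteristics of video data captured by industrial intelligent terminals or for the application requirements of industrial intelligent sensing. To address this, the two evaluation constraints of representativeness and diversity are reformulated, and on that basis a hybrid bidirectional multi-layer industrial video summarization scheme is designed that combines DWConv (Depthwise Convolution) with ConvLSTM (Convolutional Long Short-Term Memory). The scheme consists of four parts: global coarse-grained feature extraction, local fine-grained feature extraction, feedback update, and query-driven feature fusion. To cope with the high redundancy of industrial data and the heavy noise in the captured video, a global feature extraction module is built around ConvLSTM and an attention mechanism. To fully extract the spatiotemporal characteristics of the video data, a local feature extraction module is built by combining an attention mechanism with DB-DWConvLSTM. To exploit the periodicity and local stability of industrial data, a feedback module incorporating DWConv is designed, drawing on the idea of residual networks. To make keyframe features more prominent and ease keyframe selection, a query-driven feature fusion module is studied. To verify the effectiveness and feasibility of the scheme, it is analyzed and validated on the TVSum and SumMe datasets. Experimental results show that the scheme performs well in cross-validation, ablation experiments, and comparative analysis.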
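For readers unfamiliar with the DWConv building block named above, the following is a minimal pure-Python sketch of a depthwise 2D convolution, in which each input channel is filtered by its own kernel and channels are never mixed. It is not the authors' implementation: the function name, the nested-list layout, and the choice of stride 1 with no padding are assumptions made for illustration.

```python
def depthwise_conv2d(x, kernels):
    """Depthwise 2D convolution with stride 1 and no padding.

    x:       C x H x W nested lists (one 2D plane per channel)
    kernels: C x kH x kW nested lists (exactly one filter per channel)
    Returns  C x (H-kH+1) x (W-kW+1) nested lists.

    As in most deep learning libraries, the kernel is not flipped
    (this is technically cross-correlation).
    """
    if len(x) != len(kernels):
        raise ValueError("need exactly one kernel per input channel")
    out = []
    for channel, kernel in zip(x, kernels):
        h, w = len(channel), len(channel[0])
        kh, kw = len(kernel), len(kernel[0])
        plane = [
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
                for j in range(w - kw + 1)
            ]
            for i in range(h - kh + 1)
        ]
        out.append(plane)  # output channel c depends only on input channel c
    return out


frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]      # 2 channels, 2x2 each
filters = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]     # one 2x2 kernel per channel
result = depthwise_conv2d(frames, filters)
```

A standard convolution would sum over all input channels for every output channel; keeping the channels separate reduces the parameter count and computation, which is the usual motivation for using depthwise convolutions in resource-constrained settings.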