
Research on Automatic Animation Editing Based on the CLIP Multimodal Pre-training Model
Abstract: Automatic animation editing is a core task in film production and digital media, but traditional methods rely on manual annotation and empirical rules, resulting in low efficiency and poor generalization. This study proposes an intelligent editing framework based on the CLIP (Contrastive Language-Image Pretraining) multimodal pre-training model, which achieves semantic-level frame selection and scene segmentation through text-image cross-modal feature alignment. The model uses CLIP pre-trained features as its foundation, designs a dynamic threshold adjustment strategy to optimize frame-selection accuracy, and combines timeline analysis with a visual focus tracking algorithm to enhance editing coherence. Experimental results show that the proposed method achieves an average similarity of 0.82 across four typical scene types (battle, dialogue, landscape, close-up), which is 35.6% and 18.3% higher than a traditional keyword matching method (KWM) and a single-modal CNN model, respectively. Heat maps show that frame-text association accuracy exceeds 90%, and the scene-transition rhythm density curve conforms to human visual perception. In tests on real animation footage, editing time is reduced by 87% compared with manual processing, and the user satisfaction score reaches 4.6/5.0.
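The core mechanism the abstract describes, namely scoring frames against a text query in a shared CLIP embedding space and keeping those above a dynamically adjusted threshold, can be sketched as follows. The paper's exact threshold rule is not given in the abstract, so the mean-plus-k-standard-deviations rule below is an illustrative assumption; in practice the `frame_embs` and `text_emb` arrays would come from a CLIP image encoder and text encoder, not random data.

```python
import numpy as np

def select_frames(frame_embs, text_emb, k=0.5):
    """Score frames by cosine similarity to a text query and keep those
    above a dynamic threshold (mean + k * std of the scores).

    frame_embs: (n_frames, dim) array of frame embeddings
    text_emb:   (dim,) array for the query embedding
    """
    # L2-normalize so the dot product equals cosine similarity
    frames = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb)
    scores = frames @ text
    # Dynamic threshold: adapts to the score distribution of this clip,
    # so "easy" and "hard" queries keep a comparable share of frames
    threshold = scores.mean() + k * scores.std()
    keep = np.where(scores >= threshold)[0]
    return keep, scores

# Toy example with random vectors standing in for CLIP features
rng = np.random.default_rng(0)
frame_embs = rng.normal(size=(100, 512))
text_emb = rng.normal(size=512)
idx, scores = select_frames(frame_embs, text_emb)
```

The returned `idx` preserves temporal order, so a downstream timeline-analysis step can group the kept frames into contiguous segments before cutting.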
Authors: LI Haiyan; CHEN Xinsheng (Anhui Finance & Trade Vocational College, Hefei 230601, China; College of Architecture & Art, Hefei University of Technology, Hefei 230601, China)
Source: Journal of Jiamusi University (Natural Science Edition), 2025, Issue 7, pp. 137-139, 136 (4 pages)
Funding: 2024 Anhui Provincial Scientific Research Program Project (2024AH052155).
Keywords: CLIP model; multimodal learning; animation editing; semantic alignment