Convolutional BiLSTM Variational Sequence-To-Sequence Based Video Captioning for Capturing Intricate Temporal Dependencies

Abstract: In the realm of video understanding, the demand for accurate and contextually rich video captioning has surged with the increasing volume and complexity of multimedia content. This research introduces a video captioning approach built on a Convolutional Bidirectional Long Short-Term Memory (BiLSTM) Variational Sequence-to-Sequence (CBVSS) model. The proposed framework is designed to capture intricate temporal dependencies within video sequences, enabling more nuanced and contextually relevant descriptions of dynamic scenes. However, optimizing its parameters for improved performance remains a crucial challenge. In response, Golden Eagle Optimization (GEO), a metaheuristic optimization technique, is used to fine-tune the parameters of the CBVSS model. The application of GEO aims to enhance the ability of CBVSS to produce more exact and contextually rich video captions. The proposed method attains an overall Recall of 59.75% and Precision of 63.78% across both datasets. Additionally, the CBVSS method achieved the highest METEOR (25.67) and CIDEr (39.87) scores on the ActivityNet dataset, and outperformed all compared models on the YouCook2 dataset with METEOR (28.67) and CIDEr (43.02), highlighting its effectiveness in generating semantically rich and contextually accurate video captions.
Source: Journal of Bionic Engineering (仿生工程学报, English edition), 2025, Issue 5, pp. 2700-2716 (17 pages)