Bogu-Wenjin: A cultural heritage domain multimodal large model enhanced by knowledge graphs
Abstract: In recent years, large language models (LLMs) and multimodal large models (MLMs) have achieved notable success in natural language processing and multimodal content understanding. However, these general-purpose models show clear shortcomings on cultural heritage tasks: they misinterpret domain-specific terminology, lack the cultural and historical background needed for in-depth answers, and suffer from knowledge hallucination, so their outputs rarely meet practical needs. To address these challenges, this paper proposes Bogu-Wenjin, the first multimodal large model oriented toward the cultural heritage domain. First, a semi-automated strategy is designed to build a large-scale multimodal cultural heritage dataset, from which a multimodal knowledge graph is formed. Then, the constructed dataset is used to train a general-purpose large model in two stages, image-text alignment and instruction fine-tuning, to adapt it to the specific needs of the cultural heritage domain. In addition, the knowledge graph is introduced as an auxiliary knowledge base, and image-text retrieval and relation retrieval strategies effectively improve the credibility and interpretability of the model on cultural heritage question-answering (Q&A) tasks. Experimental results show that Bogu-Wenjin performs strongly on artifact image description, artifact attribute question answering, and artifact relationship question understanding. Compared with general-purpose multimodal large models, it markedly improves the understanding and answering of complex cultural content, exceeding the second-best model's comprehensive score by 21.4%, 53%, and 20.6% on the three tasks of artifact image description, artifact attribute questions, and artifact relationship questions, respectively.
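The knowledge-graph retrieval step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual pipeline, so everything below is an assumption: the toy triples, the three-dimensional "image embeddings", and the helper names `image_to_entity`, `relation_retrieval`, and `build_prompt` are hypothetical. The sketch matches a query image embedding to the closest knowledge-graph entity by cosine similarity (a stand-in for image-text retrieval), gathers the triples within a fixed number of hops of that entity (a stand-in for relation retrieval), and places them in the prompt so that an answer can be traced back to explicit facts.

```python
import math
from collections import defaultdict

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
# All names are placeholders, not data from the paper.
TRIPLES = [
    ("artifact_A", "dynasty", "dynasty_X"),
    ("artifact_A", "material", "bronze"),
    ("artifact_A", "excavated_at", "site_P"),
    ("artifact_B", "dynasty", "dynasty_X"),
    ("artifact_B", "material", "jade"),
    ("site_P", "located_in", "province_Q"),
]

# Hypothetical image embeddings for artifact entities; a real system
# would obtain these from a vision encoder.
ENTITY_IMAGE_EMB = {
    "artifact_A": [0.9, 0.1, 0.0],
    "artifact_B": [0.1, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def image_to_entity(query_emb, index=ENTITY_IMAGE_EMB):
    """Stand-in for image-text retrieval: return the entity whose stored
    image embedding is most similar to the query embedding."""
    return max(index, key=lambda entity: cosine(query_emb, index[entity]))


def relation_retrieval(entity, triples, hops=1):
    """Stand-in for relation retrieval: collect every triple within `hops`
    edges of `entity`, treating edges as undirected (breadth-first)."""
    adjacency = defaultdict(list)
    for triple in triples:
        head, _, tail = triple
        adjacency[head].append(triple)
        adjacency[tail].append(triple)

    seen_nodes, seen_triples, found = {entity}, set(), []
    frontier = [entity]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for triple in adjacency[node]:
                if triple not in seen_triples:
                    seen_triples.add(triple)
                    found.append(triple)
                # Expand to both endpoints so the next hop can follow them.
                for neighbor in (triple[0], triple[2]):
                    if neighbor not in seen_nodes:
                        seen_nodes.add(neighbor)
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return found


def build_prompt(question, facts):
    """Prepend retrieved triples to the question so the answer can cite them."""
    fact_lines = "\n".join(f"- {h} {r} {t}" for h, r, t in facts)
    return (
        f"Known facts:\n{fact_lines}\n"
        f"Question: {question}\n"
        "Answer using only the facts above."
    )


if __name__ == "__main__":
    entity = image_to_entity([0.85, 0.15, 0.05])  # toy query embedding
    facts = relation_retrieval(entity, TRIPLES, hops=2)
    print(build_prompt("Where was this artifact excavated?", facts))
```

With `hops=2`, the toy query also pulls in `site_P located_in province_Q`, the kind of indirect relation an artifact-relationship question may need, while unrelated facts such as `artifact_B material jade` stay out of the prompt. A production system would presumably replace the toy embeddings with encoder outputs and the triple list with a graph store.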
Authors: ZHAO Wanqing, XU Chaoyang, XIE Zhiwei, ZHANG Shaobo, ZHANG Xiaodan, PENG Jinye (School of Electronic Information, Northwest University, Xi'an 710127, China; Shaanxi Key Laboratory of Higher Education Institution of Generative Artificial Intelligence and Mixed Reality, Xi'an 710127, China)
Source: Journal of Northwest University (Natural Science Edition), a Peking University core journal, 2025, No. 6, pp. 1267-1284 (18 pages)
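Funding: National Key Research and Development Program of China (2024YFF0907600); National Natural Science Foundation of China (62273275); Shaanxi Provincial Natural Science Basic Research Program, Youth Project (2025JC-YBQN-847).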
Keywords: multimodal large models; cultural heritage; visual question answering; knowledge graphs; model fine-tuning; knowledge enhancement