Abstract
In recent years, large language models (LLMs) and multimodal large models (MLMs) have achieved remarkable success in natural language processing and multimodal content understanding. However, these general-purpose models show clear shortcomings on cultural-heritage tasks, such as biased understanding of domain-specific terminology, superficial answers caused by a lack of cultural and historical background, and knowledge hallucination, so their outputs often fail to meet practical needs. To address these challenges, this paper proposes, for the first time, a multimodal large model for the cultural heritage domain: Bogu-Wenjin. First, a semi-automated strategy is designed to construct a large-scale multimodal cultural heritage dataset and to build a multimodal knowledge graph. The constructed dataset is then used to train a general-purpose large model in two stages, image-text alignment and instruction fine-tuning, to adapt it to the specific needs of the cultural heritage domain. In addition, the knowledge graph is introduced as an auxiliary knowledge base; through graph-text retrieval and relation retrieval strategies, it effectively improves the credibility and interpretability of the model on cultural-heritage question-answering tasks. Experimental results show that Bogu-Wenjin performs excellently in artifact image description, attribute question answering, and relation question understanding. Compared with general multimodal large models, it markedly improves the understanding and answering of complex cultural content, exceeding the second-best model's composite scores by 21.4%, 53%, and 20.6% on the artifact image description, artifact attribute question, and artifact relation question tasks, respectively.
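The knowledge-graph augmentation described above can be illustrated with a minimal sketch. This is not the paper's code: the triples, entity names, and prompt format are invented examples, and the real system's retrieval strategies and schema are not specified at this level of detail. The sketch only shows the general pattern of relation retrieval followed by prompt augmentation.

```python
# Illustrative sketch of knowledge-graph relation retrieval used to
# ground a cultural-heritage question before it reaches the model.
# All triples and names below are hypothetical examples.

from typing import List, Tuple

# A toy knowledge graph stored as (head, relation, tail) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Bronze Ding", "dynasty", "Shang"),
    ("Bronze Ding", "material", "bronze"),
    ("Bronze Ding", "excavated_at", "Yinxu"),
    ("Yinxu", "located_in", "Anyang"),
]

def retrieve_relations(entity: str) -> List[Tuple[str, str, str]]:
    """Return all triples in which the entity appears as head or tail."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def build_prompt(question: str, entity: str) -> str:
    """Prepend retrieved facts to the question as auxiliary context."""
    facts = "; ".join(f"{h} --{r}--> {t}" for h, r, t in retrieve_relations(entity))
    return f"Known facts: {facts}\nQuestion: {question}"

print(build_prompt("Which dynasty does this artifact date from?", "Bronze Ding"))
```

In the paper's actual pipeline, the retrieved facts would accompany the artifact image and question as input to the fine-tuned multimodal model, which is what supports the credibility and interpretability gains reported in the abstract.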
Authors
ZHAO Wanqing; XU Chaoyang; XIE Zhiwei; ZHANG Shaobo; ZHANG Xiaodan; PENG Jinye (School of Electronic Information, Northwest University, Xi'an 710127, China; Shaanxi Key Laboratory of Higher Education Institution of Generative Artificial Intelligence and Mixed Reality, Xi'an 710127, China)
Source
《西北大学学报(自然科学版)》
Peking University Core Journal (北大核心)
2025, No. 6, pp. 1267-1284 (18 pages)
Journal of Northwest University(Natural Science Edition)
Funding
National Key R&D Program of China (2024YFF0907600)
National Natural Science Foundation of China (62273275)
Natural Science Basic Research Program of Shaanxi Province, Youth Project (2025JC-YBQN-847)
Keywords
multimodal large models
cultural heritage
visual question answering
knowledge graphs
model fine-tuning
knowledge enhancement