Abstract
【Purpose/significance】Scientific diagrams present complex research findings in an intuitive and structured form. However, existing research on parsing scientific literature has focused mainly on textual objects, and the interpretation of scientific diagrams remains underexplored. This paper takes flowcharts in scientific diagrams as its research object and explores understanding methods suited to this type of diagram.【Method/process】This study proposes a two-stage method for scientific diagram understanding. The first stage generates coarse bounding boxes to localize diagram modules preliminarily. The second stage applies a visual attention point simulation strategy that combines global and local features to construct positive and negative sample points, which optimize model performance and enable modular parsing and semantic alignment.【Result/conclusion】Experiments on a dataset of 13,000 randomly generated flowcharts and on a manually annotated dataset of real flowcharts show that the proposed method outperforms existing models on both the mIoU and mrIoU metrics, reaching an overall mrIoU of 0.694. Even on complex flowcharts, it reaches an mrIoU of 0.608, more than 0.22 higher than the other models.【Innovation/limitation】This study provides a systematic solution for the interactive understanding of flowcharts in scientific diagrams and lays a theoretical foundation for multimodal interactive reading technologies.
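The abstract names two ingredients that can be illustrated concretely: turning a coarse module box from the first stage into positive and negative prompt points, and scoring predicted module masks with IoU. The Python sketch below is not the authors' method. Their point simulation combines global and local visual features, whereas here the points come from a fixed geometric heuristic (one positive point at the box centre, negatives placed just outside each edge). The helper names `box_to_prompt_points`, `box_to_mask`, `mask_iou`, `mean_iou` and the `margin` parameter are hypothetical. The mean IoU shown simply averages per-module IoU; the excerpt does not say how the authors average it or define mrIoU, so mrIoU is not reproduced.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), inclusive pixel coordinates
Mask = List[List[int]]            # mask[y][x] is 1 for module pixels, else 0
Point = Tuple[int, int, int]      # (x, y, label): label 1 = positive, 0 = negative


def box_to_prompt_points(box: Box, width: int, height: int, margin: int = 2) -> List[Point]:
    """Hypothetical prompt builder (geometric heuristic, not the paper's strategy):
    one positive point at the centre of a coarse box plus up to four negative
    points placed `margin` pixels outside its left, right, top and bottom edges."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    points: List[Point] = [(cx, cy, 1)]
    outside = [(x0 - margin, cy), (x1 + margin, cy), (cx, y0 - margin), (cx, y1 + margin)]
    for x, y in outside:
        if 0 <= x < width and 0 <= y < height:  # drop negatives that fall off the image
            points.append((x, y, 0))
    return points


def box_to_mask(box: Box, width: int, height: int) -> Mask:
    """Rasterise a box into a binary mask (used here only to build toy data)."""
    x0, y0, x1, y1 = box
    return [[1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(width)]
            for y in range(height)]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter / union if union else 1.0  # two empty masks count as a perfect match


def mean_iou(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average per-module IoU over (prediction, ground truth) pairs."""
    if not pairs:
        raise ValueError("need at least one (prediction, ground truth) pair")
    return sum(mask_iou(p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    W = H = 10
    gt = box_to_mask((2, 2, 5, 5), W, H)     # ground-truth module: 4 x 4 pixels
    pred = box_to_mask((3, 2, 6, 5), W, H)   # prediction shifted one pixel right
    print(box_to_prompt_points((2, 2, 5, 5), W, H))
    print(mask_iou(pred, gt))                # 12 shared pixels / 20 in the union
    print(mean_iou([(pred, gt), (gt, gt)]))
```

With the toy boxes above, the shifted prediction shares 12 of the 20 pixels in the union, giving an IoU of 0.6; averaging it with a perfect match gives a mean IoU of 0.8 (up to floating-point rounding). A learned point strategy such as the one the abstract describes would replace the fixed centre/edge placement while the IoU scoring stays the same.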
Authors
CHENG Qikai (程齐凯)
LIU Fukang (刘富康)
SHI Xiang (石湘)
HUANG Yong (黄永)
LU Wei (陆伟)
(School of Information Management, Wuhan University, Wuhan 430072, China; Institute of Intelligence and Innovation Governance, Wuhan University, Wuhan 430072, China)
Source
Information Science (《情报科学》)
Peking University Core Journal (北大核心)
2025, Issue 9, pp. 109-121, 132 (14 pages)
Funding
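National Key R&D Program of China project "Research on Data Interaction Control Technology for Integrating Judicial Administration Business Resources" (2022YFC3302904)
National Natural Science Foundation of China special project "Intelligent Evaluation of the Matching Degree of Research Outcomes for Major Projects" (L2324129)
Fundamental Research Funds for the Central Universities (2042023kf0220)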
Keywords
scientific diagram understanding
visual attention simulation
multimodal large models
interactive segmentation
flowchart parsing