Automated syndrome element differentiation in traditional Chinese medicine based on large language models and text embedding computation
Abstract Objective: This study aimed to develop an automated method for syndrome element differentiation in Traditional Chinese Medicine (TCM). Methods: First, an Instruction-tuned Multi-Task TCM text embedding model (Instr-MT-TCM) was constructed and trained on datasets from four distinct TCM-related tasks: domain knowledge, synonymous terminology, syndrome differentiation and treatment, and TCM case labels. Second, five TCM diagnostics experts holding master's degrees or higher screened a real-world TCM case dataset and annotated symptoms and signs; these annotations were used to compare the F1-scores of the proposed method (Instr-MT-TCM combined with a Large Language Model, LLM) and of manual annotation on the syndrome element differentiation task. Finally, to validate its feasibility in real-world clinical settings, the method was applied to 48 real-world prostate cancer cases to calculate syndrome element scores. Results: The Instr-MT-TCM model improved rapidly in the early training phase and achieved a Recall@1 (R@1) of 0.848. The experts selected 1,793 real-world TCM cases covering 34 clinically common diseases and 66 syndrome patterns. On the syndrome element differentiation task, the combined LLM and Instr-MT-TCM method achieved a mean F1-score of 0.927, markedly higher than the 0.512 obtained by manual annotation. In the syndrome element analysis of the 48 prostate cancer cases, the predominant disease-nature elements were fire (heat) and yin deficiency, and the main disease-location elements were the bladder and kidney. Conclusion: This study proposes and validates a novel method for automated TCM syndrome element differentiation based on the synergy between an LLM and Instr-MT-TCM. The method achieved a high F1-score (0.927) on real-world data, demonstrating high accuracy and strong generalization ability, and its application to prostate cancer syndrome element analysis shows good potential for clinical use. It offers effective technical support and a new research direction for intelligent TCM syndrome element differentiation.
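The abstract reports two metrics, Recall@1 for the embedding model and a mean F1-score for syndrome element differentiation, without giving the evaluation protocol. The following is a minimal sketch of how such metrics are commonly computed, not the authors' implementation. It assumes that Recall@1 is measured by cosine-similarity retrieval over candidate embeddings and that F1 is computed per case on sets of syndrome elements and then averaged. All function names and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_1(queries, candidates, gold):
    """Fraction of queries whose most similar candidate is the gold candidate.

    queries:    list of query embedding vectors
    candidates: list of candidate embedding vectors
    gold:       gold candidate index for each query
    """
    hits = 0
    for q_vec, gold_idx in zip(queries, gold):
        best = max(range(len(candidates)),
                   key=lambda i: cosine(q_vec, candidates[i]))
        hits += int(best == gold_idx)
    return hits / len(queries)


def set_f1(predicted, reference):
    """F1 between a predicted and a reference set of syndrome elements."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pred_cases, ref_cases):
    """Average the per-case F1 over all cases (assumed averaging scheme)."""
    scores = [set_f1(p, r) for p, r in zip(pred_cases, ref_cases)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    pred = [["fire (heat)", "yin deficiency"], ["kidney", "bladder"]]
    ref = [["fire (heat)", "kidney"], ["kidney", "bladder"]]
    print(mean_f1(pred, ref))
```

Averaging per case weights every case equally regardless of how many elements it carries; pooling all elements before computing F1 (micro-averaging) is an equally plausible reading of "mean F1-score", and the abstract does not say which was used.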
Authors SUN Zhaoyang; WANG Yang; MA Mingze; CHEN Yanwen; LYU Zhenxiu; JIANG Tiantian; WEN Huiling; CHEN Bo; GUAN Jing (School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029, China; School of Acupuncture-Moxibustion and Tuina, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; First Affiliated Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China; School of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China)
Source Journal of Beijing University of Traditional Chinese Medicine (Peking University Core Journal), 2025, Issue 8, pp. 1176-1184 (9 pages)
Funding National Key Research and Development Program of China (No. 2017YFC1703302).
Keywords syndrome element differentiation; large language model; text embedding