Objective: To address the dual challenges of long-tail distribution and feature sparsity in traditional Chinese medicine (TCM) syndrome differentiation within real clinical settings, we propose a data-efficient learning framework enhanced by knowledge graphs.
Methods: We developed Agent-GNN, a three-stage decoupled learning framework, and validated it on the Traditional Chinese Medicine Syndrome Diagnosis (TCM-SD) dataset, which contains 54,152 clinical records across 148 syndrome categories. First, we constructed a comprehensive medical knowledge graph encoding the complete TCM reasoning system. Second, we proposed a Functional Patient Profiling (FPP) method that uses large language models (LLMs) combined with Graph Retrieval-Augmented Generation (RAG) to extract structured symptom-etiology-pathogenesis subgraphs from medical records. Third, we employed heterogeneous graph neural networks to explicitly learn structured combination patterns. We compared our method against multiple baselines, including BERT, ZY-BERT, ZY-BERT+Know, GAT, and GPT-4 Few-shot, using the macro-F1 score as the primary evaluation metric. Ablation experiments were also conducted to validate the contribution of each key component to model performance.
Results: Agent-GNN achieved an overall macro-F1 score of 72.4%, an improvement of 8.7 percentage points over ZY-BERT+Know (63.7%), the strongest baseline among traditional methods. For long-tail syndromes with fewer than 10 samples, Agent-GNN reached a macro-F1 score of 58.6%, compared with 39.3% for ZY-BERT+Know and 41.2% for GPT-4 Few-shot, representing relative improvements of 49.2% and 42.2%, respectively. Ablation experiments confirmed that explicit modeling of etiology-pathogenesis nodes contributed 12.4 percentage points to this improved long-tail syndrome performance.
Conclusion: This study proposes Agent-GNN, a knowledge graph-enhanced framework that effectively addresses the long-tail distribution challenge in TCM syndrome differentiation. By explicitly modeling manifestation-mechanism-essence patterns through structured knowledge graphs, our approach achieves superior performance in data-scarce scenarios while providing interpretable reasoning paths for intelligent TCM diagnosis.
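The abstract reports macro-F1 scores and relative improvements without spelling out how they are computed. The following is a minimal sketch of both quantities, not the authors' evaluation code. It assumes macro-F1 is the unweighted mean of per-class F1 over the classes present in the gold labels; the function names `macro_f1` and `relative_improvement` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumption: classes taken from y_true)."""
    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1
    scores = []
    for cls in sorted(set(y_true)):
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp[cls] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Toy long-tail case (hypothetical labels): a frequent class and a rare one.
y_true = ["common", "common", "common", "common", "rare"]
y_pred = ["common", "common", "common", "common", "common"]
toy_score = macro_f1(y_true, y_pred)

# Relative gains recomputed from the rounded scores quoted in the abstract.
gain_vs_zybert_know = relative_improvement(58.6, 39.3)
gain_vs_gpt4_fewshot = relative_improvement(58.6, 41.2)
```

In the toy case accuracy is 80%, yet macro-F1 is only about 0.44, because the missed rare class counts as much as the frequent one; this is why macro-F1 suits long-tail evaluation. Recomputing the relative gains from the rounded scores gives roughly 49.1% and 42.2%; the small gap to the reported 49.2% is presumably a rounding effect in the underlying unrounded scores.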
Funding: Sichuan TCM Culture Coordinated Development Research Center Project (2023XT131); National Key Science and Technology Project of China (2023ZD0509405); National Natural Science Foundation of China (82174236).