Abstract
Dear Editor, Large language models (LLMs) show great promise in medical applications, but challenges such as limited high-quality data, the rigidity of closed-source models, and reasoning degradation during fine-tuning hinder their reliability. To address this, we present Med-Aligner, a plug-in module that learns correction residuals to improve accuracy without re-optimizing the full model. Trained on 267,524 anonymized medical records from 21 departments, Med-Aligner was integrated with eight LLMs (e.g., GPT-4 and Med-Llama3-8B) and evaluated on helpfulness, harmlessness, and honesty (3H). It achieved average gains of 41.3% ± 25.4%, 30.3% ± 12.4%, and 27.3% ± 14.8% in helpfulness; 10.9% ± 8.6% and 16.6% ± 11.3% in harmlessness; and a median improvement of 1.7% (range: 0.4%–3.4%) in honesty (p < 0.05). Distribution-shift plots confirmed consistent gains, especially in safety and utility. Its lightweight, model-agnostic design enables deployment on resource-limited devices such as smartphones. Top rankings on the Alpaca-Eval leaderboard further validate its effectiveness. By bridging open-source and proprietary LLMs, Med-Aligner offers a flexible, efficient solution for medical AI. Limitations include reliance on offline data and the need for clinical validation.
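The letter does not publish Med-Aligner's internals, so the following is only a rough illustration of the residual-correction idea it describes: a small, separately trained module adds a learned correction to a frozen base model's representations, so alignment improves without touching the base weights. All names below (ResidualAligner, hidden_dim, bottleneck) are hypothetical, and the sketch is plain PyTorch, not the authors' implementation.

# Minimal sketch of a residual-correction plug-in, assuming the correction
# is applied to a frozen LLM's hidden states; the actual Med-Aligner design
# is not specified in the letter.
import torch
import torch.nn as nn

class ResidualAligner(nn.Module):
    """Lightweight plug-in that learns correction residuals."""
    def __init__(self, hidden_dim: int, bottleneck: int = 64):
        super().__init__()
        # A bottleneck keeps the module small enough for edge devices.
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_dim)
        # Zero-init the up-projection so training starts from the identity
        # map: initially the frozen model's output passes through unchanged.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Frozen base representation plus the learned correction residual.
        return hidden + self.up(self.act(self.down(hidden)))

aligner = ResidualAligner(hidden_dim=4096)
states = torch.randn(1, 16, 4096)   # stand-in for a frozen LLM's hidden states
corrected = aligner(states)         # only the aligner's parameters are trained

Because the base model stays frozen and only the small module is optimized, such a plug-in would be model-agnostic and cheap to run, which is consistent with the letter's claims about deployment on resource-limited devices.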
Funding
supported by the National Natural Science Foundation of China General Project (no. 623B2003),
the Beijing Natural Science Foundation (no. L242135),
the National Key Research and Development Program for Government-to-Government International Scientific and Technological Cooperation (no. 2024YFE0107100),
the "Research on Clinical Application of Medical Artificial Intelligence" Project of the Hospital Management Institute of the National Health Commission (nos. YLXX24AIA008 and YLXX24AIA026),
the Hebei Province Higher Education Science and Technology Research Project (no. CXZX2025030),
and the Hebei Provincial Government-funded Clinical Talent Project (no. ZF2025062).