Abstract
Large language models (LLMs) have emerged as transformative tools in infectious disease research, offering new capabilities for analyzing biological sequences. This review summarizes three primary types of biological LLMs (protein language models, genomic language models, and multimodal models) and highlights their architectures and applications. These models are reshaping key areas such as pathogen identification, evolutionary surveillance, host-pathogen prediction, and therapeutic development by enabling the interpretation of complex genomic and proteomic data at large scale. Despite rapid recent progress, challenges persist in data quality, long-context processing, model interpretability, and biosafety. Understanding both the potential and the limitations of LLMs is crucial for using them effectively in infectious disease research while ensuring responsible development and deployment.
Funding
This work was supported by the National Key R&D Program of China (2022YFF1202101), the Major Project of Guangzhou National Laboratory (SRPG22-007 and GZNL2025C01013), and the National Natural Science Foundation of China (12371485).