Funding: Supported by the Office of Naval Research through grants N00014-19-1-2103 and N00014-20-1-2175.
Abstract: Advances in machine learning have transformed materials discovery, yet challenges remain due to the lack of informatics-ready data and the complexity of numerical descriptors. Scientific knowledge is scattered across publications, making comprehensive data extraction difficult. This study presents a large language model (LLM)-driven framework to accelerate organic solar cell (OSC) materials discovery by extracting structured data from the literature and predicting device performance using natural language embeddings. Trained on a curated dataset of 422 OSC devices, the fine-tuned LLM demonstrated strong predictive accuracy across key performance metrics: power conversion efficiency (PCE, R² = 0.87), short-circuit current (JSC, R² = 0.82), open-circuit voltage (VOC, R² = 0.89), and fill factor (FF, R² = 0.59). The models are then used to explore a space of 1.4 million combinations of materials, experimental variables, and device architectures. The analysis provides data-driven design guidelines, identifying optimal donor-acceptor combinations and processing conditions that consistently yield higher device performance.
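For readers unfamiliar with the two quantitative ideas in the abstract, the sketch below illustrates them in plain Python: the coefficient of determination (R²) used to report predictive accuracy, and the enumeration of a combinatorial design space. It is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the placeholder material labels, and the annealing temperatures are all hypothetical and do not come from the study.

```python
from itertools import product


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def enumerate_design_space(donors, acceptors, anneal_temps_c):
    """Return every (donor, acceptor, temperature) combination."""
    return list(product(donors, acceptors, anneal_temps_c))


# Hypothetical placeholder values, for illustration only.
donors = ["donor_A", "donor_B"]
acceptors = ["acceptor_X", "acceptor_Y", "acceptor_Z"]
anneal_temps_c = [100, 150]

candidates = enumerate_design_space(donors, acceptors, anneal_temps_c)
print(len(candidates))  # 2 * 3 * 2 combinations
```

An R² of 1 means the predictions match the measurements exactly, while an R² of 0 means the model does no better than predicting the mean of the measurements. The size of a design space grows as the product of the number of options per variable, which is how a modest number of materials and processing choices can reach millions of combinations.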