Funding: Supported by the National Natural Science Foundation of China (22393890, You SL; 22393891 and 22031006, Luo S; 2203300, Pei J; 22371052, Chen M; 21991132, 21925102, 92056118, and 22331003, Zhang WB; 22331002 and 22125101, Lu H; 22071004, Mo F; 22393892 and 22071249, Liao K; 22122109 and 22271253, Hong X), the National Key R&D Program of China (2023YFF1205103, Pei J; 2020YFA0908100 and 2023YFF1204401, Zhang WB; 2022YFA1504301, Hong X), Zhejiang Provincial Natural Science Foundation of China (LDQ23B020002, Hong X), the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study (SNZJU-SIAS-006, Hong X), the CAS Youth Interdisciplinary Team (JCTD-2021-11, Hong X), Shenzhen Medical Research Fund (B2302037, Zhang WB), Beijing National Laboratory for Molecular Sciences (BNLMS-CXXM-202006, Zhang WB), the State Key Laboratory of Molecular Engineering of Polymers (Chen M), Haihe Laboratory of Sustainable Chemical Transformations, and the National Science & Technology Fundamental Resource Investigation Program of China (2023YFA1500008, Luo S).
Abstract: Recent years have witnessed the transformative impact of integrating artificial intelligence (AI) with organic and polymer synthesis. This synergy offers innovative and intelligent solutions to a range of classic problems in synthetic chemistry. These advances include molecular property prediction, multi-step retrosynthetic pathway planning, elucidation of structure-performance relationships in single-step transformations, establishment of quantitative links between polymer structures and their functions, design and optimization of polymerization processes, prediction of the structure and sequence of biological macromolecules, and automated, intelligent synthesis platforms. Chemists can now explore synthetic chemistry with unprecedented precision and efficiency, creating novel reactions, catalysts, and polymer materials under the data-driven paradigm. Despite these developments, the field of AI for synthetic chemistry is still in its infancy and faces challenges and limitations in data openness, model interpretability, and software and hardware support. This review provides an overview of current progress, key challenges, and suggestions for future development at the interface between AI and synthetic chemistry. It is hoped that this overview will give readers a comprehensive understanding of this emerging field and will inspire and promote further research and development.
Funding: Financially supported by the National Key R&D Program of China (Nos. 2020YFA0908100 and 2023YFF1204401), Shenzhen Medical Research Fund (No. B2302037), the National Natural Science Foundation of China (Nos. 22331003 and 21925102), and Beijing National Laboratory for Molecular Sciences (No. BNLMS-CXXM-202006).
Abstract: Thermophilic proteins maintain their structure and function at high temperatures, making them widely useful in industrial applications. Because experimental measurement is complex, predicting the melting temperature (T_m) of proteins has become a research hotspot. Previous methods rely on amino acid composition, physicochemical properties of proteins, and the optimal growth temperature (OGT) of hosts for T_m prediction. However, their performance in predicting T_m values for thermophilic proteins (T_m > 60 °C) is generally unsatisfactory due to data scarcity. Herein, we introduce TmPred, a T_m prediction model for thermophilic proteins that combines a protein language model, a graph convolutional network, and a Graphormer module. In performance evaluation, TmPred achieves a root mean square error (RMSE) of 5.48 °C, a Pearson correlation coefficient (P) of 0.784, and a coefficient of determination (R^2) of 0.613, representing improvements of 19%, 15%, and 32%, respectively, over state-of-the-art predictive models such as DeepTM. Furthermore, TmPred demonstrated strong generalization capability on independent blind test datasets. Overall, TmPred provides an effective tool for the mining and modification of thermophilic proteins by leveraging deep learning.
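The three evaluation metrics named in this abstract (RMSE, Pearson correlation coefficient, and R^2) can be computed from paired measured and predicted values as in the minimal sketch below. The function name `regression_metrics` and the four melting temperatures are made up for illustration; they are not taken from the paper or its datasets.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, Pearson correlation, and R^2 for paired values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Root mean square error, in the same unit as the inputs.
    rmse = float(np.sqrt(np.mean(residuals ** 2)))

    # Pearson correlation: off-diagonal entry of the 2x2 correlation matrix.
    pearson = float(np.corrcoef(y_true, y_pred)[0, 1])

    # Coefficient of determination: 1 - (residual SS / total SS).
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "pearson": pearson, "r2": r2}


# Hypothetical melting temperatures in degrees Celsius (illustrative only).
measured = [62.0, 70.0, 78.0, 86.0]
predicted = [64.0, 68.0, 80.0, 84.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

Note that R^2 computed this way is not, in general, the square of the Pearson coefficient; the two coincide only in special cases such as an ordinary least-squares fit evaluated on its own training data.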
Funding: Supported by the AI for Science (AI4S)-Preferred Program (Peking University, Shenzhen, China), the National Natural Science Foundation of China (Nos. 52173153, 12074011, 61935016 and 12174013), and the National Key Research and Development Program of China (Nos. 2022YFB4200503 and 2022YFB3606502).
Abstract: Halide perovskites have emerged as a class of highly promising photovoltaic materials with exceptional optoelectronic properties. The bandgaps of halide perovskites, along with the energy levels of the conduction band minimum (CBM) and valence band maximum (VBM), play a critical role in determining light absorption, interfacial energy alignment, charge carrier dynamics, and photovoltaic performance of the corresponding solar cells. Herein, we developed high-accuracy machine learning (ML) models based on state-of-the-art algorithms to predict the CBM, VBM, and bandgaps of halide perovskites. We primarily focus on properties calculated using the Heyd-Scuseria-Ernzerhof (HSE) functional. Among the tested ML models, extreme gradient boosting regression (XGB), which outperformed five other shallow ML models as well as Transformer and multilayer perceptron models, achieved a coefficient of determination (R^2) of 0.8298 for CBM prediction (0.8481 for VBM) and a mean absolute error (MAE) of 0.1510 eV (0.1490 eV for VBM) on the test set. For HSE-derived bandgaps, the XGB model achieved an R^2 of 0.8008 and an MAE of 0.2848 eV on the test set. In addition to HSE-derived bandgaps, we also predicted bandgaps calculated using the Perdew-Burke-Ernzerhof (PBE) functional. For PBE-calculated bandgaps, the XGB model again gave the best predictive performance, with an R^2 of 0.9316 and an MAE of 0.1018 eV on the test set. Finally, we conducted Shapley additive explanations (SHAP) analysis based on the optimal models to identify the key features influencing the energy band properties of halide perovskites. Our findings statistically revealed the dominant factors affecting the bandgaps and the CBM and VBM energy levels of halide materials, in agreement with previous non-ML studies. This work provides meaningful insights for the rational design of halide perovskites with tailored energy band properties.
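As a rough illustration of the gradient boosting regression idea behind the XGB models in this abstract, the sketch below fits a sequence of one-split trees (stumps) to the residuals of a squared-error loss and scores the result by MAE on a held-out split. It is a plain NumPy toy, not XGBoost and not the authors' pipeline: XGBoost adds regularization, second-order gradients, and deeper trees. All function names, the synthetic features, and the hyperparameters are assumptions made for this example.

```python
import numpy as np


def fit_stump(x, residual):
    """Find the one-feature threshold split that best fits the residuals."""
    mean = residual.mean()
    # Fallback: a constant stump (threshold = +inf sends every row left).
    best = (np.inf, 0, np.inf, mean, mean)
    for j in range(x.shape[1]):
        values = np.unique(x[:, j])
        # Candidate thresholds are midpoints between adjacent unique values.
        for threshold in (values[:-1] + values[1:]) / 2.0:
            left = x[:, j] <= threshold
            left_value = residual[left].mean()
            right_value = residual[~left].mean()
            fitted = np.where(left, left_value, right_value)
            error = float(np.sum((residual - fitted) ** 2))
            if error < best[0]:
                best = (error, j, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    j, threshold, left_value, right_value = stump
    return np.where(x[:, j] <= threshold, left_value, right_value)


def fit_boosting(x, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = float(y.mean())
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual y - prediction.
        stump = fit_stump(x, y - prediction)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic stand-in data (NOT perovskite descriptors): 3 features, 200 rows.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(200, 3))
target = (1.5 + 2.0 * features[:, 0] - 1.0 * features[:, 1]
          + rng.normal(0.0, 0.05, size=200))

train, test = slice(0, 150), slice(150, 200)
model = fit_boosting(features[train], target[train])

# Compare against a baseline that always predicts the training mean.
baseline_mae = mean_absolute_error(target[test], np.full(50, target[train].mean()))
model_mae = mean_absolute_error(target[test], predict_boosting(model, features[test]))
print(f"baseline MAE = {baseline_mae:.3f}, boosted MAE = {model_mae:.3f}")
```

Reporting the error on a held-out test split, as the abstract does, is what makes the comparison between model families meaningful; training-set error alone would favor whichever model overfits most.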
Funding: Supported by the National Natural Science Foundation of China (22071004, 21933001 and 22150013).
Abstract: Organic chemistry is undergoing a major paradigm shift, moving from a labor-intensive approach to a new era dominated by automation and artificial intelligence (AI). This shift is being driven by technological advances, the ever-increasing demand for greater research efficiency and accuracy, and the rapid growth of interdisciplinary research. AI models, supported by computational power and algorithms, are reshaping synthetic planning and introducing new ways to tackle complex molecular synthesis. In addition, autonomous robotic systems are accelerating the pace of discovery by performing tedious tasks with unprecedented speed and precision. This article examines the opportunities and challenges presented by this paradigm shift and explores its far-reaching implications. It provides insights into the future trajectory of organic chemistry research, which is increasingly defined by the synergistic interaction of automation and AI.
Funding: Supported by the National Key Research and Development Program of China (2021YFA1301300), the Natural Science Foundation of China (62202014 and 61972217), Shenzhen Basic Research Programs (JCYJ20190808183205731, JCYJ20220812103301001, and JCYJ20220813151736001), and the Science and Technology Planning Project of Shenzhen Municipality (JCYJ20200109120416654).
Abstract: Biosynthesis and biodegradation by microorganisms critically underpin the development of biotechnology, new drugs and therapies, and environmental remediation. However, most uncultured microbial species, along with their metabolic capacities in extreme environments, remain obscure. Here we unravel the metabolic potential of microbial dark matter (MDM) in four deep-inland hypersaline lakes in Xinjiang, China. Using metagenomic binning, we recovered 3030 metagenome-assembled genomes (MAGs) across 82 phyla, of which a substantial portion, 2363 MAGs, were previously unclassified at the genus level. These unknown MAGs displayed unique distribution patterns across different lakes, indicating a strong correlation with varied physicochemical conditions. Our analysis revealed 9635 biosynthetic gene clusters (BGCs), of which 9403 were novel, suggesting untapped biotechnological potential. Notably, some MAGs from potentially new phyla exhibited a high density of these BGCs. Beyond biosynthesis, our study also identified novel biodegradation pathways, including dehalogenation, anaerobic ammonium oxidation (Anammox), and degradation of polycyclic aromatic hydrocarbons (PAHs) and plastics, in previously unknown microbial clades. These findings significantly enrich our understanding of biosynthesis and biodegradation processes and open new avenues for biotechnological innovation, emphasizing the untapped potential of microbial diversity in hypersaline environments.
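To put the counts in this abstract into proportion, the short calculation below converts them into percentages and an average BGC count per MAG. It uses only the four figures quoted above; the helper name `share` is an illustrative assumption, and the per-MAG average is a simple ratio that says nothing about how BGCs are actually distributed among genomes.

```python
def share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


total_mags = 3030          # MAGs recovered
unclassified_mags = 2363   # MAGs unclassified at the genus level
total_bgcs = 9635          # BGCs identified
novel_bgcs = 9403          # BGCs reported as novel

unclassified_share = share(unclassified_mags, total_mags)
novel_bgc_share = share(novel_bgcs, total_bgcs)
bgcs_per_mag = total_bgcs / total_mags

print(f"unclassified MAGs: {unclassified_share:.1f}%")
print(f"novel BGCs: {novel_bgc_share:.1f}%")
print(f"average BGCs per MAG: {bgcs_per_mag:.2f}")
```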