Thermophilic proteins maintain their structure and function at high temperatures, making them widely useful in industrial applications. Because experimental measurement is complex, predicting the melting temperature (T_m) of proteins has become a research hotspot. Previous methods rely on amino acid composition, physicochemical properties of proteins, and the optimal growth temperature (OGT) of the host organism for T_m prediction. However, their performance in predicting T_m values for thermophilic proteins (T_m > 60 °C) is generally unsatisfactory due to data scarcity. Herein, we introduce T_mPred, a T_m prediction model for thermophilic proteins that combines a protein language model, a graph convolutional network, and a Graphormer module. In performance evaluation, T_mPred achieves a root mean square error (RMSE) of 5.48 °C, a Pearson correlation coefficient (P) of 0.784, and a coefficient of determination (R^2) of 0.613, representing improvements of 19%, 15%, and 32%, respectively, over state-of-the-art predictive models such as DeepTM. Furthermore, T_mPred demonstrated strong generalization capability on independent blind test datasets. Overall, T_mPred provides an effective deep learning tool for the mining and modification of thermophilic proteins.
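For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how RMSE, the Pearson correlation coefficient, and R^2 are commonly computed from paired measured and predicted T_m values. The function name `regression_metrics` and the sample values are hypothetical illustrations; they are not taken from the T_mPred work, and the authors' exact evaluation code may differ.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, pearson_r, r2) for paired measured and predicted values.

    Hypothetical helper for illustration only; uses the standard textbook
    definitions of the three metrics.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Residual, total, and prediction sums of squares, plus the cross term.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))

    rmse = math.sqrt(ss_res / n)                 # same unit as the input (°C)
    pearson = cov / math.sqrt(ss_tot * ss_pred)  # linear correlation, in [-1, 1]
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination
    return rmse, pearson, r2


# Made-up melting temperatures in °C, purely to show the call.
measured = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 68.0, 83.0, 89.0]
rmse, pearson, r2 = regression_metrics(measured, predicted)
print(f"RMSE={rmse:.2f} °C, P={pearson:.3f}, R^2={r2:.3f}")
```

Lower RMSE is better, while values of P and R^2 closer to 1 indicate better agreement between predicted and measured T_m.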
Funding: This work was financially supported by the National Key R&D Program of China (Nos. 2020YFA0908100 and 2023YFF1204401), the Shenzhen Medical Research Fund (No. B2302037), the National Natural Science Foundation of China (Nos. 22331003 and 21925102), and the Beijing National Laboratory for Molecular Sciences (No. BNLMS-CXXM-202006).