Protein N-phosphorylation is widespread in nature and participates in various biological processes. However, current knowledge of N-phosphorylation is extremely limited compared with that of O-phosphorylation. In this study, we collected 11,710 experimentally verified N-phosphosites on 7344 proteins from 39 species and used them to construct the database Nphos, which shares up-to-date information on protein N-phosphorylation. Based on these substantial data, we characterized the sequence and structural features of protein N-phosphorylation. Moreover, after comparing hundreds of learning models, we chose and optimized gradient boosting decision tree (GBDT) models to predict three types of human N-phosphorylation, achieving mean area under the receiver operating characteristic curve (AUC) values of 90.56%, 91.24%, and 92.01% for pHis, pLys, and pArg, respectively. Meanwhile, applying these models to the human proteome, we predicted 488,825 distinct N-phosphosites. The models were also deployed in Nphos for interactive N-phosphosite prediction. In summary, this work provides new insights and starting points for both flexible and focused investigations of N-phosphorylation. By providing a data and technical foundation, it will also facilitate a deeper and more systematic understanding of protein N-phosphorylation. Nphos is freely available at http://www.bio-add.org/Nphos/ and http://ppodd.org.cn/Nphos/.
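The AUC values reported for pHis, pLys, and pArg summarize how well each classifier ranks true N-phosphosites above non-sites. The sketch below is an illustration only and is not the authors' evaluation code. ROC AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The hypothetical function `roc_auc` computes the metric directly from that pairwise definition. The labels and scores in the usage example are invented toy values, not Nphos data.

```python
def roc_auc(labels, scores):
    """Return ROC AUC using the pairwise (Mann-Whitney) definition.

    labels: 1 for a positive example (e.g. a known N-phosphosite), 0 otherwise.
    scores: classifier scores of the same length; higher means "more likely positive".
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs ranked correctly; ties contribute 0.5.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores for hypothetical candidate His/Lys/Arg sites (not real data).
    y_true = [1, 1, 1, 0, 0, 0, 0]
    y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
    print(f"AUC = {roc_auc(y_true, y_score):.4f}")  # 11 of 12 pairs correct -> 0.9167
```

The double loop is O(P·N) and is meant for clarity. For large candidate sets, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.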
Funding: This work was supported by the National Key R&D Program of China (Grant No. 2020YFA0608300), the Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences (Grant No. YYWT-0901-EXP-16), the Scientific Research Grant of Ningbo University (Grant No. 215-432000282), the Ningbo City Top Talent Project (Grant No. 215-432094250), and the National Natural Science Foundation of China (Grant Nos. 22107055 and 91856126).