Funding: Funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Abstract: Background: Antifreeze proteins (AFPs) are key to combating cold in living organisms and preventing ice morphogenesis. These proteins have applications in cryopreservation, food preservation, and biotechnology. Accurate prediction of AFPs is considered essential for advancing these fields. Methods: In this study, a novel method, StackAFP, was developed for predicting antifreeze proteins, using the stacking method together with latent semantic analysis as the feature extraction technique. Four machine learning algorithms, random forest, XGBoost, CatBoost, and LightGBM (LGBM), were used as the baseline models, and LGBM was employed as the meta-classifier to develop StackAFP. StackAFP was compared with different conventional machine learning methods to assess the robustness of the proposed method. Results: StackAFP showed promise, with an accuracy of 0.9997, and a Matthews correlation coefficient and a Kappa value of 0.9944. StackAFP outperformed all of the conventional machine learning models applied. Furthermore, StackAFP also outperformed the existing methods for identifying AFPs. Conclusion: The performance of StackAFP demonstrated its effectiveness, highlighted its potential in bioinformatics, and advanced our knowledge of AFPs.
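The three metrics named in the Results (accuracy, Matthews correlation coefficient, and Cohen's Kappa) can all be computed from a binary confusion matrix. The sketch below is a minimal illustration of the standard formulas, not the authors' evaluation code; the function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, MCC, Cohen's kappa) from binary confusion counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return accuracy, mcc, kappa


# Hypothetical counts for illustration only (not results from the paper).
acc, mcc, kappa = binary_metrics(tp=45, fp=5, tn=940, fn=10)
print(f"accuracy={acc:.4f}  MCC={mcc:.4f}  kappa={kappa:.4f}")
```

Reporting MCC and Kappa alongside accuracy matters on imbalanced data: a classifier that labels every sequence as non-AFP can reach high accuracy while both MCC and Kappa stay at 0.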