In protein engineering, computational models are increasingly used to predict the effects of mutations, yet their evaluation relies primarily on high-throughput deep mutational scanning (DMS) experiments that use surrogate readouts, which may not adequately capture the complex biochemical properties of interest. Many proteins and functions cannot be assessed by high-throughput methods, either because of technical limitations or because of the nature of the desired properties. This is particularly true in real-world industrial applications. The ideal test data would therefore consist of small-scale (∼10–100 data points) experimental datasets for each protein, covering as many proteins and properties as possible; such data, however, are lacking. Here, we present VenusMutHub, a comprehensive benchmark built from 905 small-scale experimental datasets curated from published literature and public databases. The datasets span 527 proteins and diverse functional properties, including stability, activity, binding affinity, and selectivity. They feature direct biochemical measurements rather than surrogate readouts, enabling a more rigorous assessment of how well models predict mutations that affect specific molecular functions. We evaluate 23 computational models spanning diverse methodological paradigms, including sequence-based, structure-informed, and evolutionary approaches. This benchmark provides practical guidance for selecting prediction methods in protein engineering applications where accurate prediction of specific functional properties is crucial.
Funding: This work was supported by the Science and Technology Innovation Key R&D Program of Chongqing (CSTB2024TIAD-STX0032, China), the Computational Biology Key Program of Shanghai Science and Technology Commission (23JS1400600, China), Shanghai Jiao Tong University Scientific and Technological Innovation Funds (21X010200843, China), the Science and Technology Innovation Key R&D Program of Chongqing (CSTB2022TIAD-STX0017, China), the Postdoctoral Fellowship Program of CPSF under Grant Number GZC20241010, the Student Innovation Center at Shanghai Jiao Tong University, and Shanghai Artificial Intelligence Laboratory.