Protein aggregation drives proteinopathies ranging from ALS to systemic amyloidosis, yet the multiscale determinants bridging sequence, structure, and kinetics remain elusive. We present SKALE, an interpretable machine learning framework that integrates sequence motifs, AlphaFold-derived structural descriptors, and experimental kinetics to decode aggregation mechanisms. SKALE identifies latent hotspots that evade conventional tools and matches high-performing neural baselines while preserving computational efficiency. In ALS-linked SOD1 G86R, the model isolates a risk region at residues 72-91 where preserved β-sheet geometry coincides with weakened hydrogen bonding to drive nucleation. Similarly, analysis of TDP-43 S332N reveals that a locally unwound helix increases surface exposure, a prediction validated experimentally: targeted deletion of the model-identified regions significantly reduces cellular aggregation. The framework generalizes to Tau P301L and PRNP variants, where it uncovers distal aggregation-prone regions that discriminate pathogenic drivers from neutral mutations. Interpretability analysis further disentangles global from mutation-local mechanisms, revealing that β-sheet propensity acts as a shared determinant while hydrogen-bond dynamics define specific routes to nucleation. These findings establish SKALE as a scalable, disease-agnostic engine that combines high-fidelity prediction with biophysical resolution to decode the molecular logic of misfolding and guide therapeutic design.
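The distinction between global and mutation-local mechanisms can be illustrated with a minimal sketch, assuming a simple linear scoring model. SKALE's actual model and feature set are not described here, so the feature names, weights, and data below are hypothetical. For a linear model with independent features, the exact SHAP value of feature j for variant i is w_j (x_ij - mean_j); averaging absolute values over all variants gives a global ranking, while a single row gives a mutation-local explanation.

```python
import numpy as np

# Hypothetical descriptor names for illustration only; not SKALE's feature set.
FEATURES = ["beta_sheet_propensity", "hbond_count", "surface_exposure"]


def linear_attributions(weights, X):
    """Exact SHAP values for a linear model with independent features:
    phi[i, j] = w[j] * (X[i, j] - mean(X[:, j]))."""
    weights = np.asarray(weights, dtype=float)
    X = np.asarray(X, dtype=float)
    return weights * (X - X.mean(axis=0))


def global_importance(phi):
    """Global view: mean absolute attribution of each feature over all variants."""
    return np.abs(phi).mean(axis=0)


def local_explanation(phi, i):
    """Mutation-local view: signed attributions for a single variant i."""
    return phi[i]


# Hypothetical weights and descriptors for three variants (rows).
weights = np.array([2.0, -1.0, 0.5])
X = np.array([
    [0.8, 3.0, 0.2],
    [0.2, 5.0, 0.4],
    [0.5, 1.0, 0.9],
])

phi = linear_attributions(weights, X)
for name, score in zip(FEATURES, global_importance(phi)):
    print(f"global {name}: {score:.3f}")
print("local (variant 1):", local_explanation(phi, 1))
```

Because the attributions of each variant sum to its prediction minus the mean prediction, the local view accounts exactly for why one mutation scores higher or lower than average, whereas the global view only ranks features by typical influence.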
Funding: International Brain Research Organization (IBRO) Rising Star Awardee, recipient of an IBRO Early Career Principal Investigator Grant (No. PM010CNI000148); supported by a Sunway University internal grant (No. GRTIN-IGS[02]-CVVR-11-2023); supported by the Fundamental Research Funds from the Central of Public Welfare Research Institute, China Rehabilitation Institute; supported by the research initiation funding scheme provided by Henan University of Technology (No. 0004/31401568); Shenzhen Vaccine Biopharmaceuticals Limited (No. 0004/51100292).