Identification of disease-specific cell subtypes (DSCSs) has profound implications for understanding disease mechanisms, preoperative diagnosis, and precision therapy. However, achieving unified annotation of DSCSs across heterogeneous single-cell datasets remains a challenge. In this study, we developed the gPRINT algorithm (generalized approach for cell subtype identification with single cell's voicePRINT). Inspired by the principles of speech recognition in noisy environments, gPRINT draws on the ordered and clustered patterns of gene expression to transform gene position and gene expression information into voiceprints, yielding a unique “gene print” pattern for each cell. We then integrated neural networks to mitigate the impact of background noise on mapping cells to identity labels. We demonstrated the reproducibility of gPRINT across different donors, single-cell sequencing platforms, and disease subtypes, as well as its utility for automatic cell subtype annotation across datasets. Moreover, in external validation on the same tissue, gPRINT achieved an annotation accuracy of 98.37%, surpassing other algorithms. We further applied this approach to fibrosis-associated diseases in multiple tissues throughout the body, and to the annotation of fibroblast subtypes within a single tissue, tendon, where fibrosis is prevalent. In doing so, we achieved automatic prediction of tendinopathy-specific cell subtypes, key targets, and related drugs. In summary, gPRINT provides an automated and unified approach for identifying DSCSs across datasets, facilitating the elucidation of specific cell subtypes under different disease states and providing a powerful tool for exploring therapeutic targets in diseases.
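The abstract does not specify how a cell's expression profile becomes a "voiceprint." The sketch below is purely illustrative and is not the gPRINT implementation. It assumes one plausible reading: order a cell's genes by genomic coordinate, treat the log-transformed expression values as a 1-D signal, and compute a short-time Fourier magnitude spectrogram, a standard speech-processing feature swapped in here for illustration. The function name `gene_print` and the parameters `frame_len` and `hop` are hypothetical, and the neural-network denoising step is omitted.

```python
import numpy as np


def gene_print(expression, positions, frame_len=64, hop=32):
    """Hypothetical 'gene print' (illustration only, not the gPRINT method):
    order one cell's expression by genomic position, treat it as a 1-D
    signal, and return its short-time Fourier magnitude spectrogram."""
    # Sort genes by an assumed linear genomic coordinate so that
    # neighbouring genes become neighbouring samples of the signal.
    order = np.argsort(positions, kind="stable")
    signal = np.log1p(np.asarray(expression, dtype=float)[order])
    # Zero-pad short signals so at least one full frame exists;
    # trailing samples that do not fill a whole frame are dropped.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real FFT of each windowed frame: shape (n_frames, frame_len // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))


rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=2000)           # toy counts for 2,000 genes
positions = rng.integers(0, 10**8, size=2000)  # toy genomic coordinates
print(gene_print(counts, positions).shape)     # (61, 33)
```

In this reading, the spectrogram would then serve as the input image to a classifier that maps cells to subtype labels. The choice of window length and hop is arbitrary here.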
Funding: This work was supported by the National Key Research and Development Program of China (2022YFA1106800), the National Natural Science Foundation of China (Grant Nos. T2121004, 32271406, 82222044, 82202045, 82402845), the “Leading Goose” Science and Technology Project of Zhejiang Province (2024C03207), the Key R&D Program of Zhejiang (2024SSYS0026), the China Postdoctoral Science Foundation (2023M743025), the Fundamental Research Funds for the Zhejiang Provincial Universities (K20240141), the Postdoctoral Fellowship Program of CPSF (GZC20232297), and the General Research Fund of the Research Grants Council of Hong Kong, China (24101921).