Artificial intelligence (AI) researchers and cheminformatics specialists aim to identify effective drug precursors while reducing costs and speeding up development. Digital molecular representation is central to this goal. It makes molecules machine-readable, which improves the accuracy of molecular prediction tasks and supports evidence-based decision making. This study is a comprehensive review of small-molecule representations and of the AI-driven drug discovery downstream tasks that use them. The review first compiles small-molecule databases. It then analyzes fundamental molecular representations and the models that learn representations from these initial forms, capturing patterns and salient features across extensive chemical spaces. Next, it examines drug discovery downstream tasks built on the learned representations, including drug-target interaction (DTI) prediction, drug-target affinity (DTA) prediction, drug property (DP) prediction and drug generation. The analysis concludes with the challenges and opportunities for machine learning (ML) methods in molecular representation and in improving downstream task performance. Small-molecule representations and AI-based downstream tasks also show significant potential for identifying medicinal substances in traditional Chinese medicine (TCM) and for supporting TCM target discovery.
Microarrays have become a popular technology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements of every microarray experiment carry inherent “noise”. At present, logarithmic ratios are usually analyzed directly by various clustering methods, which can bias interpretation when identifying groups of genes or samples. This paper proposes a statistical method based on mixed-model approaches for the cluster analysis of microarray data. The method uses an ANOVA model to partition the observed total gene expression level into variation caused by different factors. It then uses the adjusted unbiased prediction (AUP) method to predict the differential effects of the GV (gene by variety) interaction. These predicted GV interaction effects serve as the inputs to cluster analysis. We illustrate the method on a gene expression dataset and demonstrate its utility through external validation.
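A deliberately simplified sketch of the partitioning idea follows. It assumes a complete gene × variety table of log-ratios with one observation per cell. Under that assumption, the least-squares GV interaction effect is what remains after the gene and variety main effects are removed. This fixed-effects calculation is a stand-in for illustration only. It is not the paper's mixed-model AUP predictor, and the function name and data are hypothetical.

```python
from statistics import fmean


def gv_interaction_effects(table):
    """Fixed-effects GV interaction estimates for a complete gene x variety table.

    table[i][j] is the (log-ratio) expression of gene i in variety j.
    effect_ij = y_ij - gene_mean_i - variety_mean_j + grand_mean,
    i.e. the two-way ANOVA interaction residual with one observation per cell.
    """
    n_genes, n_vars = len(table), len(table[0])
    gene_means = [fmean(row) for row in table]
    var_means = [fmean(table[i][j] for i in range(n_genes)) for j in range(n_vars)]
    grand = fmean(gene_means)  # balanced table, so mean of gene means = grand mean
    return [
        [table[i][j] - gene_means[i] - var_means[j] + grand for j in range(n_vars)]
        for i in range(n_genes)
    ]


# Hypothetical data: the third gene responds to the second variety in the opposite direction.
expr = [
    [1.0, 2.0],
    [3.0, 4.0],
    [2.0, 1.0],
]
effects = gv_interaction_effects(expr)
# Each row is one gene's GV profile, which could be passed to any clustering method.
for row in effects:
    print([round(v, 3) for v in row])
```

A purely additive table produces all-zero interaction effects. Only genes whose response differs across varieties therefore stand out in the clustering input.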
In the post-genomic era, many computational methods can predict protein-protein interactions at the genome level. However, each method has its own strengths and weaknesses, which leads to false predictions. Here we developed a unique integrated approach to identify the interacting partner(s) of Semaphorin 5A (SEMA5A). We started from seven putative binding partners that share similar ligand-interacting residues with SEMA5A. The methods include the Dwyer and Root-Bernstein/Dillon theories of protein evolution, the hydropathic complementarity of protein structure, patterns of protein function among molecules, information on domain-domain interactions, gene co-expression and protein evolution. Of the seven putative SEMA5A interacting partners, Plexin B3 and Neuropilin-2 had functions associated with SEMA5A. We modeled the semaphorin domain structure of Plexin B3 and found that it resembles that of SEMA5A. A virtual expression database search and RT-PCR analysis also showed that SEMA5A and Plexin B3 are co-expressed, and the two proteins were found to have co-evolved. Co-immunoprecipitation studies then confirmed the interaction of SEMA5A with Plexin B3. Overall, these studies show that an integrated prediction method can be applied at the genome level to discover many unknown binding partners of proteins with known ligand-binding domains.
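Hydropathic complementarity is one of the lines of evidence listed above. The rough illustration below computes sliding-window Kyte-Doolittle hydropathy profiles for two sequence segments. It scores complementarity as the negative Pearson correlation of those profiles, so opposite hydropathy patterns score high. The scoring rule, the window size and the example segments are all assumptions made for this sketch. None of them is the procedure used in the study.

```python
# Kyte-Doolittle hydropathy scale (per-residue values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean(values):
    return sum(values) / len(values)


def hydropathy_profile(seq, window=3):
    """Sliding-window mean Kyte-Doolittle hydropathy along a protein sequence."""
    values = [KD[aa] for aa in seq.upper()]
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def pearson(x, y):
    """Pearson correlation; inputs must be equal length and non-constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def complementarity(seq_a, seq_b, window=3):
    """Hypothetical score: minus the position-by-position profile correlation."""
    pa = hydropathy_profile(seq_a, window)
    pb = hydropathy_profile(seq_b, window)
    n = min(len(pa), len(pb))
    return -pearson(pa[:n], pb[:n])


# Hypothetical segments: hydrophobic-then-charged versus the reverse pattern.
print(round(complementarity("IVLKDE", "KDEIVL"), 3))
```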
Funding: Co-supported by the National Natural Science Foundation of China (No. 12172175); the National Science and Technology Major Project, China (No. J2019-II0014-0035); and the Science Center for Gas Turbine Project, China (Nos. P2022-C-II-002-001, P2022-A-II-002-001).
Abstract: Cowl-induced incident Shock Wave/Boundary Layer Interactions (SWBLI) under the influence of gradual expansion waves are frequently observed in supersonic inlets. However, the analysis and prediction of their interaction lengths have not been sufficiently investigated. This study first presents a theoretical scaling analysis and then validates it through wind tunnel experiments. The analysis is a detailed control volume analysis of mass conservation that accounts for the differences between the inviscid and viscous cases. From it, three models for analysing interaction length under gradual expansion waves are derived. Schlieren photography experiments in a Mach 2.73 flow are used to validate the models. The interaction scales are captured at various relative distances between the shock impingement location and the expansion region. The wedge angles range from 12° to 15°, and the expansion angles are 9°, 12° and 15°. Three trend lines, one for each expansion angle, are plotted to depict the relationship between normalised interaction length and a normalised interaction strength metric. A relationship between the trend-line coefficients and the expansion angle is then introduced to predict the interaction length influenced by gradual expansion waves. Finally, an estimate of normalised interaction length is derived for various coefficients in a unified form.
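The abstract does not state the functional form of the trend lines. Purely for illustration, assume a linear trend L* = a·S* + b between normalised interaction length L* and normalised interaction strength S* at one expansion angle. The coefficients a and b can then be obtained by ordinary least squares. The function name, the linear form and the data below are hypothetical.

```python
def fit_trend_line(strength, length):
    """Ordinary least-squares fit of length = a * strength + b (assumed linear form)."""
    n = len(strength)
    mean_x = sum(strength) / n
    mean_y = sum(length) / n
    sxx = sum((x - mean_x) ** 2 for x in strength)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(strength, length))
    a = sxy / sxx               # slope
    b = mean_y - a * mean_x     # intercept
    return a, b


# Hypothetical normalised measurements for a single expansion angle (not study data).
a, b = fit_trend_line([0.2, 0.4, 0.6, 0.8], [1.1, 1.9, 2.6, 3.5])
print(round(a, 3), round(b, 3))
```

Repeating the fit for each expansion angle gives one (a, b) pair per angle. Those pairs are the coefficients that could then be related to the expansion angle, as the abstract describes.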
Funding: National Natural Science Foundation of China (No. 71401072); Natural Science Foundation of Jiangsu Province, China (No. BK20130814); Fundamental Research Funds for the Central Universities, China (No. NS2013064).
Abstract: To realize aircraft trajectory prediction, this paper proposes a modified interacting multiple model (M-IMM) algorithm based on a performance analysis of the standard interacting multiple model (IMM) algorithm. The M-IMM algorithm defines a new likelihood function for updating the flight-mode probabilities. This likelihood accounts for the influence of the interaction step on the residual's mean error, and it drops the assumption that the likelihood function is a zero-mean Gaussian. The M-IMM algorithm is applied to simulated aircraft trajectory prediction and compared with existing algorithms. The simulation results indicate that the M-IMM algorithm predicts aircraft trajectories more quickly and accurately.
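The sketch below shows the standard IMM flight-mode probability update for scalar residuals. Predicted mode probabilities come from the Markov switching matrix, each mode is weighted by the Gaussian likelihood of its filter residual, and the weights are normalised. The optional `residual_means` argument is a hypothetical hook that marks where a non-zero residual mean would enter an M-IMM-style likelihood. The abstract does not say how the paper defines that mean, so none of this should be read as the authors' exact formula.

```python
import math


def gaussian_likelihood(residual, variance, mean=0.0):
    """Scalar Gaussian density N(residual; mean, variance)."""
    return math.exp(-0.5 * (residual - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def update_mode_probabilities(prev_probs, transition, residuals, variances, residual_means=None):
    """One IMM mode-probability update for scalar residuals.

    prev_probs[i]    : mode probability mu_i(k-1)
    transition[i][j] : Markov switching probability p_ij from mode i to mode j
    residuals[j]     : innovation of the mode-j filter at step k
    variances[j]     : innovation variance S_j
    residual_means   : optional non-zero residual means (hypothetical M-IMM-style hook;
                       the standard IMM uses zeros)
    """
    n = len(prev_probs)
    if residual_means is None:
        residual_means = [0.0] * n
    # Predicted mode probabilities: c_j = sum_i p_ij * mu_i(k-1)
    c = [sum(transition[i][j] * prev_probs[i] for i in range(n)) for j in range(n)]
    # Unnormalised posterior: likelihood of each mode's residual times its prior.
    weights = [
        gaussian_likelihood(residuals[j], variances[j], residual_means[j]) * c[j]
        for j in range(n)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-mode example (e.g. straight flight vs. turn).
probs = update_mode_probabilities(
    prev_probs=[0.5, 0.5],
    transition=[[0.95, 0.05], [0.05, 0.95]],
    residuals=[0.2, 2.5],
    variances=[1.0, 1.0],
)
print([round(p, 3) for p in probs])
```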
Funding: Supported by the National Natural Science Foundation of China (Nos. 82173746 and U23A20530); Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism (Shanghai Municipal Education Commission).
Abstract: Accurate prediction of drug-target interactions (DTIs) plays a pivotal role in drug discovery, facilitating lead compound optimization, drug repurposing and the elucidation of drug side effects. However, traditional DTI prediction methods are often limited by incomplete biological data and insufficient representation of protein features. In this study, we proposed KG-CNNDTI, a novel knowledge graph-enhanced framework for DTI prediction. It integrates heterogeneous biological information to improve model generalizability and predictive performance. The model used protein embeddings derived from a biomedical knowledge graph via the Node2Vec algorithm, enriched with contextualized sequence representations obtained from ProteinBERT. For compound representation, multiple molecular fingerprint schemes and the Uni-Mol pre-trained model were evaluated. The fused representations served as inputs to both classical machine learning models and a convolutional neural network-based predictor. Experiments on benchmark datasets showed that KG-CNNDTI outperformed state-of-the-art methods, particularly in precision, recall, F1-score and area under the precision-recall curve (AUPR). Ablation analysis highlighted the substantial contribution of knowledge graph-derived features. KG-CNNDTI was also applied to virtual screening of natural products against Alzheimer's disease, which yielded 40 candidate compounds. Five of these were supported by literature evidence, and three of those five were further validated in in vitro assays.
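AUPR is commonly estimated as average precision. Average precision is the step-wise sum of precision at each rank where recall increases, which is also how scikit-learn's `average_precision_score` defines it. The abstract does not say how the authors computed AUPR. The sketch below is therefore a generic illustration that assumes distinct scores (no tie handling), and its function name is hypothetical.

```python
def average_precision(scores, labels):
    """Average precision for binary labels (1 = interacting pair, 0 = not).

    Equivalent to sum_n (R_n - R_{n-1}) * P_n: each positive raises recall by
    1 / n_pos, so AP is the mean of the precision values at the positives' ranks.
    Assumes all scores are distinct.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical predicted interaction scores for four drug-target pairs.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```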
Funding: National Natural Science Foundation of China (Nos. 61170027, 61202169 and 61301140); Tianjin "131" Creative Talents Training Project, China (3rd level).
Abstract: Virtual humans (VHs) have become a hot research topic in virtualization. VH dialogue can be categorized as an application of natural language processing (NLP), because it is closely related to question answering (QA) technologies. To help integrate these technologies, this paper reviews important work on VH dialogue and identifies promising research directions from the perspective of QA technologies.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 22275145, 22305189 and 21875184); Natural Science Foundation of Shaanxi Province (Grant Nos. 2022JC-10 and 2024JC-YBQN-0112).
Abstract: Two-dimensional energetic materials (2DEMs), characterized by their exceptional interlayer sliding properties, are recognized as exemplars of low-sensitivity energetic materials. However, the diversity of available 2DEMs is severely limited because there are no efficient methods for rapidly predicting crystal packing modes from molecular structures, which impedes the high-throughput rational design of such materials. In this study, we used quantified indicators, such as hydrogen bond dimension and maximum planar separation, to quickly screen 17 2DEM and 16 non-2DEM crystal structures from a crystal database. We then compared and analyzed these structures with a focus on hydrogen bond donor-acceptor combinations, skeleton features and intermolecular interactions. Our findings suggest that the π-π packing interaction energy is a key determinant of whether planar energetic molecules form layered packing modes. The magnitude of this energy is governed primarily by the strongest dimeric π-π interaction (π-π2max). Accordingly, we defined a critical threshold for π-π2max that discriminates layered packing modes. We also formulated a theoretical model for predicting π-π2max from molecular electrostatic potential and dipole moment analysis. External validation on a test set of 31 planar energetic molecular crystals confirmed the model's predictive efficacy, with an accuracy of 84% and a recall of 75%. On the same external validation samples, the model also gave better classification performance than typical machine learning methods such as random forest. This contribution introduces a novel methodology for identifying crystal packing modes in 2DEMs, potentially accelerating the design and synthesis of high-energy, low-sensitivity 2DEMs.
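The screening rule described above amounts to a one-feature threshold classifier evaluated by accuracy and recall. The minimal sketch below assumes that the layered class is predicted when the magnitude of π-π2max reaches a threshold. The abstract gives neither the direction of the comparison nor the threshold value, so both are hypothetical, as are the example energies. As an arithmetic check on the reported figure, 26 correct predictions out of 31 is the only count that rounds to 84% accuracy (26/31 ≈ 0.839).

```python
def predict_layered(pi_pi_2max, threshold):
    """Hypothetical rule: layered packing if |strongest dimeric pi-pi energy| >= threshold."""
    return [abs(e) >= threshold for e in pi_pi_2max]


def accuracy_and_recall(predicted, actual):
    """Accuracy over all samples; recall = TP / (TP + FN) for the layered (True) class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    correct = sum(p == a for p, a in zip(predicted, actual))
    accuracy = correct / len(actual)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, recall


# Hypothetical energies and labels (True = layered 2DEM); not data from the study.
energies = [-45.0, -38.0, -20.0, -12.0]
is_layered = [True, True, True, False]
acc, rec = accuracy_and_recall(predict_layered(energies, threshold=30.0), is_layered)
print(acc, round(rec, 3))
```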
Abstract: Ionic liquid analogues known as Deep Eutectic Solvents (DESs) are attracting a surge of interest from the scientific community, and many applications involving DESs have been realized. Moisture content is one of the important factors affecting the physical and chemical characteristics of these fluids. In this work, we investigated how mixing water with three common type III DESs affects their viscosity, over a water mole fraction range of 0 to 1 and a temperature range of 298.15 to 353.15 K. The three systems showed similar trends of viscosity variation with molar composition and temperature. Because the constituent molecules of these fluids have asymmetric geometry, the conventional Grunberg-Nissan model could not model their viscosity effectively. The Fang-He model was used instead and performed excellently. All the aqueous DES mixtures showed negative viscosity deviations relative to ideal mixtures. The degree of intermolecular interaction with water reached a maximum at a composition of 30% aqueous DES solution. Reline, the most studied DES in the literature, showed the largest deviation. The viscosity data for aqueous DES solutions presented here may help tune this important property for diverse industrial applications of these novel fluids, including fluid flow, chemical reactions and liquid-liquid separation.
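The Grunberg-Nissan equation named above describes a binary mixture with a single interaction parameter G12: ln η_mix = x1 ln η1 + x2 ln η2 + x1 x2 G12. The sketch below evaluates this equation together with one common definition of viscosity deviation, the departure of ln η_mix from the log-additive ideal. Other definitions exist, for example ones that are linear in η, and the abstract does not say which the study used. The Fang-He model is not reproduced here. The DES viscosity and the G12 value in the example are hypothetical.

```python
import math


def grunberg_nissan(x1, eta1, eta2, g12):
    """Grunberg-Nissan mixture viscosity: ln(eta_m) = x1 ln(eta1) + x2 ln(eta2) + x1 x2 G12."""
    x2 = 1.0 - x1
    return math.exp(x1 * math.log(eta1) + x2 * math.log(eta2) + x1 * x2 * g12)


def log_viscosity_deviation(x1, eta1, eta2, eta_mix):
    """Deviation of ln(eta_mix) from the ideal (log-additive) mixing rule."""
    x2 = 1.0 - x1
    return math.log(eta_mix) - (x1 * math.log(eta1) + x2 * math.log(eta2))


# Viscosities in mPa*s; x1 is the mole fraction of component 1 (water here).
eta_water = 0.89   # approximate viscosity of water at 298.15 K
eta_des = 750.0    # hypothetical DES viscosity, not a value from the study
eta_mix = grunberg_nissan(0.3, eta_water, eta_des, g12=-2.0)
print(log_viscosity_deviation(0.3, eta_water, eta_des, eta_mix))
```

For a Grunberg-Nissan mixture the log deviation is exactly x1 x2 G12. A negative G12 therefore corresponds to the negative deviation from ideality reported above.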
Funding: Supported by the National Key Research and Development Program of China (2022YFD2100700); the National Natural Science Foundation of China (32325040); Basic Scientific Research Business Fee Project of Universities Directly (BR22-14-01); the National Dairy Science and Technology Innovation Center (2022-Open Subject-6); Inner Mongolia Natural Science Foundation Project (2021MS06023); Inner Mongolia Science & Technology Planning Project (2022YFSJ0017); and the earmarked fund for China Agricultural Research System (CARS36).
Abstract: Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) and Streptococcus thermophilus (S. thermophilus) are commonly used starters in milk fermentation. Fermentation experiments have shown that L. bulgaricus-S. thermophilus interactions (LbStI) substantially affect dairy product quality and production. Traditional wet-lab biological experiments are time-consuming and labor-intensive for screening interaction combinations, so an artificial intelligence-based method for screening interactive starter combinations is needed. In current bioinformatics research on AI-based interaction prediction, however, most successful models use supervised learning. Interaction prediction with only a small number of labeled samples has received little attention. This study therefore aimed to develop a semi-supervised learning framework for predicting LbStI from the genomic data of 362 isolates (181 per species). The framework is a two-part model. The first part is a co-clustering prediction model based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) dataset. The second part is a Laplacian regularized least squares prediction model based on K-mer analysis and the gene composition datasets of all isolates. To improve accuracy, we combined the separate outputs of the two parts to produce the final LbStI predictions. Milk fermentation experiments confirmed a high precision of 85% (17/20), validated on 20 randomly selected combinations of isolates predicted to interact. Our data suggest that the biosynthetic pathways of cysteine, riboflavin, teichoic acid and exopolysaccharides, together with the ATP-binding cassette transport systems, contribute to the mutualistic relationship between these starter bacteria during milk fermentation. This finding still requires further experimental verification. The presented model and data are valuable resources for academics and industry professionals interested in screening dairy starter cultures and understanding their interactions.
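The Laplacian regularized least squares component can be illustrated in its simplest transductive form. Nodes of a similarity graph stand for whatever entities are being scored, for example candidate isolate combinations. The study derives similarities from K-mer and gene-composition data, but the toy matrix here is hypothetical. A few nodes carry labels, and scores for the rest come from minimizing the labeled-node squared error plus λ·fᵀLf. That objective has the closed form (J + λL)f = Jy. This is a simplified variant chosen for illustration. It is not the study's kernel formulation, and it omits the co-clustering combination step.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def laplacian_rls(similarity, labels, lam):
    """Transductive Laplacian-regularized least squares on a similarity graph.

    Minimizes sum over labeled i of (f_i - y_i)^2 + lam * f^T L f with L = D - W.
    labels[i] is None for unlabeled nodes. Setting the gradient to zero gives
    (J + lam * L) f = J y, where J is the diagonal labeled-node indicator.
    Every connected component needs at least one labeled node.
    """
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    system, rhs = [], []
    for i in range(n):
        labeled = labels[i] is not None
        row = []
        for j in range(n):
            lap = (degree[i] if i == j else 0.0) - similarity[i][j]
            row.append(lam * lap + (1.0 if (labeled and i == j) else 0.0))
        system.append(row)
        rhs.append(float(labels[i]) if labeled else 0.0)
    return solve_linear(system, rhs)


# Hypothetical 4-node similarity graph: node 0 labeled interacting (1), node 3 not (0).
W = [
    [0.0, 1.0, 0.2, 0.0],
    [1.0, 0.0, 0.5, 0.1],
    [0.2, 0.5, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
scores = laplacian_rls(W, [1, None, None, 0], lam=0.5)
print([round(s, 3) for s in scores])
```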
Abstract: Drug-drug interaction (DDI) event prediction is a challenging problem, and accurate prediction of DDI events is critical to patient health and new drug development. Many machine learning-based techniques for predicting DDI events have been proposed recently. However, most existing methods do not effectively integrate the multidimensional features of drugs, and they do little to suppress noise when extracting effective feature information. To address these limitations, we propose DDI-Transform, a neural network framework for DDI event prediction. DDI-Transform includes a drug structure information feature extraction module and a drug binding-protein feature extraction module that together obtain multidimensional feature information. A stack of DDI-Transform layers in the DDI-Transform network module then learns adaptively, selecting the effective feature information for prediction. The results show that DDI-Transform accurately predicts DDI events and outperforms state-of-the-art models. Results on datasets of different scales confirm the robustness of the method.
Funding: Supported by the National Natural Science Foundation of China (T2225002 and 82273855 to M.Y.Z.; 82204278 to X.T.L.); Lingang Laboratory (LG202102-01-02 to M.Y.Z.); National Key Research and Development Program of China (2022YFC3400504 to M.Y.Z.); SIMM-SHUTCM Traditional Chinese Medicine Innovation Joint Research Program (E2G805H to M.Y.Z.); Shanghai Municipal Science and Technology Major Project and China Postdoctoral Science Foundation (2022M720153 to X.T.L.).
Abstract: Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets and drug side effects and for repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based ML models for CPI prediction, highlighting their performance and achievements. It also offers insights into CPI prediction datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
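Benchmarks of binary CPI classifiers commonly report the area under the ROC curve (AUROC); the abstract does not say which metrics the review covers, so this is an illustrative assumption. The sketch computes AUROC directly from its probabilistic definition: the fraction of (interacting, non-interacting) pairs in which the interacting pair receives the higher score, with ties counted as one half.

```python
def auroc(labels, scores):
    # labels: 1 = interacting compound-protein pair, 0 = non-interacting.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Toy scores: 3 of the 4 positive/negative pairs are ranked correctly.
example = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75
```

This pairwise version is O(P x N) and is meant for clarity; library implementations sort the scores once instead.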
Funding: Supported in part by the National Natural Science Foundation of China (22033001), the National Key R&D Program of China (2022YFA1303700), and the Chinese Academy of Medical Sciences (2021-I2M-5-014).
Abstract: Proteins are integral actors in essential life processes, making protein research a fundamental domain with the potential to drive advances in pharmaceuticals and disease research. Within protein research, there is a pressing need to uncover protein functions and untangle their underlying mechanisms. Because experimental investigations are costly and low-throughput, computational models offer a promising alternative for accelerating protein function annotation. In recent years, protein pre-training models have made notable progress across multiple prediction tasks, which opens a promising avenue for tackling the complex downstream task of protein function prediction. In this review, we describe the historical evolution and research paradigms of computational methods for predicting protein function. We then summarize progress in protein and molecule representation and in feature extraction techniques. Finally, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, offering a comprehensive perspective on progress in this field.
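As a concrete example of the hand-crafted protein features that predate pre-trained embeddings, the sketch below computes amino acid composition: a fixed-length 20-dimensional vector of residue frequencies. It is a generic illustration, not a method taken from the review; skipping non-standard residue symbols is an assumption of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical

def aa_composition(seq):
    # Returns the frequency of each standard residue, in AMINO_ACIDS order.
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in seq.upper():
        if residue in counts:  # non-standard symbols (X, B, Z, U, ...) are skipped
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

features = aa_composition("MKTAYIAKQR")  # 20 values summing to 1
```

Because every sequence maps to the same 20 dimensions regardless of length, such vectors can be fed directly to conventional classifiers; pre-trained models replace them with learned, context-aware embeddings.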
Funding: Supported by the Shenzhen Key Laboratory of Intelligent Bioinformatics (No. ZDSYS20220422103800001); the Shenzhen Science and Technology Program (No. JCYJ20230807140709020); the National Natural Science Foundation of China (Nos. 62402489, U22A2041, and 62373172); the China Postdoctoral Science Foundation (No. 2023M743688); and the Guangdong Basic and Applied Basic Research Foundation (Nos. 2024A1515011960 and 2023A1515110570).
Abstract: Artificial intelligence (AI) researchers and cheminformatics specialists strive to identify effective drug precursors while reducing costs and accelerating development. Digital molecular representation plays a crucial role in this effort by making molecules machine-readable, thereby improving the accuracy of molecular prediction tasks and supporting evidence-based decision making. This study presents a comprehensive review of small-molecule representations and of the AI-driven downstream drug discovery tasks that use them. The review begins by compiling small-molecule databases, then analyzes fundamental molecular representations and the models that learn representations from these initial forms, capturing patterns and salient features across large chemical spaces. It then examines downstream drug discovery tasks built on the learned representations, including drug-target interaction (DTI) prediction, drug-target affinity (DTA) prediction, drug property (DP) prediction, and drug generation. The analysis concludes by highlighting the challenges and opportunities of machine learning (ML) methods for molecular representation and for improving downstream task performance. In addition, small-molecule representations and AI-based downstream tasks show significant potential for identifying the medicinal substances of traditional Chinese medicine (TCM) and facilitating TCM target discovery.
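SMILES strings are one of the basic "initial forms" from which sequence models learn molecular representations, and the first step is usually tokenization. The sketch below uses a simplified regular expression (an assumption, covering bracket atoms, the two-letter halogens Br and Cl, organic-subset atoms, bonds, branches, and ring closures); it is not a complete SMILES grammar and is not taken from the review.

```python
import re

# Alternatives are tried left to right, so bracket atoms and the two-letter
# halogens (Br, Cl) are matched before single-character atoms.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[()=#\-+\\/:.~@*$]|\d)"
)

def tokenize_smiles(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens

aspirin = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
halides = tokenize_smiles("ClCBr")      # ['Cl', 'C', 'Br'], not ['C', 'l', ...]
charged = tokenize_smiles("C[NH3+]")    # ['C', '[NH3+]']
```

The resulting tokens would then be mapped to integer IDs and embedded, which is where learned representations take over from the raw string.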
Funding: This research was partially supported by the National Natural Science Foundation of China (No. 30470916).
Abstract: Microarrays have become a popular technology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements carry inherent "noise". Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias the interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed-model approaches was proposed for microarray cluster analysis. The underlying rationale of this method is to use an ANOVA model to partition the observed total gene expression into variation components caused by different factors, and to predict the differential effects of the GV (gene-by-variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs to cluster analysis. We illustrated the application of our method with a gene expression dataset and demonstrated its utility through external validation.
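The idea of clustering interaction effects rather than raw log ratios can be illustrated with a much simpler stand-in for the paper's mixed model. The sketch below uses a fixed-effects, balanced two-way decomposition (gene and variety means removed, leaving the GV residual) instead of AUP prediction, and a small k-means with farthest-point initialization; the toy matrix, k, and clustering algorithm are all assumptions.

```python
import numpy as np

def gv_interaction(Y):
    # Additive model y_ij = mu + g_i + v_j + (gv)_ij (fixed effects, balanced data):
    # the GV residual is what remains after removing gene and variety means.
    row = Y.mean(axis=1, keepdims=True)
    col = Y.mean(axis=0, keepdims=True)
    return Y - row - col + Y.mean()

def kmeans(X, k, n_iter=100):
    # Deterministic farthest-point initialisation, then Lloyd iterations.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy log-ratio matrix (made-up values): rows = genes, columns = varieties.
Y = np.array([[1.0, 2.0, 3.0],
              [11.0, 12.0, 13.0],
              [3.0, 2.0, 1.0]])
GV = gv_interaction(Y)
labels_gv = kmeans(GV, k=2)   # genes 0 and 1 together: same pattern, different baseline
labels_raw = kmeans(Y, k=2)   # genes 0 and 2 together: similar magnitudes only
```

On the raw values, gene 0 is grouped with gene 2 because their magnitudes are close even though their patterns across varieties are opposite; after removing the gene main effect, gene 0 is grouped with gene 1, which shares its pattern. This is the kind of bias the abstract attributes to clustering raw log ratios.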
Funding: This work was partly supported by the Molecular Therapeutics Program, Nebraska Department of Health and Human Services, by Grant CA72781 (to RKS), and by Cancer Center Support Grant (P30CA036727) from the National Cancer Institute, National Institutes of Health, USA.
Abstract: In the post-genomic era, various computational methods are available for predicting protein-protein interactions at the genome level; however, each method has its own advantages and disadvantages, which can lead to false predictions. Here we developed a unique integrated approach to identify the interacting partner(s) of Semaphorin 5A (SEMA5A), starting from seven proteins that share similar ligand-interacting residues as putative binding partners. The methods include the Dwyer and Root-Bernstein/Dillon theories of protein evolution, hydropathic complementarity of protein structure, patterns of protein function among molecules, information on domain-domain interactions, co-expression of genes, and protein evolution. Among the seven proteins selected as putative SEMA5A interacting partners, we found that the functions of Plexin B3 and Neuropilin-2 are associated with SEMA5A. We modeled the semaphorin domain structure of Plexin B3 and found that it shares similarity with SEMA5A. Moreover, a virtual expression database search and RT-PCR analysis showed co-expression of SEMA5A and Plexin B3, and these proteins were found to have co-evolved. In addition, we confirmed the interaction of SEMA5A with Plexin B3 in co-immunoprecipitation studies. Overall, these studies demonstrate that an integrated prediction method can be used at the genome level to discover many unknown binding partners of proteins with known ligand-binding domains.
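One of the evidence sources listed above is hydropathic complementarity. The abstract does not give the scoring procedure, so the sketch below shows one simple way to quantify it, assumed here for illustration: compute sliding-window Kyte-Doolittle hydropathy profiles for two equal-length segments and take their Pearson correlation, treating a strongly negative value as complementary. The window size, the equal-length restriction, and the toy peptides are assumptions.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=3):
    # Sliding-window mean hydropathy (one value per full window).
    values = np.array([KYTE_DOOLITTLE[aa] for aa in seq.upper()])
    return np.convolve(values, np.ones(window) / window, mode="valid")

def complementarity_score(seq_a, seq_b, window=3):
    # Pearson correlation of the two profiles: near -1 means inverted
    # (complementary) hydropathy, near +1 means similar hydropathy.
    pa, pb = hydropathy_profile(seq_a, window), hydropathy_profile(seq_b, window)
    if len(pa) != len(pb):
        raise ValueError("this sketch compares equal-length segments only")
    return float(np.corrcoef(pa, pb)[0, 1])

score = complementarity_score("IIIIKKKK", "KKKKIIII")    # close to -1
self_score = complementarity_score("IIIIKKKK", "IIIIKKKK")  # close to +1
```

In an integrated approach, a score like this would be only one vote alongside co-expression, domain-domain interaction data, and co-evolution, which is how the combination reduces the false predictions of any single method.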