Funding: Supported by National Institute on Aging (NIH-NIA) R01AG054459 (to ALL).
Abstract: Alzheimer’s disease (AD) is the most common form of dementia, affecting over 50 million people worldwide. This figure is projected to nearly double every 20 years, reaching 82 million by 2030 and 152 million by 2050 (Alzheimer’s Disease International). The apolipoprotein ε4 (APOE4) allele is the strongest genetic risk factor for late-onset AD (after age 65 years). Apolipoprotein E, a lipid transporter, exists in three variants: ε2, ε3, and ε4. APOE ε2 (APOE2) is protective against AD, APOE ε3 (APOE3) is neutral, and APOE4 significantly increases the risk. Individuals with one copy of APOE4 have a 4-fold greater risk of developing AD, and those with two copies face an 8-fold greater risk, compared to non-carriers. Even in cognitively normal individuals, APOE4 carriers exhibit brain metabolic and vascular deficits decades before amyloid-beta (Aβ) plaques and neurofibrillary tau tangles, the hallmark pathologies of AD, emerge (Reiman et al., 2001, 2005; Thambisetty et al., 2010). Notably, studies have demonstrated reduced glucose uptake, or hypometabolism, in brain regions vulnerable to AD in asymptomatic middle-aged APOE4 carriers, long before clinical symptoms arise (Reiman et al., 2001, 2005).
Funding: National Institute of General Medical Sciences, Grant/Award Number: R35-GM126985; National Institute of Diabetes and Digestive and Kidney Diseases, Grant/Award Number: P30DK092950; U.S. National Library of Medicine, Grant/Award Number: LM013392.
Abstract: Understanding complex biological pathways, including gene–gene interactions and gene regulatory networks, is critical for exploring disease mechanisms and drug development. Manual literature curation of biological pathways cannot keep up with the exponential growth of new discoveries in the literature. Large-scale language models (LLMs) trained on extensive text corpora contain rich biological information, and they can be mined as a biological knowledge graph. This study assesses 21 LLMs, including both application programming interface (API)-based models and open-source models, on their capacity to retrieve biological knowledge. The evaluation focuses on predicting gene regulatory relations (activation, inhibition, and phosphorylation) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway components. Results indicated a significant disparity in model performance. The API-based models GPT-4 and Claude-Pro showed superior performance, with F1 scores of 0.4448 and 0.4386 for gene regulatory relation prediction, and Jaccard similarity indices of 0.2778 and 0.2657 for KEGG pathway prediction, respectively. Open-source models lagged behind their API-based counterparts; among them, Falcon-180b and llama2-7b had the highest F1 scores in gene regulatory relations, 0.2787 and 0.1923, respectively. For KEGG pathway recognition, the Jaccard similarity index was 0.2237 for Falcon-180b and 0.2207 for llama2-7b. Our study suggests that LLMs are informative in gene network analysis and pathway mapping, but their effectiveness varies, necessitating careful model selection. This work also provides a case study and insight into using LLMs as knowledge graphs. Our code is publicly available on GitHub (Muh-aza).
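The abstract reports two metrics without defining them, so a minimal sketch of how each is conventionally computed may help in reading the scores. It assumes set-based, exact-match scoring, which the abstract does not confirm. The function names and the toy gene and relation sets are hypothetical illustrations, not data or code from the study.

```python
def jaccard(predicted: set, reference: set) -> float:
    """Jaccard similarity index: |A intersect B| / |A union B|."""
    union = predicted | reference
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(predicted & reference) / len(union)


def f1_score(predicted: set, reference: set) -> float:
    """F1: harmonic mean of precision and recall over exact matches."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy pathway membership (not an actual KEGG gene list).
reference_genes = {"TP53", "MDM2", "CDKN1A", "BAX"}
predicted_genes = {"TP53", "MDM2", "ATM"}
pathway_jaccard = jaccard(predicted_genes, reference_genes)

# Hypothetical toy relations as (source, relation type, target) triples.
reference_relations = {
    ("TP53", "activation", "CDKN1A"),
    ("MDM2", "inhibition", "TP53"),
    ("ATM", "phosphorylation", "TP53"),
}
predicted_relations = {
    ("TP53", "activation", "CDKN1A"),
    ("MDM2", "activation", "TP53"),  # wrong relation type, so no match
}
relation_f1 = f1_score(predicted_relations, reference_relations)

print(f"Jaccard = {pathway_jaccard:.4f}, F1 = {relation_f1:.4f}")
```

Under this reading, a Jaccard index near 0.28 means that roughly a quarter of the genes in the union of predicted and reference sets are shared by both.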
Abstract: Background: Identifying the patient-specific flow of signal transduction perturbed by multiple single-nucleotide alterations is critical for improving patient outcomes in cancer cases. However, accurate estimation of mutational effects at the pathway level for such patients remains an open problem. While probabilistic pathway topology methods are gaining interest among the scientific community, the overwhelming majority do not account for network perturbation effects from multiple single-nucleotide alterations. Methods: Here we present an improvement of the mutational forks formalism to infer the patient-specific flow of signal transduction based on multiple single-nucleotide alterations, including non-synonymous and synonymous mutations. The lung adenocarcinoma and skin cutaneous melanoma datasets from the TCGA Pan-Cancer Atlas were employed to show the utility of the proposed method. Results: We comprehensively characterized six mutational forks. The number of mutated nodes ranged from one to four depending on the topological characteristics of a fork. Transitional confidences (TCs) were computed for every possible combination of single-nucleotide alterations in the fork. The analysis demonstrated the capacity of the mutational forks formalism to follow a biologically explainable logic in identifying high-likelihood signaling routes in lung adenocarcinoma and skin cutaneous melanoma patients. The findings were largely supported by evidence from the biomedical literature. Conclusion: We conclude that the formalism has strong potential to enable an assessment of patient-specific flow by leveraging information from multiple single-nucleotide alterations to adjust the transitional likelihoods that are solely based on the canonical view of a disease.