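Abstract: Briefings in Bioinformatics | Systematic evaluation of whole-genome variant detection pipelines in monozygotic twins. Whole-genome sequencing (WGS) is a key technology for precise individual identification and genetic diagnosis. However, algorithms differ markedly in sensitivity, specificity, and computational resource requirements when detecting small variants (such as SNVs/INDELs). Traditional benchmarks are often based on simulated data or non-extreme samples and cannot adequately reproduce real-world challenges, such as distinguishing highly similar individuals in forensic settings or tracking rare mutations in medicine.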
Funding: Supported by the National Key Research and Development Program of China (2022YFA0912100), the National Natural Science Foundation of China (32270098 and 32470073), the Fundamental Research Funds for the Central Universities (2662024JC015), and the National Key Laboratory of Agricultural Microbiology (AML2024D02) to Z.Z.
Abstract: Translation is a crucial step in gene expression. Over the past decade, the development and application of ribosome profiling (Ribo-seq) have significantly advanced our understanding of translational regulation in vivo. However, the analysis and visualization of Ribo-seq data remain challenging. Although various analytical pipelines are available, they still need improvement in comprehensiveness, accuracy, and user-friendliness. In this study, we develop RiboParser/RiboShiny, a robust framework for analyzing and visualizing Ribo-seq data. Building on published methods, we optimize ribosome structure-based and start/stop-based models to improve the accuracy and stability of P-site detection, even in species with a high proportion of leaderless transcripts. Leveraging these improvements, RiboParser offers comprehensive analyses, including quality control, gene-level analysis, codon-level analysis, and the analysis of Ribo-seq variants. Meanwhile, RiboShiny provides a user-friendly and adaptable platform for data visualization, facilitating deeper insights into the translational landscape. Furthermore, the integration of standardized genome annotation makes our platform applicable to a wide range of organisms with sequenced genomes. This framework has the potential to significantly improve the precision and efficiency of Ribo-seq data interpretation, thereby deepening our understanding of translational regulation.
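The start-based idea behind P-site detection can be illustrated with a minimal sketch. It is not RiboParser's implementation: the function name, the input format, and the rule of taking the most frequent distance per read length are all assumptions made for illustration. For each read length, it counts how far the read's 5' end lies upstream of the annotated start codon and takes the most common distance as the P-site offset.

```python
from collections import Counter, defaultdict


def estimate_psite_offsets(reads, cds_starts):
    """Return {read_length: most frequent 5'-end-to-start-codon distance}.

    reads      -- iterable of (transcript_id, five_prime_pos, read_length)
                  in 0-based transcript coordinates (hypothetical format)
    cds_starts -- dict mapping transcript_id to the position of the first
                  nucleotide of the annotated start codon
    """
    distances = defaultdict(Counter)
    for transcript_id, five_prime_pos, read_length in reads:
        start = cds_starts.get(transcript_id)
        if start is None:
            continue  # transcript without an annotated CDS start
        distance = start - five_prime_pos
        # Keep only reads whose span contains the whole start codon.
        if 0 <= distance <= read_length - 3:
            distances[read_length][distance] += 1
    # The most frequent distance is taken as the offset for that read length.
    return {
        length: counts.most_common(1)[0][0]
        for length, counts in distances.items()
    }
```

Note that this simple version has little to count on transcripts that lack a 5' leader, because reads cannot extend upstream of the start codon there.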
Abstract: While methodology for determining the mode of evolution in coding sequences has been well established, evaluation of adaptation events in emerging types of phenotype data needs further development. Here, we propose an analysis framework (expression variance decomposition, EVaDe) for comparative single-cell expression data based on phenotypic evolution theory. After decomposing the gene expression variance into separate components, we use two strategies to identify genes exhibiting large between-taxon expression divergence and small within-cell-type expression noise in certain cell types, attributing this pattern to putative adaptive evolution. In a dataset of primate prefrontal cortex, we find that such human-specific key genes are enriched for neurodevelopment-related functions, while most other genes exhibit neutral evolution patterns. Specific neuron types are found to harbor more of these key genes than other cell types and are thus likely to have experienced more extensive adaptation. Reassuringly, at the molecular sequence level, the key genes are significantly associated with the rapidly evolving conserved non-coding elements. An additional case analysis comparing the naked mole-rat (NMR) with the mouse suggests that innate-immunity-related genes and cell types have undergone putative expression adaptation in NMR. Overall, the EVaDe framework may effectively probe the mode of adaptive evolution in single-cell expression data.
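To make the contrast between between-taxon divergence and within-cell-type noise concrete, here is a minimal sketch. It uses a plain one-way sum-of-squares decomposition as a stand-in, which is an assumption: the abstract does not specify EVaDe's actual variance components or its two selection strategies. The function names and the input format (one gene, one cell type, per-cell expression values grouped by taxon) are hypothetical.

```python
def decompose_expression_variance(expression_by_taxon):
    """Split the total sum of squares for one gene in one cell type.

    expression_by_taxon -- dict mapping a taxon name to a list of per-cell
                           expression values (hypothetical input format)
    Returns (between_taxon_ss, within_taxon_ss).
    """
    values = [v for cells in expression_by_taxon.values() for v in cells]
    grand_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for cells in expression_by_taxon.values():
        taxon_mean = sum(cells) / len(cells)
        # Divergence: how far each taxon mean sits from the grand mean.
        between += len(cells) * (taxon_mean - grand_mean) ** 2
        # Noise: cell-to-cell spread around the taxon's own mean.
        within += sum((v - taxon_mean) ** 2 for v in cells)
    return between, within


def divergence_to_noise_ratio(expression_by_taxon, pseudocount=1e-9):
    """Large values flag high divergence combined with low noise."""
    between, within = decompose_expression_variance(expression_by_taxon)
    return between / (within + pseudocount)
```

Under this sketch, a gene whose ratio is large in a given cell type would be a candidate for the pattern the abstract attributes to putative adaptive evolution. The pseudocount only guards against division by zero and is an illustrative choice.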
Funding: Supported by the China Agriculture Research System of MOF and MARA, the National Natural Science Foundation of China (31872337 and 31501919), and the Agricultural Science and Technology Innovation Project, China (ASTIP-IAS02).
Abstract: The advantages of genomic selection (GS) in animal and plant breeding are self-evident. Traditional parametric models struggle to fit increasingly large sequencing datasets and to capture complex effects accurately. Machine learning models have demonstrated remarkable potential in addressing these challenges. In this study, we introduced the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. Prediction accuracy, mean squared error (MSE), and mean absolute error (MAE) were used as evaluation indicators for comparison with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that, in most cases, the mixed kernel function models significantly outperform GBLUP, BayesB, and the single kernel function models. For instance, for T1 in the pig dataset, the predictive accuracy of SVR_GS is improved by 10% compared to GBLUP, and by approximately 4.4% and 18.6% compared to SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among single kernel functions, the Laplacian and Gaussian kernel functions yield similar results, with the Gaussian kernel function performing better. The mixed kernel functions notably reduce the MSE and MAE compared to all single kernel functions. Furthermore, regarding runtime, the SVR_GS and SVR_GP mixed kernel functions run approximately three times faster than GBLUP in the pig dataset, with only a slight increase in runtime compared to the single kernel function models. In summary, the mixed kernel function SVR models are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
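A minimal sketch of what a mixed kernel and the three evaluation indicators could look like is given below. Several things here are assumptions, since the abstract does not define them: the mix is taken to be a convex combination of a Gaussian and a sigmoid kernel (as the name SVR_GS suggests), the hyperparameters `gamma`, `coef0`, and `weight` are illustrative, and prediction accuracy is taken to be the Pearson correlation between predicted and observed values.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2) for two marker vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """tanh(gamma * <x, y> + coef0) for two marker vectors."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, y)) + coef0)


def mixed_kernel(x, y, weight=0.7):
    """Assumed mix: weight * Gaussian + (1 - weight) * sigmoid."""
    return weight * gaussian_kernel(x, y) + (1 - weight) * sigmoid_kernel(x, y)


def gram_matrix(samples, kernel):
    """Pairwise kernel matrix, the form a precomputed-kernel SVR consumes."""
    return [[kernel(a, b) for b in samples] for a in samples]


def mse(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mae(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson(observed, predicted):
    """Correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)
```

The Gram matrix returned by `gram_matrix(genotypes, mixed_kernel)` could then be passed to any SVR implementation that accepts a precomputed kernel. Setting `weight` to 1.0 or 0.0 recovers the corresponding single kernel.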
Funding: Supported by Undergraduate Higher Education Teaching Quality and Reform Projects of Guangdong Province (Yuejiao Gao Han [2024] No. 9, Yuejiao Gao Han [2024] No. 30), Guangdong Basic and Applied Basic Research Foundation (2023A1515110973), Guangdong Provincial Young Innovative Talents Project of General Colleges and Universities (2023KQNCX089), and Quality Engineering and Teaching Reform Projects of Zhaoqing University (zlgc202239, zlgc202207, zlgc2024005, zlgc2024038).
Abstract: To meet the need for cultivating application-oriented talents in local universities, this study introduced a project-based learning approach into the reform of bioinformatics experimental teaching. The course was structured around a project titled "Influenza Virus Analysis", comprising four progressive modules: database utilization and information retrieval, sequence alignment and phylogenetic analysis, functional and structural prediction, and omics data analysis. These modules were integrated into a coherent research workflow that connected fragmented knowledge and technical skills. During implementation, flipped classroom and group collaboration methods were employed, alongside the establishment of a diversified assessment system emphasizing process evaluation. Teaching practice indicates that the reform effectively enhances students' professional application skills, learning experience, and scientific literacy, facilitating a shift from "tool operation" to "problem-solving" capabilities. This study provides a reference model for the reform of bioinformatics experimental teaching in local universities.