Funding: Supported by the National Key Research and Development Program of China (2022YFC3400300, 2019YFE0109600) and the China Postdoctoral Science Foundation (2021M701584).
Abstract: With the rapid development of sequencing technologies, especially the maturing of third-generation sequencing technologies, the number and quality of published genome assemblies have increased significantly. These high-quality genomes place higher demands on genome evaluation. Numerous computational methods have been developed to evaluate assembly quality from various perspectives. However, researchers choose among them arbitrarily, which makes it difficult to compare assembly quality fairly. To address this issue, we developed the Genome Assembly Evaluating Pipeline (GAEP), which provides a comprehensive pipeline for evaluating genome quality from multiple perspectives, including continuity, completeness, and correctness. GAEP also includes new functions for detecting misassemblies and evaluating assembly redundancy, both of which performed well in our tests. GAEP is publicly available at https://github.com/zyoptimistic/GAEP under the GPL-3.0 License. With GAEP, users can quickly obtain accurate and reliable evaluation results, facilitating the comparison and selection of high-quality genome assemblies.
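Continuity, the first of the perspectives listed above, is commonly summarized with length statistics such as N50 and L50. The following is a minimal sketch that computes them from contig or scaffold lengths using the standard textbook definition. It is not taken from GAEP, whose internal implementation and reported metrics may differ. The helper names `fasta_lengths` and `nx_lx` are hypothetical.

```python
"""Contiguity statistics (Nx/Lx, e.g. N50/L50) for a genome assembly.

Sketch only: uses the textbook N50 definition and is not GAEP's code.
The helper names below are hypothetical.
"""
from typing import Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx_lx(lengths: Iterable[int], x: float = 50.0) -> Tuple[int, int]:
    """Return (Nx, Lx) for contig or scaffold lengths.

    Nx: length of the shortest sequence in the smallest set of longest
    sequences whose combined length reaches x% of the assembly size.
    Lx: number of sequences in that set.
    """
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted((n for n in lengths if n > 0), reverse=True)
    if not ordered:
        raise ValueError("assembly contains no non-empty sequences")
    threshold = sum(ordered) * x / 100.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    return ordered[-1], len(ordered)  # not reached for 0 < x <= 100


if __name__ == "__main__":
    toy_fasta = [">ctg1", "ACGTACGT", "ACGT", ">ctg2", "ACG", ">ctg3", "ACGTAC"]
    print(fasta_lengths(toy_fasta))  # -> [12, 3, 6]
    print(nx_lx([100, 80, 60, 40, 20]))  # -> (80, 2)
```

Because `fasta_lengths` accepts any iterable of lines, an open file handle (for example, a hypothetical `open("assembly.fa")`) can be passed to it directly. Calling `nx_lx(lengths, 90)` gives N90/L90 in the same way.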
Funding: Work at the authors' laboratories is supported by grants from the "la Caixa" Foundation (Grant Agreement LCF/PR/HR21/52410002); EJP RD COFUND-EJP N°825575 "Alexander" to DPS and MP; the Agencia Estatal de Investigación, MICINN and ERDF (Grant Nos. RTI2018-097624-B-I00 and PID2021-126827OB-I00) to DPS; and the Swedish Research Council (2017-02255), ALF Gothenburg (146051), the Swedish Society for Medical Research, Hjärnfonden, Söderberg's Foundations, Hagströmer's Foundation Millennium, Amlöv's Foundation, E. Jacobson's Donation Fund, the Swedish Stroke Foundation, NanoNet COST Action (BM1002), and the EU FP7 Program TargetBrain (279017) to MP.
Abstract: Alexander disease is a rare neurodegenerative disorder caused by mutations in glial fibrillary acidic protein (GFAP), a type III intermediate filament protein expressed in astrocytes. Both early-onset (infantile or juvenile) and adult-onset forms of the disease are known. In both forms, astrocytes contain characteristic aggregates called Rosenthal fibers. Mutations are distributed along the GFAP sequence and disrupt the typical filament network in a dominant manner. The presence of aggregates suggests a proteostasis defect in the mutant forms, yet aggregation is also observed when wild-type GFAP expression is increased. Additionally, several GFAP isoforms have been described to date, but the impact of the mutations on their expression and relative proportions has not been studied exhaustively. Moreover, the posttranslational modification patterns and/or protein-protein interaction networks of GFAP mutants may be altered. This could lead to functional changes in the morphology, positioning, and/or function of several organelles, which would in turn impair normal astrocyte function and subsequently affect neurons. In particular, changes in mitochondrial function, redox balance, and susceptibility to oxidative stress may contribute to the derangement of astrocytes expressing mutant GFAP. Several experimental models have been developed to study the disease and to develop potential therapeutic strategies, and this collection continues to grow. Most cases of Alexander disease can be linked to GFAP mutations. Together with the availability of new and more relevant experimental models, this holds promise for the design and testing of novel therapeutic strategies.
Funding: Supported by the Agriculture Development Fund (ADF) of the Saskatchewan Ministry of Agriculture (ADF-20080205, 20120099 and 20120146), Genome Canada (Total Utilization Flax Genomics, TUFGEN-1309), and the Canada Foundation for Innovation (CFI-23426). The National Natural Science Foundation of China (Grant No. 32401975) also provided financial support.
Abstract: Linusorbs are cyclic peptides biosynthesized through post-translational modification of precursor proteins in flaxseed. Their precursor peptide domains are embedded in five proteins, four of which contain tandem repeats (TRs). Using the sequence patterns of the linusorb-embedded TRs, we previously mined the flax reference genome assembly and identified more than 280 TRs containing linusorb-like domains distributed across 25 proteins, revealing the potential diversity of linusorbs. In this work, we studied the evolution of TRs in the 30 linusorb-related genes. We first verified the gene sequences by Sanger sequencing. Comparison of the Sanger contigs with the reference genome assembly revealed widespread discrepancies in the repeat regions. We annotated the Sanger contigs and identified eight groups of paralogous genes. Pairwise comparisons were conducted among repeats within a region, across regions within a gene, and across paralogous genes. Similarity matrices revealed three distinct modes of tandem duplication, which differ in the number of repeats that form a duplication unit. Most across-paralogue repeat pairs (RPs) share less than 50% similarity. The numbers of repeat regions and repeats also differ among most paralogues, suggesting that repeats diverged independently in each paralogue. Two modes of divergence were inferred from the similarity distributions of RPs in different categories. The flanking non-repetitive regions of paralogous genes exhibited local conservation, as well as variations indicative of functional diversification. This work highlights the importance of verifying repetitive sequences in genome assemblies. Our findings on repeat duplication and divergence constitute a multi-dimensional model of repeat evolution in linusorb-related genes.
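The pairwise repeat comparisons and similarity matrices described above can be illustrated with a small sketch. The abstract does not state which similarity measure or alignment scoring the authors used. This example therefore assumes that similarity is the percent identity of a basic Needleman-Wunsch global alignment (match +1, mismatch -1, gap -1), which is a stand-in rather than the study's method. The function names and the toy sequences are hypothetical.

```python
"""Pairwise similarity matrix for tandem repeats.

Sketch under stated assumptions: similarity = percent identity of a basic
Needleman-Wunsch global alignment (match +1, mismatch -1, gap -1). The
study's actual similarity measure is not given in the abstract.
"""
from itertools import combinations
from typing import List, Sequence


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identical columns in one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        return 100.0

    def pair(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical and total columns
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns


def similarity_matrix(repeats: Sequence[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise identities; the diagonal is 100."""
    k = len(repeats)
    matrix = [[100.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        matrix[i][j] = matrix[j][i] = global_identity(repeats[i], repeats[j])
    return matrix


def fraction_below(matrix: List[List[float]], cutoff: float = 50.0) -> float:
    """Fraction of distinct repeat pairs whose identity is below cutoff."""
    pairs = [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]
    return sum(p < cutoff for p in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    toy_repeats = ["ACGTTGCA", "ACGTTGCT", "TTAGGCAA"]  # hypothetical, not study data
    m = similarity_matrix(toy_repeats)
    for row in m:
        print(" ".join(f"{v:6.1f}" for v in row))
    print(f"pairs below 50%: {fraction_below(m):.2f}")
```

The traceback prefers the diagonal move when scores tie. This is one arbitrary choice: co-optimal alignments can report slightly different identities. A real analysis would fix the scoring scheme (for example, a substitution matrix for peptide repeats) explicitly.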