Funding: This work was supported by the Agriculture Development Fund (ADF) from the Saskatchewan Ministry of Agriculture (ADF-20080205, 20120099 and 20120146); Genome Canada, Total Utilization Flax Genomics (TUFGEN-1309); and the Canada Foundation for Innovation (CFI-23426). The National Natural Science Foundation of China (Grant No. 32401975) also provided financial support.
Abstract: Linusorbs are cyclic peptides biosynthesized through post-translational modification of precursor proteins in flaxseed. Their precursor peptide domains are embedded in five proteins, four of which contain tandem repeats (TRs). Using the sequence patterns of the linusorb-embedded TRs, we previously mined the flax reference genome assembly and identified more than 280 TRs containing linusorb-like domains distributed in 25 proteins, revealing the potential diversity of linusorbs. In this work, we studied the evolution of TRs in the 30 linusorb-related genes by first verifying the gene sequences using the Sanger method. Comparison of the Sanger contigs with the reference genome assembly showed widespread discrepancies in the repeat regions. We annotated the Sanger contigs and identified eight groups of paralogous genes. Pairwise comparisons were conducted among repeats within a region, across regions within a gene, and across paralogous genes. Similarity matrices revealed three distinct modes of tandem duplication that differ in the number of repeats that make up a duplication unit. Most of the across-paralogue repeat pairs (RPs) share less than 50% similarity. The numbers of repeat regions and repeats also differ among most of the paralogues, suggesting that repeats diverged independently in gene paralogues. Two modes of divergence were inferred from the similarity distributions of RPs under different categories. The flanking non-repetitive regions among paralogous genes exhibited local conservation, as well as variations indicative of functional diversification. This work highlights the importance of verifying repetitive sequences in the genome assembly. Our findings about repeat duplication and divergence constitute a multi-dimensional model of repeat evolution in linusorb-related genes.
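The pairwise repeat comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which alignment method or similarity measure was used, so everything here is an assumption: the scoring (longest common subsequence length divided by the longer repeat's length), the function names `percent_identity` and `similarity_matrix`, and the toy sequences, which are invented and are not linusorb repeats.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Illustrative similarity: LCS length over the longer sequence, as a percentage.

    Assumption: this stands in for whatever alignment-based measure the
    study actually used, which the abstract does not specify.
    """
    rows, cols = len(a), len(b)
    # lcs[i][j] holds the LCS length of a[:i] and b[:j].
    lcs = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    longest = max(rows, cols)
    return 100.0 * lcs[rows][cols] / longest if longest else 100.0


def similarity_matrix(repeats: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise similarities; the diagonal is 100."""
    n = len(repeats)
    matrix = [[100.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = percent_identity(repeats[i], repeats[j])
    return matrix


# Hypothetical toy repeats, not real sequence data.
toy_repeats = ["ABCD", "ABXD", "WXYZ"]
for row in similarity_matrix(toy_repeats):
    print(["%.0f" % value for value in row])
```

In a matrix like this, runs of highly similar neighbouring repeats would hint at the size of a duplication unit, while low values across paralogues would correspond to the sub-50% repeat pairs the abstract reports.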