Innovations in DNA sequencing technologies have greatly boosted population-level genomic studies in plants, facilitating the identification of key genetic variations for investigating population diversity and accelerating the molecular breeding of crops. Conventional methods for genomic analysis typically rely on small variants, such as SNPs and indels, and use single linear reference genomes, which introduces biases and reduces performance in highly divergent genomic regions. By integrating population-level sequences, pangenomes, particularly graph pangenomes, offer a promising solution to these challenges. To date, numerous algorithms have been developed for constructing pangenome graphs, aligning reads to these graphs, and performing variant genotyping based on these graphs. As demonstrated in various plant pangenomic studies, these advancements allow for the detection of previously hidden variants, especially structural variants, thereby enhancing applications such as genetic mapping of agronomically important genes. However, noteworthy challenges remain to be overcome in applying pangenome graph approaches to plants. Addressing these issues will require the development of more sophisticated algorithms tailored specifically to plants. Such improvements will contribute to the scalability of this approach, facilitating the production of super-pangenomes, in which hundreds or even thousands of de novo-assembled genomes from one species or genus can be integrated. This, in turn, will promote broader pan-omic studies, further advancing our understanding of genetic diversity and driving innovations in crop breeding.
Escalating impacts of climate change pose serious threats to ecosystems and agriculture, and thus to human societies and their living conditions. Plants, both wild and cultivated, constantly face global warming, drought, and precipitation change, which makes plant adaptability crucial. Haplotype-based pangenomics, which builds a collection of genomes from each haplotype in multiple strains of a species, offers a cutting-edge approach for identifying traits that foster climate resilience. Although pangenomics focuses on building a pangenome for an agricultural crop, often within cultivars of a plant species, this technology holds significant untapped potential for preserving biodiversity, stabilizing ecosystems, and sequestering carbon.
Background: India harbors the world's largest cattle population, encompassing over 50 distinct Bos indicus breeds. This rich genetic diversity underscores the inadequacy of a single reference genome to fully capture the genomic landscape of Indian cattle. To comprehensively characterize the genomic variation within Bos indicus and, specifically, dairy breeds, we aim to identify non-reference sequences and construct a comprehensive pangenome. Results: Five representative genomes of prominent dairy breeds, including Gir, Kankrej, Tharparkar, Sahiwal, and Red Sindhi, were sequenced using 10X Genomics 'linked-read' technology. Assemblies generated from these linked reads ranged from 2.70 Gb to 2.77 Gb, comparable to the Bos indicus Brahman reference genome. A pangenome of Bos indicus cattle was constructed by comparing the newly assembled genomes with the reference using alignment and graph-based methods, revealing 8 Mb and 17.7 Mb of novel sequence, respectively. A confident set of 6,844 Non-reference Unique Insertions (NUIs) spanning 7.57 Mb was identified through both methods, representing the pangenome of Indian Bos indicus breeds. Comparative analysis with previously published pangenomes unveiled 2.8 Mb (37%) commonality with the Chinese indicine pangenome and only 1% commonality with the Bos taurus pangenome. Among these, 2,312 NUIs encompassing ~2 Mb were commonly found in 98 samples of the 5 breeds and designated as Bos indicus Common Insertions (BICIs) in the population. Furthermore, 926 BICIs were identified within 682 protein-coding genes, 54 long non-coding RNAs (lncRNAs), and 18 pseudogenes. These protein-coding genes were enriched for functions such as chemical synaptic transmission, cell junction organization, cell-cell adhesion, and cell morphogenesis. The protein-coding genes were found in various prominent quantitative trait locus (QTL) regions, suggesting potential roles of BICIs in traits related to milk production, reproduction, exterior, health, meat, and carcass. Notably, 63.21% of the bases within the BICIs call set contained interspersed repeats, predominantly Long Interspersed Nuclear Elements (LINEs). Additionally, 70.28% of BICIs are shared with other domesticated and wild species, highlighting their evolutionary significance. Conclusions: This is the first report unveiling a robust set of NUIs defining the pangenome of Bos indicus breeds of India. The analyses contribute valuable insights into the genomic landscape of desi cattle breeds.
Since pangenomic study was first proposed, a dozen software tools have come into active use for pangenomic analysis. By the end of 2014, Panseq and the pan-genomes analysis pipeline (PGAP) ranked as the two most popular packages according to cumulative citations in peer-reviewed scientific publications. The functions of these software packages and tools, although they vary, include categorizing orthologous genes, calculating pangenomic profiles, integrating gene annotations, and constructing phylogenies. As epigenomic elements are gradually revealed in prokaryotes, pangenomic databases and toolkits are expected to be extended to handle detailed functional annotations for genes and non-protein-coding sequences, including non-coding RNAs, insertion elements, and conserved structural elements. To develop better bioinformatic tools, user feedback and the integration of novel features are both essential.
An increasing number of structural variations (SVs) have been identified as causative mutations for diverse agronomic traits. However, a systematic exploration of the quantity, distribution, and contribution of SVs in wheat has been lacking. Here, we report high-quality gene-based and SV-based pangenomes comprising 22 hexaploid wheat assemblies showing a wide range of chromosome size, gene number, and TE component, which indicates their representativeness of wheat genetic diversity. Pan-gene analyses uncover 140,261 distinct gene families, of which only 23.2% are shared in all accessions. Moreover, we build a ~16.15 Gb graph pangenome containing 695,897 bubbles, intersecting 5,132 genes and 230,307 cis-regulatory regions. Pairwise genome comparisons identify ~1,978,221 non-redundant SVs and 497 SV hotspots. Notably, the density of bubbles as well as SVs shows remarkable aggregation in centromeres, which probably play an important role in chromosome plasticity and stability. In exploring functional SVs, we identify 2,769 SVs with absolute relative frequency differences exceeding 0.7 between spring and winter growth habit groups. Additionally, several reported functional genes in wheat display complex structural graphs, for example, PPD-A1, VRT-A2, and TaNAAT2-A. These findings deepen our understanding of wheat genetic diversity, providing valuable graphical pangenome and variation resources to improve the efficiency of genome-wide association mapping in wheat.
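The frequency-difference filter mentioned in the wheat abstract above (SVs whose frequency differs by more than 0.7 between spring and winter groups) can be illustrated with a minimal sketch. This is not the authors' pipeline: the presence/absence encoding (1 = SV present, 0 = absent), the function names, and the toy genotypes are all assumptions made for illustration, and the paper's exact definition of "relative frequency" may differ.

```python
from typing import Dict, List


def sv_frequency(calls: List[int]) -> float:
    """Fraction of accessions carrying the SV (assumed encoding: 1 = present, 0 = absent)."""
    return sum(calls) / len(calls) if calls else 0.0


def differentiated_svs(
    group_a: Dict[str, List[int]],
    group_b: Dict[str, List[int]],
    threshold: float = 0.7,
) -> List[str]:
    """Return IDs of SVs whose absolute frequency difference between groups exceeds threshold."""
    hits = []
    # Only SVs genotyped in both groups can be compared.
    for sv_id in group_a.keys() & group_b.keys():
        diff = abs(sv_frequency(group_a[sv_id]) - sv_frequency(group_b[sv_id]))
        if diff > threshold:
            hits.append(sv_id)
    return sorted(hits)


# Hypothetical toy data: five spring and five winter accessions.
spring = {"SV1": [1, 1, 1, 1, 0], "SV2": [1, 0, 1, 0, 0], "SV3": [0, 0, 0, 0, 0]}
winter = {"SV1": [0, 0, 0, 0, 0], "SV2": [1, 0, 0, 1, 0], "SV3": [1, 1, 1, 1, 1]}
print(differentiated_svs(spring, winter))
```

In this toy input, SV1 (0.8 versus 0.0) and SV3 (0.0 versus 1.0) pass the 0.7 cutoff, while SV2 (0.4 in both groups) does not.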
The pear (Pyrus spp.) is well known for its diverse flavors and textures and for its global horticultural importance. However, the genetic diversity responsible for its extensive phenotypic variation remains largely unexplored. Here, we de novo assembled and annotated the genomes of the maternal (PsbM) and paternal (PsbF) lines of the hybrid 'Yuluxiang' pear and constructed a pear pangenome of 1.15 Gb by combining these two genomes with five previously published pear genomes representing cultivated and wild germplasm. Using the constructed pangenome, we identified 21,224 gene presence-absence variations (PAVs) and 1,158,812 single-nucleotide polymorphisms (SNPs) in the non-reference genome that were absent in the PsbM reference genome. Compared with SNP markers, PAV-based analysis provides additional insights into the pear population structure. In addition, some genes associated with pear fruit quality traits have differential occurrence frequencies and differential gene expression between Asian and European populations. Moreover, our analysis of the pear pangenome revealed that a SNP and an insertion in the promoter region of the gene PsbMGH3.1 potentially enhance sepal shedding in 'Xuehuali', which is vital for pear quality. PsbMGH3.1 may play a role in the IAA pathway, contributing to a distinct low-auxin phenotype observed in plants heterologously overexpressing this gene. This research helps capture the genetic diversity of pear populations and provides genomic resources for accelerating breeding.
Background: The reliance on a solitary linear reference genome has imposed a significant constraint on our comprehensive understanding of genetic variation in animals. This constraint is particularly pronounced for non-reference sequences (NRSs), which have not been extensively studied. Results: In this study, we constructed a pig pangenome graph using 21 pig assemblies and identified 23,831 NRSs with a total length of 105 Mb. Our findings revealed that NRSs were more prevalent in breeds exhibiting greater genetic divergence from the reference genome. Furthermore, we observed that NRSs were rarely found within coding sequences, while NRS insertions were enriched in immune-related Gene Ontology terms. Notably, our investigation also unveiled a close association between novel genes and the immune capacity of pigs. We observed substantial differences in the frequencies of NRSs between Eastern and Western pigs, and the heat-resistant pigs exhibited a substantial number of NRS insertions in an 11.6 Mb interval on chromosome X. Additionally, we discovered a 665 bp insertion in the fourth intron of the TNFRSF19 gene that may be associated with heat tolerance in Southern Chinese pigs. Conclusions: Our findings demonstrate the potential of a graph genome approach to reveal important functional features of NRSs in pig populations.
Background: The genetic diversity of yak, a key domestic animal on the Qinghai-Tibetan Plateau (QTP), is a vital resource for domestication and breeding efforts. This study presents the first yak pangenome obtained through the de novo assembly of 16 yak genomes. Results: We discovered 290 Mb of nonreference sequences and 504 new genes. Our pangenome-wide presence and absence variation (PAV) analysis revealed 5,120 PAV-related genes, highlighting a wide range of variety-specific genes and genes with varying frequencies across yak populations. Principal component analysis (PCA) based on binary gene PAV data classified yaks into three new groups: wild, domestic, and Jinchuan. Moreover, we proposed a 'two-haplotype genomic hybridization model' for understanding the hybridization patterns among breeds by integrating gene frequency, heterozygosity, and gene PAV data. A gene PAV-GWAS identified a novel gene (BosGru3G009179) that may be associated with the multirib trait in Jinchuan yaks. Furthermore, an integrated transcriptome and pangenome analysis highlighted significant differences in the expression of core genes and the mutational burden of differentially expressed genes between yaks from high and low altitudes. Transcriptome analysis across multiple species revealed that, when high- and low-altitude adaptations are compared, yaks have the most unique differentially expressed mRNAs and lncRNAs between high- and low-altitude regions, especially in the heart and lungs. Conclusions: The yak pangenome offers a comprehensive resource and new insights for functional genomic studies, supporting future biological research and breeding strategies.
As large-scale genomic studies have progressed, it has become clear that a single reference genome cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, which suggests that some population-specific sequences may be missing from the current reference genome. In contrast, the pangenome is a collection of all DNA sequences of a species: it contains the sequences shared by all individuals (core genome) and also displays the sequence information unique to each individual (variable genome). Progress in pangenome research in humans, plants, and domestic animals has shown that missing genetic components and large structural variants (SVs) can be explored through pangenomic studies. Many individual-specific sequences have been shown to be related to biological adaptability, phenotype, and important economic traits. The maturation of technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graph genomes, and reference-free assembly will further promote pangenome research. In the future, pangenomes combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution, and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variations in domestic animals (sheep, cattle, pigs, chickens) and their impacts on phenotypes, and how this can contribute to the understanding of species diversity. We also discuss potential issues and future perspectives of pangenome research in livestock and poultry.
Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium, and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates at both high coverage and high base quality using a next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4,196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared with the pangenomes of closely related species of the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5%, and the hotspots for genome variation were found in the genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.
By integrating genomes from different accessions, pangenomes provide a more comprehensive and reference-bias-free representation of genetic information within a population compared to a single reference genome. With the rapid accumulation of genomic sequencing data and the expanding scope of plant research, plant pangenomics has gradually evolved from single-species to multi-species studies. This shift has given rise to the concept of a super-pangenome that covers all genomic sequences within a genus-level taxonomic group. By incorporating both cultivated and wild species, the super-pangenome has greatly enhanced the resolution of research in various areas such as plant genetic diversity, evolution, domestication, and molecular breeding. In this review, we present a comprehensive overview of the plant super-pangenome, emphasizing its development requirements, construction strategies, potential applications, and notable achievements. We also highlight the distinctive advantages and promising prospects of super-pangenomes while addressing current challenges and future directions.
[Objective] Rummeliibacillus, a genus encompassing three known species, R. stabekisii, R. pycnus, and R. suwonensis, has a wide range of potential applications in biodegradation, probiotics, animal feed, and production of arginine, caproic acid, and other compounds. This study aims to explore the genetic diversity of this genus at the genomic level. [Methods] A comparative pangenome analysis of 12 strains isolated from different sources was conducted. In addition, phylogenetic analysis, functional annotation, genomic metabolic pathway analysis, and prediction of mobile genetic elements were carried out. [Results] A total of 8,024 gene clusters were identified. The core genome, accessory genome, and strain-specific genes comprised 1,550, 3,941, and 2,533 gene clusters, respectively. In the core genome, the arginine cycle of six strains was complete. Seven strains had the ability to completely biosynthesize acetoin. However, only R. pycnus and R. suwonensis 3B-1 were able to completely biosynthesize caproic acid. The phylogenetic tree, DNA-DNA hybridization, and average nucleotide identity showed that Rummeliibacillus sp. G93 and Rummeliibacillus sp. TYF-LIM-RU47 were strains of R. stabekisii. Rummeliibacillus sp. POC4 and Rummeliibacillus sp. TYF005 may belong to a new species of this genus. In addition, genomic islands were identified in all 12 strains, with the number ranging from four (R. stabekisii DSM 25578 and R. stabekisii NBRC 104870) to 14 (Rummeliibacillus sp. SL167 and Rummeliibacillus sp. TYF005), and prophage sequences were found in five of the 12 strains. [Conclusion] This study provides a genomic framework for Rummeliibacillus that could assist the further exploration of this genus.
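The split into core, accessory, and strain-specific gene clusters reported above (1,550 + 3,941 + 2,533 = 8,024) follows from how many strains carry each cluster. The sketch below shows that partition under the common convention that core means present in every strain and strain-specific means present in exactly one; the study's own software and cutoffs are not stated in the abstract, so the function name, the input layout, and the toy cluster and strain names are assumptions.

```python
from typing import Dict, List, Set


def partition_clusters(
    presence: Dict[str, Set[str]], n_strains: int
) -> Dict[str, List[str]]:
    """Partition gene clusters by the number of strains that carry them.

    presence maps a cluster ID to the set of strains containing it.
    Assumed convention: core = all strains, strain_specific = exactly one
    strain, accessory = everything in between.
    """
    groups: Dict[str, List[str]] = {"core": [], "accessory": [], "strain_specific": []}
    for cluster, strains in sorted(presence.items()):
        count = len(strains)
        if count == 0:
            continue  # a cluster observed in no strain carries no information
        if count == n_strains:
            groups["core"].append(cluster)
        elif count == 1:
            groups["strain_specific"].append(cluster)
        else:
            groups["accessory"].append(cluster)
    return groups


# Hypothetical toy input with three strains.
toy = {
    "cluster_A": {"s1", "s2", "s3"},
    "cluster_B": {"s1", "s3"},
    "cluster_C": {"s2"},
}
print(partition_clusters(toy, n_strains=3))
```

Because every non-empty cluster lands in exactly one group, the three group sizes always sum to the total number of clusters, which matches the arithmetic in the abstract.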
Natural variations are the foundation of crop improvement. However, genomic variability remains largely understudied. Here, we present the full-spectrum integrated panvariome and pangenome of 1,020 peach accessions, including 10.5 million single-nucleotide polymorphisms, insertions, deletions, duplications, inversions, translocations, copy-number variations, transposon-insertion polymorphisms, and presence-absence variations, uncovering 70.6% novel variants and 3,289 novel genes. Analysis of the panvariome recapitulated the global evolutionary history of the peach and identified several novel rare variants that are causal for traits. We found that landraces and improved accessions encode more genes than the wild accessions, implying gene gains during peach domestication and improvement. Analysis of global introgression patterns revealed their value in phenotype prediction and gene mining, and suggested that the most likely wild progenitor of the domesticated peach is Prunus mira and that almond was involved in the origin of Prunus davidiana. Furthermore, we developed a novel panvariome-based one-step solution for association study, GWASPV, which was used to identify several trait-conferring genes and over 2,000 novel associations. Collectively, our study reveals new insights into peach evolution and genomic variations, providing a novel method for plant gene mining and important targets for peach breeding.
Dear Editor, The genus Actinidia is native to China and well known as kiwifruit; it has been classified into 75 taxa, including 54 species and 21 subspecies. Since the release of the first draft genome of Actinidia chinensis 'Hongyang' in 2013, extensive studies have made great progress in gene cloning, genetic mapping, metabolic regulation, and molecular breeding of kiwifruit.
Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can address single-reference bias, thereby enhancing the performance of variant genotyping and empowering downstream applications in population genetics and quantitative genetics. However, existing pangenome-based genotyping methods are ineffective in handling large or complex pangenome graphs, particularly in polyploid genomes. Here, we introduce Varigraph, an algorithm that leverages the comparison of unique and repetitive k-mers between variant sites and short reads for genotyping both small and large variants. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers across non-human genomes while maintaining comparable genotyping performance in human genomes. By employing efficient data structures including a counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while managing computational costs for large datasets. Its wide applicability extends to highly repetitive or large genomes, such as those of maize and wheat. Significantly, Varigraph can handle extensive pangenome graphs, as demonstrated by its performance on a dataset containing 252 rice genomes, for which it achieved a precision exceeding 0.9 for both small and large variants. Notably, Varigraph is capable of effectively utilizing pangenome graphs for genotyping autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
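The abstract above names a counting Bloom filter as one of the data structures used for k-mer handling. The following is a generic, minimal sketch of that data structure applied to k-mer counting; it is not Varigraph's implementation, and the class name, table size, number of hash functions, and the use of salted BLAKE2b hashes are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator, List


def kmers(sequence: str, k: int) -> List[str]:
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class CountingBloomFilter:
    """Approximate multiset: estimates may be too high, never too low."""

    def __init__(self, size: int = 1 << 16, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counts = [0] * size

    def _slots(self, item: str) -> Iterator[int]:
        # Derive independent hash functions by salting BLAKE2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, item: str) -> None:
        for slot in self._slots(item):
            self.counts[slot] += 1

    def estimate(self, item: str) -> int:
        # The smallest counter is the tightest upper bound on the true count.
        return min(self.counts[slot] for slot in self._slots(item))


# Hypothetical short reads; count their 3-mers.
reads = ["ACGTAC", "CGTACG"]
cbf = CountingBloomFilter()
for read in reads:
    for kmer in kmers(read, 3):
        cbf.add(kmer)
print(cbf.estimate("CGT"))
```

The design trade-off is memory for exactness: counters are shared between k-mers, so collisions can inflate an estimate, but a k-mer that was inserted n times is never reported with a count below n.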
Traditional reference genomes, particularly those derived from diploid or collapsed assemblies used for polyploid or highly heterozygous species, fail to represent the full spectrum of structural, sequence, and haplotype diversity that underlies phenotypic variation and adaptive potential in a species (Bayer et al., 2020; Eizenga et al., 2020; Della Coletta et al., 2021).
Potato is the world's most important nongrain crop. In this study, to assess genetic diversity within the Petota section, 29 genomes from the Petota and Etuberosum sections were newly de novo assembled and 248 accessions of wild potatoes, landraces, and modern cultivars were re-sequenced at >25× depth. Subsequently, a graph-based pangenome was constructed using DM8.1 as the backbone, integrating 194,330 nonredundant structural variants. To characterize the metabolome of tubers and illuminate the genomic basis of metabolic traits, LC-MS/MS was employed to obtain the metabolome of 157 accessions, and 9,321 structural variants (SVs) were found to be significantly associated with 1,258 distinct metabolites via PAV (presence and absence variation)-based metabolomics-GWAS analysis, including metabolites of flavonoids, phenolic acids, and phospholipids. To facilitate the utilization of pangenome resources, a comprehensive platform, the Potato Pangenome Database (PPDB), was developed. Our study provides a comprehensive genomic resource for dissecting the genomic basis of agronomic and metabolic traits in potato, which will accelerate functional genomics studies and genetic improvements in potato.
Genome-wide identification and comparative gene-family analyses have commonly been performed to investigate species-specific evolution linked to various traits and molecular pathways. However, most previous studies have been limited to gene screening in a single reference genome, failing to account for the gene presence/absence variations (gPAVs) in a species. Here, we propose an innovative pangenome-based approach for gene-family analyses based on orthologous gene groups (OGGs). Using the basic helix-loop-helix (bHLH) transcription factor family in barley as an example, we identified 161-176 bHLHs in 20 barley genomes, which can be classified into 201 OGGs. These 201 OGGs were further classified into 140 core, 12 softcore, 29 shell, and 20 line-specific/cloud bHLHs, revealing the complete profile of bHLH genes in barley. Using a genome-scanning approach, we overcame the genome annotation bias and identified an average of 15 un-annotated core bHLHs per barley genome. We found that whole-genome/segmental duplicates are predominant mechanisms contributing to the expansion of most core/softcore bHLHs, whereas dispensable bHLHs are more likely to result from small-scale duplication events. Interestingly, we noticed that the dispensable bHLHs tend to be enriched in the specific subfamilies SF13, SF27, and SF28, implying the potentially biased expansion of specific bHLHs in barley. We found that 50% of the bHLHs contain at least 1 intact transposon element (TE) within the 2-kb upstream-to-downstream region. bHLHs with copy-number variations (CNVs) have 1.48 TEs on average, significantly more than core bHLHs without CNVs (1.36), supporting a potential role of TEs in bHLH expansion. Analyses of selection pressure showed that dispensable bHLHs have experienced clear relaxation of selection compared with core bHLHs, consistent with their conservation patterns. We also integrated the pangenome data with recently available barley pantranscriptome data from 5 tissues and discovered apparent transcriptional divergence within and across bHLH subfamilies. We conclude that pangenome-based gene-family analyses can better describe the previously untapped, genuine evolutionary status of bHLHs and provide novel insights into bHLH evolution in barley. We expect that this study will inspire similar analyses in many other gene families and species.
The allotetraploid wild grass Aegilops ventricosa (2n=4x=28, genome D^(v)D^(v)N^(v)N^(v)) has been recognized as an important germplasm resource for wheat improvement owing to its ability to tolerate biotic stresses. In particular, the 2N^(v)S segment from Ae. ventricosa, as a stable and effective resistance source, has contributed greatly to wheat improvement. The 2N^(v)S/2AS translocation is a prevalent chromosomal translocation between common wheat and wild relatives, ranking just behind the 1B/1R translocation in importance for modern wheat breeding. Here, we assembled a high-quality chromosome-level reference genome of Ae. ventricosa RM271 with a total length of 8.67 Gb. Phylogenomic analyses revealed that the progenitor of the D^(v) subgenome of Ae. ventricosa is Ae. tauschii ssp. tauschii (genome DD); by contrast, the progenitor of the D subgenome of bread wheat (Triticum aestivum L.) is Ae. tauschii ssp. strangulata (genome DD). The earliest polyploidization of Ae. ventricosa occurred 0.7 mya. The D^(v) subgenome of Ae. ventricosa is less conserved than the D subgenome of bread wheat. Construction of a graph-based pangenome of 2AS/6N^(v)L (originally known as 2N^(v)S) segments from Ae. ventricosa and other genomes in the Triticeae enabled us to identify candidate resistance genes sourced from Ae. ventricosa. We identified 12 nonredundant introgressed segments from the D^(v) and N^(v) subgenomes using a large winter wheat collection representing the full diversity of the European wheat genetic pool, and 29.40% of European wheat varieties inherit at least one of these segments. The high-quality RM271 reference genome will provide a basis for cloning key genes, including the Yr17-Lr37-Sr38-Cre5 resistance gene cluster in Ae. ventricosa, and facilitate the full use of elite wild genetic resources to accelerate wheat improvement.
Funding: Funded by the National Natural Science Fund for Excellent Young Scientists Fund Program (Overseas) and the Young Scientist Fostering Funds for the National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops.
Abstract: Innovations in DNA sequencing technologies have greatly boosted population-level genomic studies in plants, facilitating the identification of key genetic variations for investigating population diversity and accelerating the molecular breeding of crops. Conventional methods for genomic analysis typically rely on small variants, such as SNPs and indels, and use single linear reference genomes, which introduces biases and reduces performance in highly divergent genomic regions. By integrating population-level sequences, pangenomes, particularly graph pangenomes, offer a promising solution to these challenges. To date, numerous algorithms have been developed for constructing pangenome graphs, aligning reads to these graphs, and performing variant genotyping based on these graphs. As demonstrated in various plant pangenomic studies, these advancements allow for the detection of previously hidden variants, especially structural variants, thereby enhancing applications such as genetic mapping of agronomically important genes. However, noteworthy challenges remain to be overcome in applying pangenome graph approaches to plants. Addressing these issues will require the development of more sophisticated algorithms tailored specifically to plants. Such improvements will contribute to the scalability of this approach, facilitating the production of super-pangenomes, in which hundreds or even thousands of de novo-assembled genomes from one species or genus can be integrated. This, in turn, will promote broader pan-omic studies, further advancing our understanding of genetic diversity and driving innovations in crop breeding.
Funding: Supported by the Biological Breeding-National Science and Technology Major Project (Grant No. 2022ZD04017 to Peng Cui), China.
Abstract: Escalating impacts of climate change pose serious threats to ecosystems and agriculture, and thus to human societies and their living conditions. Plants, both wild and cultivated, are constantly facing global warming, drought, and precipitation change, rendering plant adaptability a crucial process. Haplotype-based pangenomics is a collection of genomes from each haplotype in multiple strains of a species, offering a cutting-edge approach for identifying traits that foster climate resilience. Although pangenomics focuses on building a pangenome for an agricultural crop, often within cultivars of a plant species, this technology holds significant untapped potential for preserving biodiversity, stabilizing ecosystems, and sequestering carbon.
Funding: Supported by the project “Genomics for Conservation of Indigenous Cattle Breeds and for Enhancing Milk Yield, Phase-I” [BT/PR26466/AAQ/1/704/2017], funded by the Department of Biotechnology (DBT), India, and the project “Identification of key molecular factors involved in resistance/susceptibility to paratuberculosis infection in indigenous breeds of cows” [BT/PR32758/AAQ/1/760/2019], also funded by the Department of Biotechnology (DBT), India.
Abstract: Background: India harbors the world’s largest cattle population, encompassing over 50 distinct Bos indicus breeds. This rich genetic diversity underscores the inadequacy of a single reference genome to fully capture the genomic landscape of Indian cattle. To comprehensively characterize the genomic variation within Bos indicus and, specifically, dairy breeds, we aim to identify non-reference sequences and construct a comprehensive pangenome. Results: Five representative genomes of prominent dairy breeds, including Gir, Kankrej, Tharparkar, Sahiwal, and Red Sindhi, were sequenced using 10X Genomics ‘linked-read’ technology. Assemblies generated from these linked reads ranged from 2.70 Gb to 2.77 Gb, comparable to the Bos indicus Brahman reference genome. A pangenome of Bos indicus cattle was constructed by comparing the newly assembled genomes with the reference using alignment and graph-based methods, revealing 8 Mb and 17.7 Mb of novel sequence, respectively. A confident set of 6,844 Non-reference Unique Insertions (NUIs) spanning 7.57 Mb was identified through both methods, representing the pangenome of Indian Bos indicus breeds. Comparative analysis with previously published pangenomes unveiled 2.8 Mb (37%) commonality with the Chinese indicine pangenome and only 1% commonality with the Bos taurus pangenome. Among these, 2,312 NUIs encompassing ~2 Mb were commonly found in 98 samples of the 5 breeds and designated as Bos indicus Common Insertions (BICIs) in the population. Furthermore, 926 BICIs were identified within 682 protein-coding genes, 54 long non-coding RNAs (lncRNAs), and 18 pseudogenes. These protein-coding genes were enriched for functions such as chemical synaptic transmission, cell junction organization, cell-cell adhesion, and cell morphogenesis. The protein-coding genes were found in various prominent quantitative trait locus (QTL) regions, suggesting potential roles of BICIs in traits related to milk production, reproduction, exterior, health, meat, and carcass. Notably, 63.21% of the bases within the BICIs call set contained interspersed repeats, predominantly Long Interspersed Nuclear Elements (LINEs). Additionally, 70.28% of BICIs are shared with other domesticated and wild species, highlighting their evolutionary significance. Conclusions: This is the first report unveiling a robust set of NUIs defining the pangenome of Bos indicus breeds of India. The analyses contribute valuable insights into the genomic landscape of desi cattle breeds.
Funding: Supported by the National High-tech R&D Program (863 Program; Grant No. 2012AA020409) from the Ministry of Science and Technology of China, the Key Program of the Chinese Academy of Sciences (Grant No. KSZD-EW-TZ-009-02), and the National Natural Science Foundation of China (Grant Nos. 31471248 and 31271386).
Abstract: Since pangenomic study was first proposed, about a dozen software tools have come into active use for pangenomic analysis. By the end of 2014, Panseq and the pan-genomes analysis pipeline (PGAP) ranked as the two most popular packages according to cumulative citations in peer-reviewed scientific publications. The functions of these software packages and tools, although they vary, include categorizing orthologous genes, calculating pangenomic profiles, integrating gene annotations, and constructing phylogenies. As epigenomic elements are gradually being revealed in prokaryotes, pangenomic databases and toolkits are expected to be extended to handle detailed functional annotations for genes and non-protein-coding sequences, including non-coding RNAs, insertion elements, and conserved structural elements. To develop better bioinformatic tools, user feedback and integration of novel features are both essential.
Funding: Supported by the National Key Research and Development Program of China (2023YFF1000100 and 2023YFA0914601) and the Special Funds for Science Technology Innovation and Industrial Development of Shenzhen Dapeng New District (PT202101-01).
Abstract: An increasing number of structural variations (SVs) have been identified as causative mutations for diverse agronomic traits. However, a systematic exploration of SV quantity, distribution, and contribution in wheat has been lacking. Here, we report high-quality gene-based and SV-based pangenomes comprising 22 hexaploid wheat assemblies showing a wide range of chromosome size, gene number, and TE component, which indicates their representativeness of wheat genetic diversity. Pan-gene analyses uncover 140,261 distinct gene families, of which only 23.2% are shared in all accessions. Moreover, we build a ∼16.15 Gb graph pangenome containing 695,897 bubbles, intersecting 5132 genes and 230,307 cis-regulatory regions. Pairwise genome comparisons identify ∼1,978,221 non-redundant SVs and 497 SV hotspots. Notably, the densities of bubbles and SVs show remarkable aggregation in centromeres, which probably play an important role in chromosome plasticity and stability. In exploring functional SVs, we identify 2769 SVs with absolute relative frequency differences exceeding 0.7 between spring and winter growth habit groups. Additionally, several reported functional genes in wheat display complex structural graphs, for example, PPD-A1, VRT-A2, and TaNAAT2-A. These findings deepen our understanding of wheat genetic diversity, providing valuable graphical pangenome and variation resources to improve the efficiency of genome-wide association mapping in wheat.
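As a rough illustration of the frequency-difference filter mentioned in this abstract, the sketch below flags SVs whose carrier frequency differs by more than 0.7 between two groups. It assumes the metric is the absolute difference between per-group presence frequencies, which the abstract does not spell out; the function names, the 0/1 presence coding, and the toy accessions are all hypothetical and are not taken from the study.

```python
from typing import Dict, List


def group_frequency(presence: Dict[str, int], samples: List[str]) -> float:
    """Fraction of the given samples that carry the SV (presence coded as 1)."""
    return sum(presence[s] for s in samples) / len(samples)


def differentiated_svs(
    calls: Dict[str, Dict[str, int]],
    group_a: List[str],
    group_b: List[str],
    threshold: float = 0.7,
) -> Dict[str, float]:
    """Return SVs whose absolute frequency difference between groups exceeds threshold."""
    hits = {}
    for sv_id, presence in calls.items():
        diff = abs(group_frequency(presence, group_a) - group_frequency(presence, group_b))
        if diff > threshold:
            hits[sv_id] = diff
    return hits


# Toy presence/absence calls (hypothetical): 3 spring and 3 winter accessions.
spring = ["s1", "s2", "s3"]
winter = ["w1", "w2", "w3"]
calls = {
    "SV1": {"s1": 1, "s2": 1, "s3": 1, "w1": 0, "w2": 0, "w3": 0},
    "SV2": {"s1": 1, "s2": 0, "s3": 1, "w1": 1, "w2": 0, "w3": 1},
    "SV3": {"s1": 0, "s2": 0, "s3": 0, "w1": 1, "w2": 1, "w3": 0},
}
print(differentiated_svs(calls, spring, winter))
```

In this toy input only SV1 passes the filter: it is present in every spring accession and absent from every winter accession, while SV3 falls just short of the 0.7 cutoff.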
Funding: Supported by the National Natural Science Foundation of China (Grant No. 32102364), the General Program of Shandong Natural Science Foundation (Grant No. ZR2022MC064), the Shanxi Province Postdoctoral Research Activity Fund (Grant No. K462101001), the Doctoral Research Initiation Fund of Shanxi Datong University (Grant No. 2023-B-15), the Earmarked Fund for Modern Agro-industry Technology Research System (Grant No. 2023CYJSTX07), and the Shanxi Province Excellent Doctoral Work Award Project (Grant No. 606-02010609).
Abstract: The pear (Pyrus spp.) is well known for its diverse flavors and textures and its global horticultural importance. However, the genetic diversity responsible for its extensive phenotypic variation remains largely unexplored. Here, we de novo assembled and annotated the genomes of the maternal (PsbM) and paternal (PsbF) lines of the hybrid ‘Yuluxiang’ pear and constructed a pear pangenome of 1.15 Gb by combining these two genomes with five previously published pear genomes representing cultivated and wild germplasm. Using the constructed pangenome, we identified 21224 gene PAVs (presence-absence variations) and 1158812 SNPs (single-nucleotide polymorphisms) in the non-reference genome that were absent in the PsbM reference genome. Compared with SNP markers, PAV-based analysis provides additional insights into the pear population structure. In addition, some genes associated with pear fruit quality traits have differential occurrence frequencies and differential gene expression between Asian and European populations. Moreover, our analysis of the pear pangenome revealed a mutated SNP and an insertion in the promoter region of the gene PsbMGH3.1 that potentially enhance sepal shedding in ‘Xuehuali’, which is vital for pear quality. PsbMGH3.1 may play a role in the IAA pathway, contributing to a distinct low-auxin phenotype observed in plants heterologously overexpressing this gene. This research helps capture the genetic diversity of pear populations and provides genomic resources for accelerating breeding.
Funding: This work was supported by the National Key Research and Development Program of China (grant no. 2022YFF1000500), the National Natural Science Foundation of China (grant no. 31941007), and the Zhejiang province agriculture (livestock) varieties breeding Key Technology R&D Program (grant no. 2016C02054-2).
Abstract: Background: The reliance on a solitary linear reference genome has imposed a significant constraint on our comprehensive understanding of genetic variation in animals. This constraint is particularly pronounced for non-reference sequences (NRSs), which have not been extensively studied. Results: In this study, we constructed a pig pangenome graph using 21 pig assemblies and identified 23,831 NRSs with a total length of 105 Mb. Our findings revealed that NRSs were more prevalent in breeds exhibiting greater genetic divergence from the reference genome. Furthermore, we observed that NRSs were rarely found within coding sequences, while NRS insertions were enriched in immune-related Gene Ontology terms. Notably, our investigation also unveiled a close association between novel genes and the immune capacity of pigs. We observed substantial differences in the frequencies of NRSs between Eastern and Western pigs, and the heat-resistant pigs exhibited a substantial number of NRS insertions in an 11.6 Mb interval on chromosome X. Additionally, we discovered a 665 bp insertion in the fourth intron of the TNFRSF19 gene that may be associated with heat tolerance in Southern Chinese pigs. Conclusions: Our findings demonstrate the potential of a graph genome approach to reveal important functional features of NRSs in pig populations.
Funding: Supported by the National Key R&D Program of China (2021YFD1600200), the Program of National Beef Cattle and Yak Industrial Technology System (No. CARS-37), the Natural Science Foundation of Sichuan Province (General Program) (24NSFSC0581), and the Scientific and Technological Innovation Team for Qinghai-Tibetan Plateau Research in Southwest Minzu University (Grant No. 2024CXTD02).
Abstract: Background: The genetic diversity of yak, a key domestic animal on the Qinghai-Tibetan Plateau (QTP), is a vital resource for domestication and breeding efforts. This study presents the first yak pangenome obtained through the de novo assembly of 16 yak genomes. Results: We discovered 290 Mb of nonreference sequences and 504 new genes. Our pangenome-wide presence and absence variation (PAV) analysis revealed 5,120 PAV-related genes, highlighting a wide range of variety-specific genes and genes with varying frequencies across yak populations. Principal component analysis (PCA) based on binary gene PAV data classified yaks into three new groups: wild, domestic, and Jinchuan. Moreover, we proposed a ‘two-haplotype genomic hybridization model’ for understanding the hybridization patterns among breeds by integrating gene frequency, heterozygosity, and gene PAV data. A gene PAV-GWAS identified a novel gene (Bos-Gru3G009179) that may be associated with the multirib trait in Jinchuan yaks. Furthermore, an integrated transcriptome and pangenome analysis highlighted the significant differences in the expression of core genes and the mutational burden of differentially expressed genes between yaks from high and low altitudes. Transcriptome analysis across multiple species, comparing high- and low-altitude adaptations, revealed that yaks have the most unique differentially expressed mRNAs and lncRNAs between high- and low-altitude regions, especially in the heart and lungs. Conclusions: The yak pangenome offers a comprehensive resource and new insights for functional genomic studies, supporting future biological research and breeding strategies.
Funding: Supported by the National Natural Science Foundation of China (grant number 31961143021), the earmarked fund for Modern Agro-industry Technology Research System (grant number CARS-39-01), and the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (grant number ASTIP-IAS01) to YM and LJ; also supported by the Elite Youth Program in Chinese Academy of Agricultural Sciences.
Abstract: As large-scale genomic studies have progressed, it has become clear that a single reference genome cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, suggesting that some population-specific sequences may be missing from the current reference genome. By contrast, the pangenome is a collection of all DNA sequences of a species: it contains sequences shared by all individuals (core genome) and can also display sequence information unique to each individual (variable genome). The progress of pangenome research in humans, plants, and domestic animals has shown that missing genetic components and large structural variants (SVs) can be explored through pangenomic studies. Many individual-specific sequences have been shown to be related to biological adaptability, phenotype, and important economic traits. The maturity of technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graph genomes, and reference-free assembly will further promote the development of pangenomes. In the future, pangenomes combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution, and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variations in domestic animals (sheep, cattle, pigs, chickens) and their impacts on phenotypes, and how this can contribute to the understanding of species diversity. Additionally, we discuss potential issues and future perspectives of pangenome research in livestock and poultry.
Funding: Supported by a grant from King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia (No. KACST 428-29); an institutional grant from the CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences; grants from the National Basic Research Program (973 Program) (No. 2010CB126604) and the Special Foundation Work Program (No. 2009FY120100) of the Ministry of Science and Technology of the People's Republic of China; and the National Science Foundation of China (No. 31071163).
Abstract: Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium, and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates at both high coverage and high base quality using a next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared with the pangenomes of closely related species of the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5%, and hotspots for genome variation were discovered in genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.
Funding: Funded by the National Natural Science Foundation of China (32188102 and 32372148), the Innovation Program of Chinese Academy of Agricultural Sciences (CAAS-CSIAF-202303), and the National Key R&D Program of China (2022YFE0139400).
Abstract: By integrating genomes from different accessions, pangenomes provide a more comprehensive and reference-bias-free representation of genetic information within a population compared to a single reference genome. With the rapid accumulation of genomic sequencing data and the expanding scope of plant research, plant pangenomics has gradually evolved from single-species to multi-species studies. This shift has given rise to the concept of a super-pangenome that covers all genomic sequences within a genus-level taxonomic group. By incorporating both cultivated and wild species, the super-pangenome has greatly enhanced the resolution of research in various areas such as plant genetic diversity, evolution, domestication, and molecular breeding. In this review, we present a comprehensive overview of the plant super-pangenome, emphasizing its development requirements, construction strategies, potential applications, and notable achievements. We also highlight the distinctive advantages and promising prospects of super-pangenomes while addressing current challenges and future directions.
Abstract: [Objective] Rummeliibacillus, a genus encompassing three known species, R. stabekisii, R. pycnus, and R. suwonensis, has a wide range of potential applications in biodegradation, probiotics, animal feed, and production of arginine, caproic acid, and other compounds. This study aims to explore the genetic diversity of this genus at the genomic level. [Methods] A comparative pangenome analysis of 12 strains isolated from different sources was conducted. In addition, phylogenetic analysis, functional annotation, genomic metabolic pathway analysis, and prediction of mobile genetic elements were carried out. [Results] A total of 8024 gene clusters were identified. The core genome, accessory genome, and strain-specific genes comprised 1550, 3941, and 2533 gene clusters, respectively. In the core genome, the arginine cycle of six strains was complete. Seven strains had the ability to completely biosynthesize acetoin. However, only R. pycnus and R. suwonensis 3B-1 were able to completely biosynthesize caproic acid. The phylogenetic tree, DNA-DNA hybridization, and average nucleotide identity showed that Rummeliibacillus sp. G93 and Rummeliibacillus sp. TYF-LIM-RU47 were strains of R. stabekisii. Rummeliibacillus sp. POC4 and Rummeliibacillus sp. TYF005 may belong to a new species of this genus. In addition, genomic islands were identified in all 12 strains, with the number ranging from four (R. stabekisii DSM 25578 and R. stabekisii NBRC 104870) to 14 (Rummeliibacillus sp. SL167 and Rummeliibacillus sp. TYF005), and prophage sequences were found in five of the 12 strains. [Conclusion] This study provides a genomic framework for Rummeliibacillus that could assist the further exploration of this genus.
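The core / accessory / strain-specific split reported in this abstract can be sketched as a simple count over a cluster-to-strain membership table. This is only an illustration under the common strict definitions (core = present in every strain, strain-specific = present in exactly one); the abstract does not state which tool or thresholds the study used, and the cluster and strain names below are made up.

```python
from typing import Dict, List, Set


def partition_clusters(
    clusters: Dict[str, Set[str]], n_strains: int
) -> Dict[str, List[str]]:
    """Split gene clusters by how many strains carry them.

    core            -> present in all n_strains strains
    strain_specific -> present in exactly one strain
    accessory       -> everything in between
    """
    parts: Dict[str, List[str]] = {"core": [], "accessory": [], "strain_specific": []}
    for cluster_id, strains in clusters.items():
        if len(strains) == n_strains:
            parts["core"].append(cluster_id)
        elif len(strains) == 1:
            parts["strain_specific"].append(cluster_id)
        else:
            parts["accessory"].append(cluster_id)
    return parts


# Toy membership table for three hypothetical strains A, B, and C.
toy_clusters = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
parts = partition_clusters(toy_clusters, n_strains=3)
print({name: len(ids) for name, ids in parts.items()})
```

With real data the three counts should sum to the total number of clusters, as they do in the abstract (1550 + 3941 + 2533 = 8024).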
Funding: Supported by the National Key Research and Development Program (2023YFE0105400), the National Natural Science Foundation of China (32341042), the Central Public-Interest Scientific Institution Basal Research Fund (Y2022QC23), the National Key Laboratory & Zhongyuan Research Center ‘Xinyi’ Project (ZYZX20240304), the Agricultural Science and Technology Innovation Program (CAAS-ASTIP-2024-ZFRI-01), the Natural Science Foundation of Henan (232300421042), and the National Science and Technology Major Project of Yunnan (202302AE090005-3).
Abstract: Natural variations are the foundation of crop improvement. However, genomic variability remains largely understudied. Here, we present the full-spectrum integrated panvariome and pangenome of 1,020 peach accessions, including 10.5 million single-nucleotide polymorphisms, insertions, deletions, duplications, inversions, translocations, copy-number variations, transposon-insertion polymorphisms, and presence-absence variations, uncovering 70.6% novel variants and 3,289 novel genes. Analysis of the panvariome recapitulated the global evolutionary history of the peach and identified several novel rare variants causal for traits. We found that landraces and improved accessions encode more genes than the wild accessions, implying gene gains during peach domestication and improvement. Analysis of global introgression patterns revealed their value in phenotype prediction and gene mining, and suggested that the most likely wild progenitor of the domesticated peach is Prunus mira and that almond was involved in the origin of Prunus davidiana. Furthermore, we developed a novel panvariome-based one-step solution for association study, GWASPV, which was used to identify several trait-conferring genes and over 2,000 novel associations. Collectively, our study reveals new insights into peach evolution and genomic variations, providing a novel method for plant gene mining and important targets for peach breeding.
Funding: Supported by the National Natural Science Foundation of China (U23A20204 and 32472680), the Anhui Provincial Natural Science Foundation (2308085MC69), and the Key Scientific Research Foundation of the Education Department of Anhui Province (2024AH050452).
Abstract: Dear Editor, The genus Actinidia is native to China and well known as kiwifruit; it has been classified into 75 taxa, including 54 species and 21 subspecies. Since the release of the first draft genome of Actinidia chinensis ‘Hongyang’ in 2013, extensive studies have made great progress in gene cloning, genetic mapping, metabolic regulation, and molecular breeding of kiwifruit.
Funding: Funded by the National Natural Science Foundation of China (no. 32270685), the National Natural Science Fund for Excellent Young Scientists Fund Program (Overseas), and the Young Scientist Fostering Funds for the National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops.
Abstract: Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can address single-reference bias, thereby enhancing the performance of variant genotyping and empowering downstream applications in population genetics and quantitative genetics. However, existing pangenome-based genotyping methods are ineffective in handling large or complex pangenome graphs, particularly in polyploid genomes. Here, we introduce Varigraph, an algorithm that leverages the comparison of unique and repetitive k-mers between variant sites and short reads for genotyping both small and large variants. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers across non-human genomes while maintaining comparable genotyping performance in human genomes. By employing efficient data structures including counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while managing computational costs for large datasets. Its wide applicability extends to highly repetitive or large genomes, such as those of maize and wheat. Significantly, Varigraph can handle extensive pangenome graphs, as demonstrated by its performance on a dataset containing 252 rice genomes, for which it achieved a precision exceeding 0.9 for both small and large variants. Notably, Varigraph is capable of effectively utilizing pangenome graphs for genotyping autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
Funding: Supported by grants from the National Science and Technology Major Project 2030 (2024ZD040780104), the Program for Guangdong Laboratory for Lingnan Modern Agriculture (NG2022002 and NT2025003), and the Double First-Class Discipline Promotion Project of South China Agricultural University (2023B10564004) to Y.W.; the National Natural Science Foundation of China and South China Agricultural University to Q.L.; and the Young Scientist Fostering Funds for the National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops to W.-B.J.
Abstract: Traditional reference genomes, particularly those derived from diploid or collapsed assemblies used for polyploid or highly heterozygous species, fail to represent the full spectrum of structural, sequence, and haplotype diversity that underlies phenotypic variation and adaptive potential in a species (Bayer et al., 2020; Eizenga et al., 2020; Della Coletta et al., 2021).
Funding: Supported by the National Natural Science Foundation of China (31972969), the Science and Technology Department of Yunnan Province (2019FY003015), the Research Startup Funding of Yunnan University in China (C176220100033), the Science and Technology Major Project of the Department of Science and Technology of Yunnan (K204204210017), the Yunnan Fundamental Research Projects (202301BF070001-026), the Project of Central Guiding Local Technology Development (202407AB110005), and the Yunnan Talent Support Plan (C619300A036).
Abstract: Potato is the world’s most important nongrain crop. In this study, to assess genetic diversity within the Petota section, 29 genomes from the Petota and Etuberosum sections were newly de novo assembled and 248 accessions of wild potatoes, landraces, and modern cultivars were re-sequenced at >25× depth. Subsequently, a graph-based pangenome was constructed using DM8.1 as the backbone, integrating 194,330 nonredundant structural variants. To characterize the metabolome of tubers and illuminate the genomic basis of metabolic traits, LC-MS/MS was employed to obtain the metabolome of 157 accessions, and 9,321 structural variants (SVs) were found to be significantly associated with 1,258 distinct metabolites via PAV (presence and absence variation)-based metabolomics-GWAS analysis, including flavonoids, phenolic acids, and phospholipids. To facilitate the utilization of pangenome resources, a comprehensive platform, the Potato Pangenome Database (PPDB), was developed. Our study provides a comprehensive genomic resource for dissecting the genomic basis of agronomic and metabolic traits in potato, which will accelerate functional genomics studies and genetic improvement in potato.
Funding: Supported by the Australia Grain Research and Development Corporation (9176507).
Abstract: Genome-wide identification and comparative gene-family analyses have commonly been performed to investigate species-specific evolution linked to various traits and molecular pathways. However, most previous studies have been limited to gene screening in a single reference genome, failing to account for the gene presence/absence variations (gPAVs) in a species. Here, we propose an innovative pangenome-based approach for gene-family analyses based on orthologous gene groups (OGGs). Using the basic helix-loop-helix (bHLH) transcription factor family in barley as an example, we identified 161-176 bHLHs in 20 barley genomes, which can be classified into 201 OGGs. These 201 OGGs were further classified into 140 core, 12 softcore, 29 shell, and 20 line-specific/cloud bHLHs, revealing the complete profile of bHLH genes in barley. Using a genome-scanning approach, we overcame the genome annotation bias and identified an average of 15 unannotated core bHLHs per barley genome. We found that whole-genome/segmental duplicates are the predominant mechanisms contributing to the expansion of most core/softcore bHLHs, whereas dispensable bHLHs are more likely to result from small-scale duplication events. Interestingly, we noticed that the dispensable bHLHs tend to be enriched in the specific subfamilies SF13, SF27, and SF28, implying the potentially biased expansion of specific bHLHs in barley. We found that 50% of the bHLHs contain at least 1 intact transposon element (TE) within the 2-kb upstream-to-downstream region. bHLHs with copy-number variations (CNVs) have 1.48 TEs on average, significantly more than core bHLHs without CNVs (1.36), supporting a potential role of TEs in bHLH expansion. Analyses of selection pressure showed that dispensable bHLHs have experienced clear relaxation of selection compared with core bHLHs, consistent with their conservation patterns. We also integrated the pangenome data with recently available barley pantranscriptome data from 5 tissues and discovered apparent transcriptional divergence within and across bHLH subfamilies. We conclude that pangenome-based gene-family analyses can better describe the previously untapped, genuine evolutionary status of bHLHs and provide novel insights into bHLH evolution in barley. We expect that this study will inspire similar analyses in many other gene families and species.
Funding: Supported by the Sichuan Provincial Finance Department Project of China (1+3 ZYGG001), the Sichuan Province Science and Technology Department Project of China (2021YFYZ0020, 2022NSFSC0161, 2023NSFSC1925), the Program of Strategic Scientist Studio of Sichuan Academy of Agricultural Sciences, and the Program of Chinese Agriculture Research System (CARS-03).
Abstract: The allotetraploid wild grass Aegilops ventricosa (2n=4x=28, genome D^(v)D^(v)N^(v)N^(v)) has been recognized as an important germplasm resource for wheat improvement owing to its ability to tolerate biotic stresses. In particular, the 2N^(v)S segment from Ae. ventricosa, as a stable and effective resistance source, has contributed greatly to wheat improvement. The 2N^(v)S/2AS translocation is a prevalent chromosomal translocation between common wheat and wild relatives, ranking just behind the 1B/1R translocation in importance for modern wheat breeding. Here, we assembled a high-quality chromosome-level reference genome of Ae. ventricosa RM271 with a total length of 8.67 Gb. Phylogenomic analyses revealed that the progenitor of the D^(v) subgenome of Ae. ventricosa is Ae. tauschii ssp. tauschii (genome DD); by contrast, the progenitor of the D subgenome of bread wheat (Triticum aestivum L.) is Ae. tauschii ssp. strangulata (genome DD). The oldest polyploidization time of Ae. ventricosa occurred 0.7 mya. The D^(v) subgenome of Ae. ventricosa is less conserved than the D subgenome of bread wheat. Construction of a graph-based pangenome of 2AS/6N^(v)L (originally known as 2N^(v)S) segments from Ae. ventricosa and other genomes in the Triticeae enabled us to identify candidate resistance genes sourced from Ae. ventricosa. We identified 12 nonredundant introgressed segments from the D^(v) and N^(v) subgenomes using a large winter wheat collection representing the full diversity of the European wheat genetic pool, and 29.40% of European wheat varieties inherit at least one of these segments. The high-quality RM271 reference genome will provide a basis for cloning key genes, including the Yr17-Lr37-Sr38-Cre5 resistance gene cluster in Ae. ventricosa, and facilitate the full use of elite wild genetic resources to accelerate wheat improvement.