Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFC0910400 and 2017YFC0907500); the National Science Foundation of China (Grant Nos. 31671372, 61702406, and 31701739); the Fundamental Research Funds for the Central Universities; the World-Class Universities (Disciplines) and the Characteristic Development Guidance Funds for the Central Universities; and the Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01).
Abstract: Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validations as well as manual inspections, are around 70%, and the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
Funding: Supported by the National Natural Science Foundation of China, No. 61932008; and the Natural Science Foundation of Shanghai, No. 21ZR1403200 (both to JC).
Abstract: Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, their complex molecular mechanisms are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in the search for disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings on the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Funding: Supported by the Sino-German Research Project (GZ 1362); the Grains Research and Development Corporation International Visiting Fellowship (UWA2406-010BGX); the National Natural Science Foundation of China (32171982); and the Fundamental Research Funds for the Central Universities of the Chinese Government (2662023PY004).
Abstract: Allopolyploids often exhibit advantages in vigor and adaptability compared to diploids. A long-term goal in the economically important Brassica genus has been to develop a new allohexaploid crop type (AABBCC) by combining different diploid and allotetraploid crop species. However, early-generation allohexaploids often face challenges such as unstable meiosis and low fertility, and the phenotypic performance of these synthetic lines has rarely been assessed. This study analyzes agronomic traits, fertility, and genome stability in ArArBcBcCcCc lines derived from four crosses between B. carinata and B. rapa after 9–11 selfing generations. Our results demonstrate polyploid advantage in vigor and seed traits, considerable phenotypic variation, and high fertility and genome stability. Meanwhile, parental genotypes significantly influence outcomes in advanced allohexaploids. Structural variants (SVs), largely resulting from A–C homoeologous exchanges, contribute to genomic variation and influence hexaploid genome stability, with the A sub-genome showing the highest variability. Both positive and negative impacts of SVs on fertility and seed weight are observed. Pseudo-euploids, which appear frequently, do not differ significantly from euploids in fertility or other agronomic traits, indicating a potential pathway toward a stable allohexaploid species. These findings provide insights into the challenges and potential of developing an adaptable and stable Brassica hexaploid through selection.
Funding: Supported by the National Key R&D Program of China (2019YFE0119000); the National Natural Science Foundation of China (31872561); the National Science Fund for Distinguished Young Scholars (32225049); and the Alliance of International Science Organizations (ANSO-CR-PP-2021-03).
Abstract: Common carp are among the oldest domesticated fish in the world. As such, there are many food and ornamental carp strains with abundant phenotypic variations due to natural and artificial selection. Hebao red carp (HB, Cyprinus carpio wuyuanensis), an indigenous strain in China, is renowned for its unique body morphology and reddish skin. To reveal the genetic basis underlying the distinct skin color of HB, we constructed an improved high-fidelity (HiFi) HB genome with good contiguity, completeness, and correctness. Genome structure comparison was conducted between HB and a representative wild strain, Yellow River carp (YR, C. carpio haematopterus), to identify structural variants and genes under positive selection. Signatures of artificial selection during domestication were identified in HB and YR populations, while phenotype mapping was performed in a segregating population generated by HB×YR crosses. Body color in HB was associated with regions with fixed mutations. The simultaneous mutation and superposition of a pair of homologous genes (mitfa) in chromosomes A06 and B06 conferred the reddish color in domesticated HB. Transcriptome analysis of common carp with different alleles of the mitfa mutation confirmed that gene duplication can buffer the deleterious effects of mutation in allotetraploids. This study provides new insights into genotype-phenotype associations in allotetraploid species and lays a foundation for future breeding of common carp.
Funding: Supported by the National Natural Science Foundation of China (31900309); the Guangdong Basic and Applied Basic Research Foundation (2019A1515011644); the Key-Area Research and Development Program of Guangdong Province (2021B0202020001); the Seed Industry Development Project of Agricultural and Rural Department of Guangdong Province (2022); and the Innovation Group Project of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (311021006).
Abstract: Due to the difficulty in accurately identifying structural variants (SVs) across genomes, their impact on cis-regulatory divergence of closely related species, especially fish, remains to be explored. Recently identified broad H3K4me3 domains are essential for the regulation of genes involved in several biological processes. However, the role of broad H3K4me3 domains in phenotypic divergence remains poorly understood. Siniperca chuatsi and S. scherzeri are closely related but divergent in several phenotypic traits, making them an ideal model to study cis-regulatory evolution in sister species. Here, we generated chromosome-level genomes of S. chuatsi and S. scherzeri, with assembled genome sizes of 716.35 and 740.54 Mb, respectively. The evolutionary histories of S. chuatsi and S. scherzeri were studied by inferring dynamic changes in ancestral population sizes. To explore the genetic basis of adaptation in S. chuatsi and S. scherzeri, we performed gene family expansion and contraction analysis and identified positively selected genes (PSGs). To investigate the role of SVs in cis-regulatory divergence of closely related fish species, we identified high-quality SVs as well as divergent H3K27ac and H3K4me3 domains in the genomes of S. chuatsi and S. scherzeri. Integrated analysis revealed that cis-regulatory divergence caused by SVs played an essential role in phenotypic divergence between S. chuatsi and S. scherzeri. Additionally, divergent broad H3K4me3 domains were mostly associated with cancer-related genes in S. chuatsi and S. scherzeri and contributed to their phenotypic divergence.
Funding: Supported by the Natural Science Foundation Project of CQ CSTC (No. 2009BB1241); and the Ministry of Science and Technology of China (Nos. 2006AA10A117 and 2005CB121003).
Abstract: Short interspersed elements (SINEs), which are mainly composed of Bm1, are abundant in the domesticated silkworm. A 294 bp novel SINE family, designated as BmSE, was identified by mining the database of the complete Bombyx mori genome. A representative BmSE element is flanked by an 11 bp target site duplication sequence posterior to the poly(A) at the 3′ end and has the sequence motifs of an internal promoter of RNA polymerase III, which are similar to those of Bm1. The repetitive elements of BmSE are widely distributed in all 28 chromosomes of the genome and share the common (ATTT) repeats at the ends. GC-content distribution shows that BmSE tends to accumulate in regions of higher AT content than Bm1 does. A high proportion of the BmSEs are mapped to the introns of coding sequences, whereas several elements are also present in the UTR of some transcripts, indicating that BmSEs are indeed exonized with UTRs. Of the 615 identified structural variants (SVs) of BmSE among the 40 domesticated and wild silkworms, only 230 SVs were found in the domesticated silkworms, indicating that many recent SV events of BmSE occurred after domestication, which was probably due to its mobilization. Our analysis might assist in developing BmSE as a potential marker and in understanding the evolutionary roles of SINEs in the domesticated silkworm.
Funding: Funded by the National Natural Science Foundation of China (No. 32270685); the National Natural Science Fund for Excellent Young Scientists Fund Program (Overseas); and the Young Scientist Fostering Funds for the National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops.
Abstract: Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can address single-reference bias, thereby enhancing the performance of variant genotyping and empowering downstream applications in population genetics and quantitative genetics. However, existing pangenome-based genotyping methods are ineffective in handling large or complex pangenome graphs, particularly in polyploid genomes. Here, we introduce Varigraph, an algorithm that leverages the comparison of unique and repetitive k-mers between variant sites and short reads for genotyping both small and large variants. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers across non-human genomes while maintaining comparable genotyping performance in human genomes. By employing efficient data structures, including a counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while managing computational costs for large datasets. Its wide applicability extends to highly repetitive or large genomes, such as those of maize and wheat. Significantly, Varigraph can handle extensive pangenome graphs, as demonstrated by its performance on a dataset containing 252 rice genomes, for which it achieved a precision exceeding 0.9 for both small and large variants. Notably, Varigraph is capable of effectively utilizing pangenome graphs for genotyping autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
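The abstract above names a counting Bloom filter as one of the data structures used for k-mer bookkeeping. The sketch below illustrates that general data structure only; it is not Varigraph's implementation, and the class name `CountingBloomFilter`, the helper `kmers`, the table size, and the choice of salted BLAKE2b hashes are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Approximate k-mer counter: counts may be overestimated, never underestimated."""

    def __init__(self, size=1 << 16, num_hashes=3):
        # size and num_hashes are illustrative defaults, not tuned values.
        self.size = size
        self.num_hashes = num_hashes
        self.counts = [0] * size

    def _indexes(self, kmer):
        # Derive num_hashes table positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, kmer):
        # Increment every counter the k-mer hashes to.
        for i in self._indexes(kmer):
            self.counts[i] += 1

    def count(self, kmer):
        # The minimum over the k-mer's counters bounds its true count from above.
        return min(self.counts[i] for i in self._indexes(kmer))


def kmers(seq, k):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


cbf = CountingBloomFilter()
for km in kmers("ACGTACGT", 4):
    cbf.add(km)
print(cbf.count("ACGT"))  # at least 2; hash collisions can only inflate it
```

Taking the minimum over several counters is what keeps the estimate one-sided: a collision can raise a counter but cannot lower it, so a reported count of zero means the k-mer was never added.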
Funding: Supported by the National Natural Science Foundation of China (31972969); the Science and Technology Department of Yunnan Province (2019FY003015); the Research Startup Funding of Yunnan University in China (C176220100033); the Science and Technology Major Project of the Department of Science and Technology of Yunnan (K204204210017); the Yunnan Fundamental Research Projects (202301BF070001-026); the Project of Central Guiding Local Technology Development (202407AB110005); and the Yunnan Talent Support Plan (C619300A036).
Abstract: Potato is the world's most important nongrain crop. In this study, to assess genetic diversity within the Petota section, 29 genomes from the Petota and Etuberosum sections were newly de novo assembled, and 248 accessions of wild potatoes, landraces, and modern cultivars were re-sequenced at >25× depth. Subsequently, a graph-based pangenome was constructed using DM8.1 as the backbone, integrating 194,330 nonredundant structural variants. To characterize the metabolome of tubers and illuminate the genomic basis of metabolic traits, LC-MS/MS was employed to obtain the metabolome of 157 accessions, and 9,321 structural variants (SVs) were found to be significantly associated with 1,258 distinct metabolites via PAV (presence and absence variations)-based metabolomics-GWAS analysis, including metabolites of flavonoids, phenolic acids, and phospholipids. To facilitate the utilization of pangenome resources, a comprehensive platform, the Potato Pangenome Database (PPDB), was developed. Our study provides a comprehensive genomic resource for dissecting the genomic basis of agronomic and metabolic traits in potato, which will accelerate functional genomics studies and genetic improvements in potato.
Funding: Supported by grants from the National Key Research and Development Program of China (2022YFF1003001); the National Natural Science Foundation of China (32072576); the National Modern Agriculture Industry Technology System (CARS-23-G42); the Jiangsu Provincial Key Research and Development Program (BE2021376); the Innovation Program of the Beijing Academy of Agricultural and Forestry Sciences (KJCX20230121); and the Collaborative Innovation Program for Leafy and Root Vegetables of the Beijing Vegetable Research Center, Beijing Academy of Agricultural and Forestry Sciences (XTCX202302).
Abstract: The domestication of Brassica oleracea has resulted in diverse morphological types with distinct patterns of organ development. Here we report a graph-based pan-genome of B. oleracea constructed from high-quality genome assemblies of different morphotypes. The pan-genome harbors over 200 structural variant hotspot regions enriched in auxin- and flowering-related genes. Population genomic analyses revealed that early domestication of B. oleracea focused on leaf or stem development. Gene flows resulting from agricultural practices and variety improvement were detected among different morphotypes. Selective-sweep and pan-genome analyses identified an auxin-responsive small auxin up-regulated RNA gene and a CLAVATA3/ESR-RELATED family gene as crucial players in leaf–stem differentiation during the early stage of B. oleracea domestication, and the BoKAN1 gene as instrumental in shaping the leafy heads of cabbage and Brussels sprouts. Our pan-genome and functional analyses further revealed that variations in the BoFLC2 gene play key roles in the divergence of vernalization and flowering characteristics among different morphotypes, and that variations in the first intron of BoFLC3 are involved in fine-tuning the flowering process in cauliflower. This study provides a comprehensive understanding of the pan-genome of B. oleracea and sheds light on the domestication and differential organ development of this globally important crop species.
Funding: Supported by the National Program for Brain Science and Brain-like Intelligence Technology of China (2021ZD0200800); the Beijing Municipal Science and Technology Commission (Z181100001518005); the National Natural Science Foundation of China (31401139, 32170613, 81671358, 81873803); and the Natural Science Foundation of Beijing Municipality (7232225).
Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental disorder with high genetic heritability but also high heterogeneity. Fully understanding its genetics requires whole-genome sequencing (WGS), but ASD studies utilizing WGS data in the Chinese population are limited. In this study, we present a WGS study of 334 individuals, including 112 ASD patients and their non-ASD parents. We identified 146 de novo variants in coding regions in 85 cases and 60 inherited variants in coding regions. By integrating these variants with an association model, we identified 33 potential risk genes (P<0.001) enriched in neuron- and regulation-related biological processes. Besides the well-known ASD genes (SCN2A, NF1, SHANK3, CHD8, etc.), several high-confidence genes were highlighted by a series of functional analyses, including CTNND1, DGKZ, LRP1, DDN, ZNF483, NR4A2, SMAD6, INTS1, and MRPL12, with further supporting evidence from GO enrichment, expression, and network analysis. We also integrated RNA-seq data to analyze the effect of the variants on gene expression and found that 12 genes had relatively biased expression in the individuals carrying the related variants. We further presented the clinical phenotypes of the probands carrying the risk genes in both our samples and Caucasian samples to show the effect of the risk genes on phenotype. Regarding variants in noncoding regions, a total of 74 de novo variants and 30 inherited variants were predicted as pathogenic with high confidence and were mapped to specific genes or regulatory features. The number of de novo variants found in a patient was significantly associated with the parents' ages at the birth of the child and showed a trend of association with gender. We also identified small de novo structural variants in ASD trios. The results of this study provide important evidence for understanding the genetic mechanism of ASD.