Funding: Supported by the National Key R&D Program of China (2018YFD1000606-4), the Beijing Academy of Agriculture and Forestry Fund for Young Scholars (QNJJ201702, QNJJ201925), the National Natural Science Foundation of China (31401836), and the Municipal Natural Science Foundation of Beijing (6162012).
Abstract: Apricots, scientifically known as Prunus armeniaca L., are drupes that resemble and are closely related to peaches and plums. As one of the most widely consumed fruits, apricots are grown worldwide except in Antarctica. A high-quality reference genome for apricot is still unavailable, a handicap that has dramatically limited the elucidation of the associations of phenotypes with the genetic background, evolutionary diversity, and population diversity in apricot. DNA from P. armeniaca was used to generate a standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on Sequel SMRT Cells, generating a total of 16.54 Gb of PacBio subreads (N50 = 13.55 kb). The high-quality P. armeniaca reference genome presented here was assembled using long-read single-molecule sequencing at approximately 70× coverage and 171× Illumina reads (40.46 Gb), combined with a genetic map for chromosome scaffolding. The assembled genome size was 221.9 Mb, with a contig NG50 size of 1.02 Mb. Scaffolds covering 92.88% of the assembled genome were anchored on eight chromosomes. Benchmarking Universal Single-Copy Orthologs analysis showed 98.0% complete genes. We predicted 30,436 protein-coding genes, and 38.28% of the genome was predicted to be repetitive. We found 981 contracted gene families, 1324 expanded gene families and 2300 apricot-specific genes. The differentially expressed gene (DEG) analysis indicated that a change in the expression of the 9-cis-epoxycarotenoid dioxygenase (NCED) gene, but not the lycopene beta-cyclase (LcyB) gene, results in a low β-carotenoid content in the white cultivar “Dabaixing”. This complete and highly contiguous P. armeniaca reference genome will be of help for future studies of resistance to plum pox virus (PPV) and for the identification and characterization of important agronomic genes and breeding strategies in apricot.
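The contiguity statistics quoted in the abstract above (subread N50, contig NG50) can be illustrated with a minimal sketch. The function name and the toy lengths are hypothetical and are not taken from the paper; the sketch only shows the usual definition, in which N50 is measured against the total assembled length and NG50 against an externally supplied genome size.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return the N50 of `lengths`, or the NG50 when `genome_size` is given.

    N50: the length L such that sequences of length >= L cover at least
    half of the total assembled length.
    NG50: the same, but half of `genome_size` is used as the target.
    """
    lengths = sorted(lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    # The sequences cover less than half of the reference size.
    return None


# Toy contig lengths (hypothetical, arbitrary units).
contigs = [2, 3, 4, 5, 6]
print(n50(contigs))                  # N50 against the assembly length (20)
print(n50(contigs, genome_size=30))  # NG50 against an assumed genome size
```

Because NG50 uses a fixed genome size rather than the assembly's own length, it stays comparable between assemblies of the same species that differ in total size.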
Funding: Financial support from the China Agriculture Research System of MOF and MARA-Food Legumes (CARS-08) and the Agricultural Science and Technology Innovation Program (ASTIP) of the Chinese Academy of Agricultural Sciences.
Abstract: Pea (Pisum sativum L.) is an annual cool-season legume crop. Owing to its role in sustainable agriculture as both a rotation and a cash crop, its global market is expanding and increased production is urgently needed. For both technical and regulatory reasons, neither conventional nor transgenic breeding techniques can keep pace with the demand for increased production. In answer to this challenge, CRISPR/Cas9 genome editing technology has been gaining traction in plant biology and crop breeding in recent years. However, there are currently no reports of the successful application of CRISPR/Cas9 genome editing technology in pea. We developed a transient transformation system of hairy roots, mediated by Agrobacterium rhizogenes strain K599, to validate the efficiency of a CRISPR/Cas9 system. Further optimization resulted in an efficient vector, PsU6.3-tRNA-PsPDS3-en35S-PsCas9. We used this optimized CRISPR/Cas9 system to edit the pea phytoene desaturase (PsPDS) gene, causing albinism, by Agrobacterium-mediated genetic transformation. This is the first report of successful generation of gene-edited pea plants by this route.
Funding: This work was supported by the National Key Research Program of China (2022YFC2303400 to C.S.), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0830000 to L.W.), the Key Research Program of the Chinese Academy of Sciences (KFZD-SW-219 to L.W.), and the Shenzhen Medical Research Fund (E24010010 to Z.L. and E24010011 to Z.L.).
Abstract: The mpox virus (MPXV) is undergoing mutations at an alarmingly rapid pace, necessitating heightened genomic surveillance to manage its global spread. However, current assessments lack a comprehensive evaluation of genomic variations and the influence of environmental and social factors. To address this gap, we developed the mpox virus variations risk evaluation system (VarEPS-MPXV), which uses a multidimensional strategy to assess both observed and virtual variations (those that have yet to occur), thereby mitigating time-lag issues in risk prediction. The system integrates six environmental and four social factors to monitor their impact on genomic variation. By analyzing 17,523 publicly available MPXV sequences, we identified 61,788 unique amino acid variants and highlighted five significant mutations. Notably, OPG118:K606E is predicted to play a critical role in MPXV survival and transmission. Our assessment revealed that most key mutations involved amino acid substitutions with low mutational barriers. Variations in the OPG190 gene may alter antibody affinity, while the mutation at site 127 in the OPG038 gene may influence immune protein binding stability. VarEPS-MPXV offers vital support for managing MPXV outbreaks and other viral diseases, contributing to global public health research and practice. Researchers can freely access the database at https://nmdc.cn/mpox/.
Funding: Support from the Senior User Project of RV KEXUE (Grant No. KEXUE2019GZ05); the Center for Ocean Mega-Science, Chinese Academy of Sciences; the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2021QZKK0100); the National Key R&D Program of China (Grant No. 2022YFF1002801); and the National Natural Science Foundation of China (Grant No. 92251302).
Abstract: Cold seeps in the deep sea are closely linked to energy exploration as well as global climate change. The alkane-dominated chemical energy-driven model makes cold seeps an oasis of deep-sea life, showcasing an unparalleled reservoir of microbial genetic diversity. Here, by analyzing 113 metagenomes collected from 14 global sites across 5 cold seep types, we present a comprehensive Cold Seep Microbiomic Database (CSMD) to archive the genomic and functional diversity of cold seep microbiomes. The CSMD includes over 49 million non-redundant genes and 3175 metagenome-assembled genomes, which represent 1895 species spanning 105 phyla. In addition, beta diversity analysis indicates that both the sampling site and cold seep type have a substantial impact on the prokaryotic microbiome community composition. Heterotrophic and anaerobic metabolisms are prevalent in microbial communities, accompanied by considerable numbers of mixotrophs and facultative anaerobes, highlighting the versatile metabolic potential in cold seeps. Furthermore, secondary metabolic gene cluster analysis indicates that at least 98.81% of the sequences potentially encode novel natural products, with ribosomally synthesized and post-translationally modified peptides being the predominant type, widely distributed in archaea and bacteria. Overall, the CSMD represents a valuable resource that would enhance the understanding and utilization of global cold seep microbiomes.
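The abstract above does not say which beta diversity measure was used. As an illustration only, the sketch below computes Bray-Curtis dissimilarity, one commonly used measure, between two samples described by taxon abundance vectors. The choice of metric, the function name, and the toy abundances are all assumptions, not details from the paper.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors.

    0.0 means identical composition; 1.0 means no shared taxa.
    Both vectors must list the same taxa in the same order.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    # Two empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0


# Hypothetical abundances of three taxa in two samples.
site_a = [10, 0, 5]
site_b = [5, 5, 5]
print(round(bray_curtis(site_a, site_b), 3))
```

Computing this value for every pair of samples gives a distance matrix, which is the usual input to ordination or permutation tests that ask whether factors such as site or seep type explain community differences.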
Abstract: Maize is a globally important crop and a classic model plant for genetic studies. Here, we report a 2.2 Gb draft genome sequence of an elite maize line, HuangZaoSi (HZS). Hybrids bred from HZS-improved lines (HILs) are planted in more than 60% of maize fields in China. Proteome clustering of six completely sequenced maize genomes shows that 638 proteins fall into 264 HZS-specific gene families, with the majority of contributions from tandem duplication events. Resequencing and comparative analysis of 40 HZS-related lines reveals the breeding history of HILs. More than 60% of identified selective sweeps were clustered in identity-by-descent conserved regions, and yield-related genes/QTLs were enriched in HZS characteristic selected regions. Furthermore, we demonstrated that HZS-specific family genes were not uniformly distributed in the genome but were enriched in improvement/function-related genomic regions. This study provides an important and novel resource for maize genome research and expands our knowledge of the breadth of genomic variation and the improvement history of maize.
Funding: Supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA08020102 to ZZ and SH), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2018134 to LH), the National Programs for High Technology Research and Development (Grant Nos. 2015AA020108 and 2012AA020409 to ZZ), the 100-Talent Program of the Chinese Academy of Sciences (to YB and ZZ), and the National Natural Science Foundation of China (Grant No. 31100915 to LH).
Abstract: Genome reannotation aims for complete and accurate characterization of gene models and thus is of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to use these data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on integration of large-scale RNA-seq data and release a new annotation system, IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that, compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to the massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, which is equipped with the improved annotations, bears great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31000561 and 30900825) and the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX2-EW-R-01-04).
Abstract: The emergence of next-generation sequencing (NGS) technologies has significantly improved sequencing throughput and reduced costs. However, the short read length, duplicate reads and massive volume of data make data processing much more difficult and complicated than with first-generation sequencing technology. Although some software packages have been developed to assess data quality, those packages either are not easily available to users or require bioinformatics skills and computer resources. Moreover, almost none of the quality assessment software currently available takes sequencing errors into account when assessing duplicates in NGS data. Here, we present a new user-friendly quality assessment software package called BIGpre, which works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account, and to trim low-quality reads from raw data as well. BIGpre is primarily written in Perl and integrates graphical capability from the statistics package R. This package produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions of reads within minutes, this package provides immediate diagnostic information for users to manipulate sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net/.
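The idea of detecting duplicate reads while tolerating sequencing errors can be sketched as follows. This is not BIGpre's algorithm, which the abstract does not describe: it is a simple greedy illustration in which a read counts as a duplicate if it has the same length as an already kept read and differs from it at no more than `max_mismatches` positions. The function names, the threshold, and the toy reads are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_duplicates(reads: List[str], max_mismatches: int = 1) -> List[str]:
    """Keep the first read of each group of near-identical reads.

    A read is dropped when an already kept read of the same length lies
    within `max_mismatches` substitutions of it. The scan is quadratic in
    the number of kept reads, so it only suits small illustrative inputs.
    """
    kept: List[str] = []
    for read in reads:
        is_duplicate = any(
            len(read) == len(other) and hamming(read, other) <= max_mismatches
            for other in kept
        )
        if not is_duplicate:
            kept.append(read)
    return kept


reads = ["ACGTACGT", "ACGTACGA", "TTTTACGT", "ACGTACGT"]
print(remove_duplicates(reads))                    # tolerant of one mismatch
print(remove_duplicates(reads, max_mismatches=0))  # exact duplicates only
```

With a tolerance of zero the second read survives even though it differs from the first by a single base, which is the kind of missed duplicate that an error-aware comparison is meant to catch.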
Funding: This work was supported by the National Basic Research Program of China (2010CB945300).
Abstract: In the bone marrow and spleen, the developing B cell populations undergo both negative and positive selection to shape their B cell receptor repertoire. To gain insight into the shift of the immunoglobulin heavy (IgH) chain repertoire during B cell development, we undertook large-scale Igμ chain repertoire analysis of pre-B, immature B and spleen B cell populations. We found that the majority of VH gene segments, VH families, and JH and D gene segments had significantly different usage frequencies when the three B cell populations were compared, but the usage profiles of the VH, D, and JH genes in different B cell populations were highly correlated. In both productive and nonproductive rearrangements, the length of CDRH3 shortened significantly on average when B cells entered the periphery. However, the CDRH3 length distribution of nonproductive rearrangements did not follow a Gaussian distribution, but decreased successively in the order 3n–2, 3n–1 and 3n, suggesting a direct correlation between mRNA stability and CDRH3 length patterns of nonproductive rearrangements. Further analysis of the individual components comprising CDRH3 of productive rearrangements indicated that the decrease in CDRH3 length was largely due to the reduction of N addition at the 5′ and 3′ junctions. Moreover, with development, the amino acid content of CDRH3 progressed toward fewer positively charged and nonpolar residues but more polar residues. All these data indicated that the expressed Igμ chain repertoire, especially the repertoire of CDRH3, was fine-tuned when B cells passed through several checkpoints of selection during the process of maturation.
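The 3n–2, 3n–1 and 3n length classes mentioned in the abstract above can be made concrete with a short sketch. It assumes CDRH3 lengths are given in nucleotides and simply groups them by their remainder modulo 3; the function names and toy lengths are hypothetical, and the sketch does not reproduce the paper's analysis.

```python
from collections import Counter
from typing import Dict, Iterable

# Remainder of the nucleotide length modulo 3 -> length class label.
_CLASS_BY_REMAINDER = {0: "3n", 2: "3n-1", 1: "3n-2"}


def frame_class(length_nt: int) -> str:
    """Label a CDRH3 nucleotide length as 3n, 3n-1 or 3n-2."""
    return _CLASS_BY_REMAINDER[length_nt % 3]


def frame_counts(lengths_nt: Iterable[int]) -> Dict[str, int]:
    """Count how many lengths fall into each of the three classes."""
    counts = Counter(frame_class(n) for n in lengths_nt)
    return {label: counts.get(label, 0) for label in ("3n-2", "3n-1", "3n")}


# Hypothetical CDRH3 lengths in nucleotides.
print(frame_counts([30, 31, 32, 33, 34]))
```

Tabulating sequences this way per B cell population would show whether the three classes differ in frequency, which is the pattern the abstract reports for nonproductive rearrangements.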