Funding: Supported by the Inter-Governmental S&T Cooperation Proposal between China and the Czech Republic (2016YFE0131000) and the Beijing Nova Program, China (Z171100001117036).
Abstract: To identify possible quarantine viruses in seven common sunflower varieties imported from the United States of America and the Netherlands, we tested total RNAs extracted from leaf tissues using next-generation sequencing of small RNAs. Analysis of the small RNA sequencing data found no quarantine virus, but a double-stranded RNA (dsRNA) molecule showing typical genomic features of an endornavirus was detected in two varieties, X3939 and SH1108. Full-length sequencing and phylogenetic analysis showed that it is a novel endornavirus, tentatively named Helianthus annuus alphaendornavirus (HaEV). Its full genome corresponds to a 14 662-bp dsRNA segment, including a 21-nt 5′ untranslated region (UTR) and a 3′ UTR that ends with the unique sequence CCCCCCCC and lacks a poly(A) tail. The genome contains an open reading frame (ORF) encoding a deduced polyprotein of 4 867 amino acids (aa) with three domains: RdRP, Hel and UGT (UDP-glycosyltransferase). Fluorescence in situ hybridization (FISH) showed that HaEV is distributed mainly in the cytoplasm and to a lesser extent in the nucleus of leaf cells. The virus has a high seed infection rate in five varieties: X3907, X3939, A231, SH1108 and SR1320. To our knowledge, this is the first report of a virus of the family Endornaviridae in common sunflower.
Funding: Supported by grants from the Tianjin Binhai New Area Science and Technology Commission (No. 2011-BK120011), the Shenzhen Engineering Laboratory for Clinical Molecular Diagnostic, the Shenzhen Municipal Government of China (No. CXZZ20130517144604091) and China National GeneBank-Shenzhen.
Abstract: Objective: To determine the genetic cause of disease in a 46,XY female with primary amenorrhea and a unilateral mixed germ cell tumor. Methods: Eight genes associated with 46,XY gonadal dysgenesis were analyzed in the patient and her parents by target region capture and next-generation sequencing. Results: An insertion of a single nucleotide (adenine) at coding position 230 (c.230_231insA), located in the high mobility group (HMG) domain of SRY, was identified; it leads to a truncated protein (p.Lys77fsX27). The mutation lies at position 2655414 of the Y chromosome and was supported by 127 uniquely mapped reads; it was not found in the in-house dataset of 1 092 controls. In addition, no mutation in any of the candidate genes was detected in the patient's parents, which indicates that it is a de novo mutation. Conclusion: A novel sporadic SRY mutation caused by a single-nucleotide insertion at position 230 (c.230_231insA) was identified as the cause of the disease in this patient. Target region capture with next-generation sequencing was found to be an effective method for the molecular genetic testing of 46,XY complete gonadal dysgenesis (46,XY CGD).
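The link between the coding position (230) and the affected residue (Lys77) in the abstract above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the usual convention that coding position 1 is the A of the start codon, and the function names are hypothetical helpers, not part of any tool used in the study.

```python
def codon_number(cds_pos: int) -> int:
    """Return the 1-based codon that contains a 1-based coding position."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_pos - 1) // 3 + 1


def position_in_codon(cds_pos: int) -> int:
    """Return 1, 2 or 3: the position of the base within its codon."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_pos - 1) % 3 + 1


# c.230_231insA: the inserted base sits between coding positions 230 and 231.
for pos in (230, 231):
    print(pos, codon_number(pos), position_in_codon(pos))
```

Under that assumption both flanking positions fall in codon 77 (as its second and third bases), which is consistent with the frameshift being reported from Lys77 onward.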
Abstract: Understanding the relationship between genotype and phenotype is a major biological question, and being able to predict phenotypes based on molecular genotypes is integral to molecular breeding. Whole-genome duplications have shaped the history of all flowering plants and present challenges to elucidating the relationship between genotype and phenotype, especially in neopolyploid species. Although single nucleotide polymorphisms (SNPs) have become popular tools for genetic mapping, the discovery and application of SNPs in polyploids have been difficult. Here, we summarize common experimental approaches to SNP calling, highlighting recent polyploid successes. To examine the impact of software choice on these analyses, we called SNPs among five peanut genotypes using different alignment programs (BWA-mem and Bowtie 2) and variant callers (SAMtools, GATK, and Freebayes). Alignments produced by Bowtie 2 and BWA-mem and analyzed in SAMtools shared 24.5% concordant SNPs, and SAMtools, GATK, and Freebayes shared 1.4% concordant SNPs. A subsequent analysis of simulated Brassica napus chromosome 1A and 1C genotypes demonstrated that, of the three software programs, SAMtools performed with the highest sensitivity and specificity on Bowtie 2 alignments. These results, however, are likely to vary among species, and we therefore propose a series of best practices for SNP calling in polyploids.
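A concordance figure of the kind reported above can be computed by comparing the call sets produced by each pipeline. The abstract does not state how concordance was defined, so the sketch below assumes one plausible definition (calls shared by all sets divided by calls found in any set); the `Snp` type and the toy calls are hypothetical and are not data from the study.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """A SNP call identified by site and alleles (hypothetical record type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(*call_sets: Iterable[Snp]) -> float:
    """Fraction of all distinct calls that are present in every call set."""
    sets = [set(calls) for calls in call_sets]
    if not sets:
        raise ValueError("at least one call set is required")
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


# Toy call sets standing in for three variant callers.
s1 = Snp("chr1", 100, "A", "G")
s2 = Snp("chr1", 250, "C", "T")
s3 = Snp("chr2", 40, "G", "A")
s4 = Snp("chr2", 90, "T", "C")
s5 = Snp("chr3", 15, "A", "C")

caller_a = [s1, s2, s3]
caller_b = [s1, s2, s4]
caller_c = [s1, s5]

print(concordance(caller_a, caller_b))            # two callers
print(concordance(caller_a, caller_b, caller_c))  # three callers
```

Because each added caller can only shrink the shared set and grow the union, three-way concordance is never higher than two-way concordance under this definition, which matches the direction of the drop reported in the abstract.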
Funding: Supported by the Chinese Academy of Sciences (Strategic Priority Research Program XDB11020300), the National Natural Science Foundation of China (31570252, 31601629), the start-up fund of the "One Hundred Talents" program of the Chinese Academy of Sciences, and grants from the State Key Laboratory of Plant Genomics (O8KF021011) and the Key Laboratory of Urban Agriculture (North) of the Ministry of Agriculture of China, Beijing University of Agriculture (KFK2015001).
Abstract: Rice blast, caused by Magnaporthe oryzae (M. oryzae), is one of the most destructive rice diseases; it causes significant yield losses and affects global food security. To better understand genetic variation among different isolates of M. oryzae in nature, we re-sequenced the genomes of two field isolates, CH43 and Zhong-10-8-14, which showed distinct pathogenicity on most of the rice cultivars. Genome-wide analysis of genetic variation reveals that Zhong-10-8-14 exhibits more sequence variation than CH43. Detection of structural variations (SVs) shows that the sequence variations occur primarily in exons and intergenic regions. Bioinformatic analysis of the gene variations reveals that many pathogenicity-related pathways are enriched. In addition, 193 candidate effectors with various DNA polymorphisms were identified, including two known effectors, AVR-Pik and AVR-Pita1. Comparative polymorphism analysis of thirteen randomly selected effectors suggests that the genetic variations of effectors are under positive selection. Expression pattern analysis of several pathogenicity-related variant genes indicates that these genes are differentially regulated in the two isolates, with much higher expression levels in Zhong-10-8-14 than in CH43. Our data demonstrate that the genetic variations of effectors and pathogenicity-related genes are under positive selection, resulting in the distinct pathogenicities of CH43 and Zhong-10-8-14 on rice.
Funding: Supported by the NIH under Grant No. 1R01HG004908-01, the NSF of USA under Grant No. DBI-0845685 (YY), and the Gordon and Betty Moore Foundation for the Community Cyberinfrastructure for Marine Microbial Ecological Research and Analysis (CAMERA) Project (JW).
Abstract: Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts in metagenomic data, caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of these artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.
Abstract: Apis mellifera syriaca exhibits a high degree of tolerance to pests and pathogens, including Varroa mites. This honey bee subspecies, native to Jordan, expresses behavioral adaptations to the high temperatures and dry seasons typical of the region. However, persistent imports of commercial breeder lines are endangering the local honey bee population. This study reports the use of next-generation sequencing (NGS) technology to study the A. m. syriaca genome and to identify genetic factors possibly contributing to mite resistance and other favorable traits. We obtained a total of 46.2 million raw reads by applying NGS to sequence A. m. syriaca and used an extensive bioinformatics approach to identify several candidate genes for Varroa mite resistance and for the behavioral and immune responses characteristic of these bees. As part of characterizing the functional regulation of the molecular genetic pathways, we mapped the pathway genes potentially involved using information from Drosophila melanogaster, and we present possible functional changes implicated in responses to Varroa destructor mite infestation. Using in-depth functional annotation, we identified approximately 600 relevant candidates; genes were selected that are involved in pathways such as microbial recognition and phagocytosis, the peptidoglycan recognition protein family, the Gram-negative binding protein family, phagocytosis receptors, serpins, the Toll signaling pathway, the Imd pathway, Tnf, the JAK-STAT and MAPK pathways, hematopoiesis and cellular response pathways, antiviral and RNAi pathways, and stress factors. Finally, we cataloged function-specific polymorphisms between A. mellifera and A. m. syriaca that could give a better understanding of Varroa mite resistance mechanisms and assist in breeding. We identified genes with functional variations between A. mellifera and A. m. syriaca that relate to immune-related embryonic development (Cactus, Relish, dorsal, Ank2, baz), Varroa hygiene (NorpA2, Zasp, LanA, gasp, impl3) and Varroa resistance (Pug, pcmt, elk, elf3-s10, Dscam2, Dhc64C, gro, futsch). These genes could be used to develop an effective molecular tool for bee conservation and breeding programs, to improve locally adapted strains such as syriaca and to utilize their advantageous traits for the benefit of the apiculture industry.
Abstract: Next-generation sequencing (NGS) is an important process that allows vast raw sequence datasets to be organized inexpensively compared with traditional sequencing systems or methods. Various aspects of NGS, such as template preparation, sequencing, imaging, and genome alignment and assembly, outline genome sequencing and alignment. The de Bruijn graph (dBG) is an important mathematical tool that graphically analyzes how the orientations are constructed in groups of nucleotides. Basically, a dBG describes the formation of the genome segments in a circular, iterative fashion. Some pivotal dBG-based de novo algorithms and software packages, such as T-IDBA, Oases, IDBA-tran, Euler, Velvet, ABYSS, AllPaths, SOAPdenovo and SOAPdenovo2, are illustrated in this paper. Overlap layout consensus (OLC) graph-based algorithms also play a vital role in NGS assembly. Some important OLC-based algorithms, such as MIRA3, CABOG, Newbler, Edena, Mosaik and SHORTY, are portrayed in this paper. Experiments have shown that greedy graph-based algorithms and software packages are also vital for proper assembly of genome datasets. A few such algorithms, namely SSAKE, SHARCGS and VCAKE, help to perform proper genome sequencing.
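As a rough illustration of the de Bruijn graph idea described above, the sketch below builds a graph whose nodes are (k-1)-mers and whose edges are the distinct k-mers observed in a set of reads, then follows an unbranched path to spell out a contig. It is a toy written for this note: it ignores reverse complements, sequencing errors and coverage, the function names are hypothetical, and it is not how any of the assemblers listed above is implemented.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it.

    Every distinct k-mer in the reads contributes exactly one edge.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # count each distinct k-mer once
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unbranched(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, [])) == 1:
        nxt = graph[node][0]
        if nxt in visited:
            break  # stop rather than loop forever on a cycle
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Two overlapping toy reads (hypothetical data).
reads = ["ATGGC", "GGCGT"]
graph = build_de_bruijn(reads, k=3)
print(graph)
print(walk_unbranched(graph, "AT"))
```

With these two reads the shared k-mer GGC collapses into a single edge, so the walk from AT joins both reads into one contig. Real data produce branches and cycles at repeats, which is where the assemblers named in the abstract differ in strategy.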
Funding: Supported by grants from the National Natural Science Foundation of China (31271318), the Natural Science Foundation of Guangdong (S2012010008912) and the Foundation of Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences.
Abstract: Transcriptomics is one of the most developed fields in the post-genomic era. The transcriptome is the complete set of RNA transcripts in a specific cell type or tissue at a certain developmental stage and/or under a specific physiological condition, including messenger RNA, transfer RNA, ribosomal RNA, and other non-coding RNAs. Transcriptomics focuses on gene expression at the RNA level and offers genome-wide information on gene structure and gene function in order to reveal the molecular mechanisms involved in specific biological processes. With the development of next-generation high-throughput sequencing technology, transcriptome analysis has been progressively improving our understanding of RNA-based gene regulatory networks. Here, we discuss the concept, history, and especially the recent advances in this inspiring field of study.
Funding: Jacques Zaneveld is supported by NIH training grant T32 EY007102. Chen Rui is supported by grants from the Retinal Research Foundation and the National Eye Institute (R01EY018571, R01EY022356).
Abstract: Personalized medicine aims to utilize genomic information about patients to tailor treatment. Gene replacement therapy for rare genetic disorders is perhaps the most extreme form of personalized medicine, in that the patient's genome wholly determines the treatment regimen. Gene therapy for retinal disorders is poised to become a clinical reality. The eye is an optimal site for gene therapy due to the relative ease of precise vector delivery, immune system isolation, and availability for monitoring of any potential damage or side effects. Due to these advantages, clinical trials for gene therapy of retinal diseases are currently underway. A necessary precursor to such gene therapies is accurate molecular diagnosis of the mutation(s) underlying disease. In this review, we discuss the application of next-generation sequencing (NGS) to obtain such a diagnosis and identify disease-causing genes, using retinal disorders as a case study. After reviewing ocular gene therapy, we discuss the application of NGS to the identification of novel Mendelian disease genes. We then compare current array-based mutation detection methods against NGS-based methods in three retinal diseases: Leber's Congenital Amaurosis, Retinitis Pigmentosa, and Stargardt's disease. We conclude that NGS-based diagnosis offers several advantages over array-based methods, including a higher rate of successful diagnosis and the ability to more deeply and efficiently assay a broad spectrum of mutations. However, the relative difficulty of interpreting sequence results and the development of standardized, reliable bioinformatic tools remain outstanding concerns. Recent advances in NGS-based molecular diagnosis are discussed, as well as their implications for the development of personalized medicine.