Funding: Supported by grants from the National Center for Research Resources (5P20RR016471-12) and the National Institute of General Medical Sciences (8 P20 GM103442-12) of the National Institutes of Health, and by a seed collaborative research grant from the Odegard School of Aerospace Sciences and the School of Medicine and Health Sciences at the University of North Dakota.
Abstract: The rapid development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. The de Bruijn graph, a fast algorithmic approach, has been used successfully for de novo assembly of genomic DNA; nevertheless, its performance for transcriptome assembly is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet and Oases. Of these assemblers, ABySS, Trinity, Velvet and Oases are all based on the de Bruijn graph, and Mira uses an overlap graph algorithm. Various numbers of short RNA reads were selected from the External RNA Control Consortium (ERCC) data and human chromosome 22. A number of statistics were then calculated for the resulting contigs from each assembler. Each experiment was repeated multiple times to obtain mean statistics and standard error estimates. Trinity performed relatively well on both ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method, but its assembly quality was low. Mira gave a good rate for mapping its contigs onto human chromosome 22, but its computational speed was not satisfactory. Our results suggest that transcript assembly remains a challenging problem for the bioinformatics community. A novel assembler is therefore needed for transcriptome data generated by next-generation sequencing.
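As a rough illustration of the de Bruijn graph idea named in the abstract above, the following minimal Python sketch builds the graph from a handful of reads. It is not the implementation used by ABySS, Trinity, Velvet or Oases; the function name `de_bruijn_graph`, the toy reads and the choice of k are assumptions made purely for illustration, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads (hypothetical helper).

    Each k-mer becomes an edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads, invented for this example only.
reads = ["ACGTAC", "GTACGA"]
graph = de_bruijn_graph(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

A node with more than one outgoing edge (here `ACG`) marks a branch point; in transcriptome data such branches can arise from alternative isoforms as well as from errors, which is one reason genome-oriented assemblers may behave differently on RNA-Seq reads.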
Funding: Supported by the National Basic Research Program of China (Grant Nos. 2010CB945401, 2007CB108800), the National Natural Science Foundation of China (Grant Nos. 30870575, 31071162, 31000590), and the Science and Technology Commission of Shanghai Municipality (Grant No. 11DZ2260300).
Abstract: De novo transcriptome assembly is an important approach in RNA-Seq data analysis; it helps reconstruct the transcriptome and investigate gene expression profiles without a reference genome sequence. We carried out transcriptome assemblies with two RNA-Seq datasets generated from human brain and a cell line, respectively. We then sought an efficient way to obtain an optimal overall assembly by comparing three strategies. First, we assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of k-mer lengths and coverage cutoffs. Lastly, we combined the contigs assembled over a range of k values into a final assembly. Comparing these results, we found that a single k-mer value is not enough to generate a good assembly, whereas combining the contigs from different k-mer values yields longer contigs and greatly improves the overall assembly.
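The abstract above does not say how contigs from different k values were combined. One simple possibility, shown here only as a sketch under that caveat, is to pool the contigs and discard any that is an exact substring of a longer one. The function name `merge_contigs` and the toy sequences are hypothetical, and the sketch ignores reverse-complement and partial-overlap cases that a real merging step would need to handle.

```python
def merge_contigs(contig_sets):
    """Pool contigs from several assemblies and drop contained ones.

    contig_sets: iterable of contig lists, e.g. one list per k value.
    Returns contigs sorted longest first, with exact duplicates and
    exact substrings of longer contigs removed (an assumed strategy).
    """
    pooled = {contig for contigs in contig_sets for contig in contigs}
    kept = []
    # Visit longest contigs first so shorter ones can be checked
    # for containment in what has already been kept.
    for contig in sorted(pooled, key=lambda c: (-len(c), c)):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


# Toy contigs standing in for assemblies at two different k values.
assembly_k1 = ["ACGTACGT", "TTGA"]
assembly_k2 = ["CGTAC", "TTGACC"]
merged = merge_contigs([assembly_k1, assembly_k2])
print(merged)
```

The pairwise substring check is quadratic in the number of contigs, so it is only suitable for small illustrations.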
Abstract: A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single-end reads, 2.19 million single-end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest being approximately 8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped onto the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). Using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers was analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average polymorphism information content (PIC) value of 0.16. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
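The N50 statistic quoted in the abstract above is the length of the contig at which, going from longest to shortest, the running total first reaches half of the total assembled length. The short Python sketch below computes it; the function name `n50` and the example lengths are made up for illustration and are not data from the pigeonpea assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Accumulate lengths from the longest contig downwards.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once the running total covers at least half the assembly.
        if running * 2 >= total:
            return length
    return 0


# Illustrative contig lengths in bp (not from the paper).
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))
```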
Funding: Supported by Major Scientific and Technological Projects of XPCC (2020KWZ-012).
Abstract: Korla fragrant pear (KFP), known for its special fragrance, is a unique cultivar in Xinjiang, China. To explore the molecular mechanism of chlorogenic acid (CGA) biosynthesis in KFP, samples at different developmental stages were collected for transcriptome analysis. High-performance liquid chromatography showed that the CGA contents of KFP at 88, 118 and 163 days after full bloom were (20.96±1.84), (12.01±0.91) and (7.16±0.41) mg/100 g, respectively, decreasing as the fruit developed. Pears from these three representative stages were selected for de novo transcriptome assembly, and 68,059 unigenes were assembled from 444,037,960 clean reads. One 'phenylpropanoid biosynthesis' pathway related to CGA biosynthesis was identified, comprising 57 unigenes: 11 PALs, 1 PTAL, 6 4CLs, 9 C4Hs, 25 HCTs and 5 C3'Hs. The expression levels of 11 differentially expressed genes, including 1 PAL, 2 C4Hs, 3 4CLs and 5 HCTs, were consistent with the change in CGA content. Quantitative polymerase chain reaction analysis further showed that the expression of 8 unigenes involved in CGA biosynthesis was consistent with the RNA-seq data. These findings provide a comprehensive understanding of, and valuable information for, genetic engineering and molecular breeding in KFP.
Funding: This paper is based upon work supported by the National Science Foundation under Grant Nos. 1637312 and 1451316.
Abstract: Background: The precision medicine approach holds great promise for tailored diagnosis, treatment and prevention. Individuals can differ vastly in their genomic information and genetic mechanisms and hence have unique transcriptomic signatures. The development of precision medicine has demanded moving beyond DNA sequencing (DNA-Seq) to much more pointed RNA sequencing (RNA-Seq) [Cell, 2017, 168: 584-599]. Results: Here we conduct a brief survey of recent methodological developments in transcriptome assembly using RNA-Seq. Conclusions: Since transcriptomes in human disease are highly complex, dynamic and diverse, transcriptome assembly is playing an increasingly important role in precision medicine research to dissect the molecular mechanisms of human diseases.
Funding: This work was supported by the Science and Technology Project of Shenzhen, China (Grant Nos. JCYJ20190807145013281, JHZ20170310090257380, JCYJ20170413092711058, and JCYJ20170307095822325), the China Postdoctoral Science Foundation (Grant No. 2019M663369), and the National Natural Science Foundation of China (Grant No. 31970636).
Abstract: The development of new biomarkers or therapeutic targets for cancer immunotherapies requires a deep understanding of T cells. To date, the complete landscape and systematic characterization of long noncoding RNAs (lncRNAs) in T cells in cancer immunity are lacking. Here, by systematically analyzing full-length single-cell RNA sequencing (scRNA-seq) data from more than 20,000 libraries of T cells across three cancer types, we provide the first comprehensive catalog and the functional repertoires of lncRNAs in human T cells. Specifically, we developed a custom pipeline for de novo transcriptome assembly and obtained a novel lncRNA catalog containing 9433 genes. This expanded the current human lncRNA catalog by 16% and nearly doubled the number of lncRNAs expressed in T cells. We found that a portion of the genes expressed in single T cells were lncRNAs that had been overlooked by the majority of previous studies. Based on metacell maps constructed by the MetaCell algorithm, which partitions scRNA-seq datasets into disjoint and homogeneous groups of cells (metacells), 154 signature lncRNA genes were identified. They were associated with effector, exhausted, and regulatory T cell states. Moreover, 84 of them were functionally annotated based on co-expression networks, indicating that lncRNAs might broadly participate in the regulation of T cell functions. Our findings provide a new point of view and a resource for investigating the mechanisms of T cell regulation in cancer immunity, as well as for novel cancer-immune biomarker development and cancer immunotherapies.
Abstract: Recent advances in next-generation sequencing technology allow high-throughput RNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies. For model organisms with a reference genome, the first step in the analysis of RNA-Seq data involves mapping short-read sequences to the reference genome. Reference-guided transcriptome assembly is an optional step, recommended if the aim is to identify novel transcripts. Following read mapping, the primary interest of biologists in many RNA-Seq studies is the investigation of differential expression between experimental groups. In this review, we discuss recent developments in RNA-Seq data analysis applied to model organisms, including methods and algorithms for direct mapping, reference-guided transcriptome assembly and differential expression analysis, and provide insights into the future direction of RNA-Seq.
Funding: Supported by grants from the Key Program of the National Natural Science Foundation of China (No. 91131902) and the Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EX-QR-1).
Abstract: While it is widely accepted that genetic diversity determines the potential for adaptation, the role that gene expression variation plays in adaptation remains poorly understood. Here we show that gene expression diversity could have played a positive role in the adaptation of Miscanthus lutarioriparius. RNA-seq was conducted for 80 individuals of the species, with half planted in the energy crop domestication site and the other half planted in the control site near native habitats. A leaf reference transcriptome consisting of 18,503 high-quality transcripts was obtained using a pipeline developed for de novo assembly with population RNA-seq data. The population structure and genetic diversity of M. lutarioriparius were estimated based on 30,609 genic single nucleotide polymorphisms. Population expression (Ep) and expression diversity (Ed) were defined to measure the average level and the magnitude of variation of a gene's expression in the population, respectively. Expression diversity increased while genetic diversity decreased after the species was transplanted from the native habitats to the harsh domestication site, especially for genes involved in abiotic stress resistance, histone methylation, and biomass synthesis under water limitation. The increased expression diversity could have enriched the phenotypic variation directly subject to selection in the new environment.