Construction and Annotation of Ascosphaera apis Full-Length Transcriptome Utilizing Nanopore Third-Generation Long-Read Sequencing Technology
Abstract
【Objective】Purified mycelia (Aam) and spores (Aas) of Ascosphaera apis were sequenced using third-generation nanopore long-read sequencing technology in order to construct and annotate a high-quality full-length transcriptome of A. apis.
【Method】Aam and Aas were each sequenced on the Oxford Nanopore PromethION platform. Guppy was used for base calling of the raw reads, and clean reads were obtained by filtering out short fragments and low-quality raw reads. Full-length transcripts were identified by recognizing the primers at both ends of the clean reads. Annotations were obtained by aligning the full-length transcripts against the Nr, Swissprot, KOG, eggNOG, Pfam, GO and KEGG databases. Long non-coding RNAs (lncRNAs) were predicted with four methods (CPC, CNCI, CPAT and Pfam), and the intersection of the four predictions was taken as the set of high-confidence lncRNAs.
【Result】Nanopore sequencing yielded 6,321,704 and 6,259,727 raw reads from Aam and Aas, respectively. After quality control, 5,669,436 and 6,233,159 clean reads remained, of which 4,497,102 (79.32%) and 4,963,101 (79.62%) were full-length. In total, 9,859 and 16,795 non-redundant full-length transcripts were identified in Aam and Aas, with an N50 of 1,482 and 1,658 bp, an average length of 1,187 and 1,303 bp, and a maximum length of 6,472 and 6,815 bp, respectively. Venn analysis showed that 6,512 non-redundant full-length transcripts were shared by Aam and Aas, while 3,347 and 10,283 were specific to Aam and Aas, respectively. Altogether, 20,142 full-length transcripts were identified in Aam and Aas, of which 20,809, 11,151, 17,723, 12,164, 11,340 and 9,833 could be annotated to the Nr, KOG, eggNOG, Pfam, GO and KEGG databases, respectively. The species with the largest numbers of annotated full-length transcripts were A. apis, Polytolypa hystricis and Histoplasma capsulatum. GO annotation assigned these full-length transcripts to 45 functional terms: cellular component terms such as cell part, cell and organelle; molecular function terms such as catalytic activity, binding and transporter activity; and biological process terms such as cellular process, metabolic process and single-organism process. KEGG annotation assigned them to 49 pathways, including biosynthesis of antibiotics, ribosome, biosynthesis of amino acids, carbon metabolism and spliceosome. In addition, 648 high-confidence lncRNAs were identified, comprising 480 long intergenic non-coding RNAs (lincRNAs), 119 antisense lncRNAs and 49 sense lncRNAs.
【Conclusion】The first high-quality full-length transcriptome of A. apis was constructed and annotated. It provides a key basis for exploring the complexity of the A. apis transcriptome, improving the sequence and functional annotation of the reference genome, and further studying the functions of alternatively spliced isoforms in A. apis.
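The summary statistics in the abstract (N50, full-length rate, Venn counts and the four-way lncRNA intersection) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `n50` and `high_confidence` and the toy lengths and transcript IDs are hypothetical, and only the read and transcript counts are taken from the abstract.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return N50: the length L such that transcripts of length >= L
    together cover at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def high_confidence(*predictions: set[str]) -> set[str]:
    """Keep only transcript IDs called non-coding by every predictor."""
    return set.intersection(*predictions)


# Counts reported in the abstract (Aam = mycelia, Aas = spores).
shared, aam_only, aas_only = 6512, 3347, 10283
assert shared + aam_only == 9859               # non-redundant transcripts in Aam
assert shared + aas_only == 16795              # non-redundant transcripts in Aas
assert shared + aam_only + aas_only == 20142   # union across both samples

# Full-length rate = full-length clean reads / clean reads, in percent.
aam_rate = round(4497102 / 5669436 * 100, 2)
aas_rate = round(4963101 / 6233159 * 100, 2)

# Toy data only: hypothetical transcript lengths and transcript IDs.
toy_n50 = n50([100, 200, 300, 400, 500])
toy_lnc = high_confidence(
    {"t1", "t2", "t3"},  # stand-in for CPC calls
    {"t2", "t3"},        # stand-in for CNCI calls
    {"t1", "t2", "t3"},  # stand-in for CPAT calls
    {"t2", "t3", "t4"},  # stand-in for Pfam-based calls
)
```

Taking the intersection rather than the union trades sensitivity for specificity: a transcript is kept only if all four predictors agree, which is consistent with the abstract's description of the resulting lncRNAs as high-confidence.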
Authors: DU Yu; ZHU ZhiWei; WANG Jie; WANG XiuNa; JIANG HaiBin; FAN YuanChan; FAN XiaoXue; CHEN HuaZhi; LONG Qi; CAI ZongBing; XIONG CuiLing; ZHENG YanZhen; FU ZhongMin; CHEN DaFu; GUO Rui (College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002; Apitherapy Research Institution, Fujian Agriculture and Forestry University, Fuzhou 350002; College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002; Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province (Fujian Agriculture and Forestry University), Fuzhou 350002)
Source: Scientia Agricultura Sinica (《中国农业科学》), indexed in CAS, CSCD and the Peking University Core Journals list; 2021, Issue 4, pp. 864-876 (13 pages)
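Funding: Earmarked Fund for the China Agriculture Research System (CARS-44-KXJ7); Natural Science Foundation of Fujian Province (2018J05042); Outstanding Young Scientific Research Talents Program of Fujian Agriculture and Forestry University (xjq201814); Fund for Excellent Master's Theses of Fujian Agriculture and Forestry University (DU Yu); Open Project of the Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province (GUO Rui); Open Fund of the Jiangxi Provincial Key Laboratory of Honeybee Biology and Beekeeping (JXKLHBB-2020-04).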
Keywords: third-generation high-throughput sequencing technology; nanopore sequencing; full-length transcript; reference transcriptome; honeybee; Ascosphaera apis