Funding: This work was funded by the National Natural Science Foundation of China (Grant No. 32270019), the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07), and the Open Biodiversity and Health Big Data Programme of IUBS.
Abstract: The rapid expansion of the number of viral genome sequences in public databases necessitates a scalable, universal, and automated preliminary taxonomic framework for comprehensive virus studies. Here, we introduce Virus Sequence-based Taxonomy Assignment (ViSTA), a computational tool that employs a novel pairwise sequence comparison system and an automatic demarcation threshold identification framework for virus taxonomy. Leveraging physio-chemical property sequences, k-mer profiles, and machine learning techniques, ViSTA constructs a robust distance-based framework for taxonomic assignment. Functionally similar to Pairwise Sequence Comparison (PASC), a widely used virus assignment tool based on pairwise sequence comparison, ViSTA demonstrates superior performance by providing significantly improved separation for taxonomic groups, more objective taxonomic demarcation thresholds, greatly enhanced speed, and a wider application scope. We successfully applied ViSTA to 38 virus families, as well as to the class Caudoviricetes. This demonstrates ViSTA's scalability, robustness, and ability to automatically and accurately assign taxonomy to both prokaryotic and eukaryotic viruses. Furthermore, the application of ViSTA to 679 unclassified prokaryotic virus genomes recovered from metagenomic data identified 46 novel virus families. ViSTA is available as both a command line tool and a user-friendly web portal at https://ngdc.cncb.ac.cn/vista.
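To make the idea of a distance-based comparison on k-mer profiles concrete, here is a minimal Python sketch. It is not ViSTA's actual method: the abstract does not specify the distance measure, the value of k, or how thresholds are learned, and the physio-chemical property sequences and machine learning steps are not shown. The function names (`kmer_profile`, `cosine_distance`, `same_group`), the choice of cosine distance, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=3):
    """Return the normalized k-mer frequency vector of a nucleotide sequence.

    Only k-mers over the alphabet ACGT are counted; windows containing
    other symbols (for example N) are ignored. Illustrative helper only.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_distance(p, q):
    """Cosine distance between two equal-length profiles (assumed metric)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def same_group(seq_a, seq_b, threshold, k=3):
    """Hypothetical demarcation rule: same group if distance <= threshold."""
    d = cosine_distance(kmer_profile(seq_a, k), kmer_profile(seq_b, k))
    return d <= threshold
```

In this sketch the threshold is supplied by the caller; according to the abstract, ViSTA identifies demarcation thresholds automatically, which this example does not attempt.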
Funding: This work was supported by grants from the Strategic Priority Research Program of Chinese Academy of Sciences (Grant Nos. XDA19090116, XDA19050302, and XDB38030400) awarded to SS, ZZ, and ML; the National Key R&D Program of China (Grant Nos. 2020YFC0848900, 2020YFC0847000, 2016YFE0206600, and 2017YFC0907502); the 13th Five-year Informatization Plan of Chinese Academy of Sciences (Grant No. XXH13505-05); Genomics Data Center Construction of Chinese Academy of Sciences (Grant No. XXH-13514-0202); the Open Biodiversity and Health Big Data Programme of International Union of Biological Sciences, International Partnership Program of Chinese Academy of Sciences (Grant No. 153F11KYSB20160008); and the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07). This work was also supported by the KC Wong Education Foundation to ZZ and the Youth Innovation Promotion Association of Chinese Academy of Sciences (Grant Nos. 2017141 and 2019104) awarded to SS and ML.
Abstract: On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. The spatiotemporal change of each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 relevant literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature for a timely reflection, making 2019nCoVR a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.
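As a rough illustration of what a per-variant population frequency means, the Python sketch below computes the fraction of isolates that carry each variant. It is not the 2019nCoVR pipeline: the abstract does not describe how frequencies are calculated, and the function name `variant_frequencies`, the input layout, and the isolate and variant labels are hypothetical placeholders.

```python
from collections import Counter


def variant_frequencies(isolate_variants):
    """Map each variant to the fraction of isolates in which it appears.

    isolate_variants: dict of isolate id -> iterable of variant labels
    (assumed layout, chosen for illustration). A variant listed more than
    once for the same isolate is counted once.
    """
    n = len(isolate_variants)
    if n == 0:
        return {}
    counts = Counter(
        v for variants in isolate_variants.values() for v in set(variants)
    )
    return {v: c / n for v, c in counts.items()}


# Hypothetical isolates and variant labels, not real data.
example = {
    "isolate_1": ["var_a", "var_b"],
    "isolate_2": ["var_a"],
    "isolate_3": [],
}
frequencies = variant_frequencies(example)
```

Here `frequencies["var_a"]` is the share of the three example isolates that carry `var_a`; isolates with no variants still count toward the denominator.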