Funding: Supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDB13040500 and XDA08020102), the National High-tech R&D Program (863 Program; Grant Nos. 2014AA021503 and 2015AA020108), the National Key Research Program of China (Grant Nos. 2016YFC0901603, 2016YFB0201702, 2016YFC0901903, and 2016YFC0901701), the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008), the Key Program of the Chinese Academy of Sciences (Grant No. KJZD-EW-L14), the Key Technology Talent Program of the Chinese Academy of Sciences (awarded to WZ), and the 100 Talent Program of the Chinese Academy of Sciences (awarded to ZZ).
Abstract: With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are being generated at an unprecedented rate. To provide an efficient and easy-to-use platform for managing huge volumes of sequence data, here we present the Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to worldwide scientific communities. In the era of big data, GSA is not only an important complement to existing INSDC members, alleviating the increasing burden of handling the sequence data deluge, but also takes on significant responsibility for global big data archiving and provides free, unrestricted access to all publicly available data in support of research activities throughout the world.
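The four data objects named in the abstract can be pictured as a small nested data model. The sketch below is illustrative only: the abstract names the objects but not their fields or relationships, so the field names, the nesting (a BioProject holding samples and experiments, an Experiment holding runs), and the placeholder accessions are all assumptions rather than GSA's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """One sequencing run; `files` holds raw read file names (hypothetical field)."""
    accession: str
    files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    """A library/platform description that groups runs (assumed relationship)."""
    accession: str
    platform: str
    sample_accession: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    """The biological material that was sequenced."""
    accession: str
    organism: str


@dataclass
class BioProject:
    """Top-level container for a study (assumed to own samples and experiments)."""
    accession: str
    title: str
    samples: List[BioSample] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)


def count_runs(project: BioProject) -> int:
    """Total number of runs across all experiments in a project."""
    return sum(len(exp.runs) for exp in project.experiments)


# Placeholder accessions, not real GSA identifiers.
sample = BioSample(accession="SAMPLE-1", organism="Oryza sativa")
experiment = Experiment(
    accession="EXPERIMENT-1",
    platform="example platform",
    sample_accession=sample.accession,
    runs=[
        Run("RUN-1", ["reads_1.fastq.gz", "reads_2.fastq.gz"]),
        Run("RUN-2", ["reads.fastq.gz"]),
    ],
)
project = BioProject(
    accession="PROJECT-1",
    title="Example study",
    samples=[sample],
    experiments=[experiment],
)
total_runs = count_runs(project)
```

Keeping each Run under an Experiment and linking the Experiment to a sample by accession is one common way to separate metadata from raw read files; the real submission system may organize these links differently.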
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB38030200), the National Key R&D Program of China (Grant No. 2021YFF0703701), the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2023-07), the International Partnership Program of the Chinese Academy of Sciences (Grant No. 161GJHZ2022002MI), and the Open Biodiversity and Health Big Data Initiative of the International Union of Biological Sciences (IUBS).
Abstract: The rapid advancement of sequencing technologies poses challenges in managing the large volume and exponential growth of sequence data efficiently and in a timely manner. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers a bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase had received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2319 submissions. Of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase’s web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has constructed an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.
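The release percentages quoted in the abstract follow directly from the reported counts. The short check below recomputes them; the function name `released_share` is a hypothetical helper introduced only for this illustration, and the counts are the ones stated in the abstract.

```python
def released_share(released: int, received: int) -> int:
    """Return the released fraction as a whole-number percentage."""
    if received <= 0:
        raise ValueError("received must be positive")
    return round(100 * released / received)


# Counts as reported in the abstract (as of April 23, 2024).
nucleotide_pct = released_share(63_614, 68_251)
protein_pct = released_share(620_640, 689_574)

print(f"nucleotide sequences released: {nucleotide_pct}%")
print(f"annotated protein sequences released: {protein_pct}%")
```

Both values round to the figures given in the text (93% and 90%).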