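Abstract: Using text analysis and web-based investigation, this study examines the pathways and operating model of the open science services offered by the Public Library of Science (PLOS) and discusses the characteristics and development model of those services. It recommends that China, in line with development trends, formulate open science policies suited to its national conditions and level of scientific and technological development; innovate in implementation pathways, mechanisms, working models, and scholarly publishing models; promote the application of emerging technologies and tools in open science; and improve publishing platforms' timely evaluation, feedback, and promotion of published results. Together these measures would support China's full integration into the entire open science process and a comprehensive improvement in its service capacity.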
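Abstract: When investigating research questions and sharing research data, researchers need infrastructure that ensures maximum accessibility, stability, and usability of the data. Such infrastructure can be collectively called research data repositories (RDRs). The re3data.org project, launched in 2012, registers research data repositories and gives researchers, research funding organizations, libraries, and publishers an overview of the heterogeneous RDR landscape. As of July 2013, 400 research data repositories had been registered with re3data.org, 288 of which use re3data.org's information icons to help researchers select a suitable repository for storing and reusing their data. This paper describes the heterogeneous RDR landscape and presents the institutional, disciplinary, multidisciplinary, and project-specific types of RDR. It also describes the features of re3data.org in depth and shows how this registry helps researchers identify repositories suitable for storing and searching research data.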
Funding: Supported by the National Population and Health Scientific Data Sharing Program of China; the Knowledge Centre for Engineering Sciences and Technology (Medical Centre); and the Fundamental Research Funds for the Central Universities (Grant No.: 13R0101).
Abstract:
Purpose: In the open science era, it is typical to share project-generated scientific data by depositing it in an open and accessible database, while scientific publications are preserved in a digital library archive. It is challenging to identify the data usage mentioned in the literature and associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis.
Design/methodology/approach: We focused on identifying articles that use the TCGA dataset and constructing linkages between those articles and the specific TCGA dataset. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set of 25 full-text articles that truly used the TCGA data in their studies, and we summarized the key features of the benchmark set. Third, we applied the key features to the remaining full-text articles collected from PMC.
Findings: The number of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, we found that the studies using TCGA data focused mainly on glioblastoma multiforme, lung cancer, and breast cancer, and that data from the RNA-sequencing (RNA-seq) platform was the most preferred.
Research limitations: The current workflow for identifying articles that truly used TCGA data is labor-intensive. An automatic method is expected to improve the performance.
Practical implications: This study will help cancer genomics researchers determine the latest advancements in cancer molecular therapy, and it will promote data sharing and data-intensive scientific discovery.
Originality/value: Few studies have investigated data usage by government-funded projects/programs since their launch. In this preliminary study, we extracted articles that use TCGA data from PMC, and we created a link between the full-text articles and the source data.