This paper investigates simultaneous wireless information and power transfer (SWIPT) for a network-coded two-way relay network from an information-theoretic perspective, where two sources exchange information via a SWIPT-aware energy harvesting (EH) relay. We present a power splitting (PS)-based two-way relaying (PS-TWR) protocol by employing the PS receiver architecture. To explore the system sum rate limit with data rate fairness, an optimization problem under a total power constraint is formulated. Then, explicit solutions are derived for the problem. Numerical results show that, due to the path loss effect on energy transfer, with the same total available power PS-TWR loses some system performance compared with traditional non-EH two-way relaying, where at relatively low and relatively high signal-to-noise ratio (SNR) the performance loss is relatively small. Another observation is that in the relatively high SNR regime PS-TWR outperforms time switching-based two-way relaying (TS-TWR), while in the relatively low SNR regime TS-TWR outperforms PS-TWR. It is also shown that with individual available power at the two sources, PS-TWR outperforms TS-TWR in both the relatively low and high SNR regimes.
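As a rough illustration of the trade-off a power-splitting ratio creates, the toy sketch below evaluates a simplified fairness-constrained rate over a grid of splitting ratios. The channel model, the equal power split between the sources, the harvesting efficiency `eta` and all parameter values are our own assumptions for illustration, not the paper's actual formulation.

```python
import math

def ps_twr_min_rate(P_total, g1, g2, rho, eta=0.6, noise=1.0):
    """Toy rate sketch for power-splitting two-way relaying.

    A fraction rho of the power received at the relay is harvested
    (with efficiency eta); the remaining (1 - rho) feeds information
    decoding. All quantities and the channel model are illustrative.
    """
    P_src = P_total / 2.0                  # assume equal power at the two sources
    received = P_src * g1 + P_src * g2     # total power reaching the relay
    snr_decode = (1.0 - rho) * received / noise
    P_relay = eta * rho * received         # harvested relay transmit power
    # Each end-to-end rate is limited by both the multiple-access phase and
    # the broadcast phase; the factor 1/2 accounts for the two phases.
    r1 = 0.5 * math.log2(1.0 + min(snr_decode, P_relay * g2 / noise))
    r2 = 0.5 * math.log2(1.0 + min(snr_decode, P_relay * g1 / noise))
    return min(r1, r2)                     # rate fairness: the worse link rules

def best_split(P_total, g1, g2, steps=99):
    # Grid search over the power-splitting ratio rho in (0, 1).
    return max((i / (steps + 1) for i in range(1, steps + 1)),
               key=lambda rho: ps_twr_min_rate(P_total, g1, g2, rho))
```

Sweeping `rho` exposes the tension the abstract describes: harvesting more power strengthens the relay's transmission but starves information decoding, so a fairness-optimal split lies strictly between 0 and 1.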
This study enhances image authentication by securing hidden watermarking data via shares generated from counting-based secret sharing. The trustworthiness of the shares makes secret sharing an applicable privacy tool for authenticating real-life complex platforms. This research adjusts the embedding of watermarking data over the images by innovatively redistributing the shares so that they are spread across all the images. The proposed watermarking technique scatters the share bits across different least significant bits of the image pixels, boosting the overall trust and practicality of authentication. The experimental performance analysis shows that the capacity of this improved image watermarking authentication is on average 33%–67% better than related exclusive-OR-oriented and octagon approaches. Interestingly, these improvements did not degrade the robustness or security of the system, inspiring novel tracks of future counting-based secret-sharing authentication research.
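The core embedding idea the abstract describes, writing share bits into least significant bits (LSBs) of pixels spread over the image, can be sketched as follows. The stride-based spreading pattern and function names are our own simplification; the paper's actual redistribution scheme is more elaborate.

```python
def embed_share_bits(pixels, share_bits, stride=3):
    """Write each share bit into the LSB of every `stride`-th pixel.

    `pixels` is a flat list of 8-bit grayscale values; spreading the bits
    out (rather than packing them contiguously) is a toy stand-in for the
    paper's redistribution of shares across the image.
    """
    out = list(pixels)
    pos = 0
    for bit in share_bits:
        out[pos] = (out[pos] & ~1) | bit   # clear LSB, then set it to the bit
        pos += stride
    return out

def extract_share_bits(pixels, n_bits, stride=3):
    # Recover the embedded bits by reading the same LSB positions back.
    return [pixels[i * stride] & 1 for i in range(n_bits)]
```

Because only the least significant bit of each touched pixel changes, the per-pixel distortion is at most 1 intensity level, which is why LSB schemes are visually imperceptible.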
Researchers often face difficulties in organizing and structuring their research data. Moreover, today these data must respect the FAIR principles. To overcome these problems in the environmental domain, the Meta-Obs project proposes a metamodel, a methodology and a tool that generates a "FAIRizable" database. The obtained database is adapted to the needs of the researcher and is enriched with metadata. In this article we present Meta-Obs and illustrate its use with a concrete case: the design of the database of the Aspergillus fumigatus collection.
In the dynamic environment of hospitals, valuable real-world data often remain underutilised despite their potential to revolutionize cancer research and personalised medicine. This study explores the challenges and opportunities in managing hospital-generated data, particularly within the Masaryk Memorial Cancer Institute (MMCI) in Brno, Czech Republic. Utilizing Next-Generation Sequencing (NGS) technology, MMCI generates substantial volumes of genomic data. Due to inadequate curation, these data remain difficult to integrate with clinical records for secondary use (such as personalised treatment outcome prediction and patient stratification based on genomic profiles). This paper proposes solutions based on the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) to enhance data sharing and reuse. The primary output of our work is an automated pipeline that continuously processes and integrates NGS data with clinical and biobank information upon their creation. It stores the data in structured form in a secure repository for sensitive data to ensure smooth retrieval.
The FAIR guiding principles aim to enhance the Findability, Accessibility, Interoperability and Reusability of digital resources such as data, for both humans and machines. The process of making data FAIR ("FAIRification") can be described in multiple steps. In this paper, we describe a generic step-by-step FAIRification workflow to be performed in a multidisciplinary team guided by FAIR data stewards. The FAIRification workflow should be applicable to any type of data and has been developed and used for "Bring Your Own Data" (BYOD) workshops, as well as for the FAIRification of, e.g., rare disease resources. The steps are: 1) identify the FAIRification objective, 2) analyze data, 3) analyze metadata, 4) define semantic model for data (4a) and metadata (4b), 5) make data (5a) and metadata (5b) linkable, 6) host FAIR data, and 7) assess FAIR data. For each step we describe how the data are processed, what expertise is required, which procedures and tools can be used, and which FAIR principles they relate to.
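The workflow's ordered, sub-divided steps lend themselves to a simple machine-readable checklist; a minimal sketch is below. The step labels are taken directly from the abstract; the tracking helper is our own illustrative addition.

```python
# The FAIRification steps listed in the workflow, as an ordered checklist
# (steps 4 and 5 each split into data/metadata sub-steps).
FAIRIFICATION_STEPS = [
    "1. Identify the FAIRification objective",
    "2. Analyze data",
    "3. Analyze metadata",
    "4a. Define semantic model for data",
    "4b. Define semantic model for metadata",
    "5a. Make data linkable",
    "5b. Make metadata linkable",
    "6. Host FAIR data",
    "7. Assess FAIR data",
]

def next_step(completed):
    """Return the first step not yet completed, or None when the workflow is done."""
    for step in FAIRIFICATION_STEPS:
        if step not in completed:
            return step
    return None
```

A checklist like this could back a simple progress tracker in, for example, a BYOD workshop, with each team marking off steps as their data moves through the workflow.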
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT for its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements have emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.
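To make the dataset-versus-service distinction concrete, here is a minimal catalog record in DCAT (Turtle syntax), held in a Python string. It shows the newer ability to describe a `dcat:DataService` alongside a `dcat:Dataset`; all URIs are hypothetical placeholders, not real resources.

```python
# A minimal DCAT catalog record (Turtle). The catalog lists one dataset
# with a downloadable distribution and one data service (an API endpoint)
# that serves the same dataset -- the pattern added for services/APIs.
DCAT_EXAMPLE = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<https://example.org/catalog> a dcat:Catalog ;
    dct:title "Example catalog" ;
    dcat:dataset <https://example.org/dataset/1> ;
    dcat:service <https://example.org/api> .

<https://example.org/dataset/1> a dcat:Dataset ;
    dct:title "Example dataset" ;
    dcat:distribution <https://example.org/dataset/1/csv> .

<https://example.org/dataset/1/csv> a dcat:Distribution ;
    dcat:downloadURL <https://example.org/data.csv> ;
    dct:format "text/csv" .

<https://example.org/api> a dcat:DataService ;
    dcat:endpointURL <https://example.org/sparql> ;
    dcat:servesDataset <https://example.org/dataset/1> .
"""
```

The key design point is that a distribution describes a static file you can download, while a data service describes a queryable endpoint; earlier catalog practice had to shoehorn APIs into distributions.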
One of the key goals of the FAIR guiding principles is defined by its final principle: to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent machine-readable metadata describing their data sets. This can seem like a daunting task for data providers, whether it is determining what level of detail should be provided in the provenance metadata or figuring out which common shared vocabularies should be used. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata on over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular the associated metadata, provide an important pathway for enabling FAIR data reuse.
Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principles of cataloging content for cost efficiency and research speed. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter the discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.
The FAIR principles were received with broad acceptance in several scientific communities. However, there is still some degree of uncertainty about how they should be implemented. Several self-report questionnaires have been proposed to assess the implementation of the FAIR principles. Moreover, the FAIRmetrics group released 14 general-purpose maturity indicators for representing FAIRness. Initially, these metrics were administered as open-answer questionnaires. Recently, they have been implemented in software that can automatically harvest metadata from metadata providers and generate a principle-specific FAIRness evaluation. With so many different approaches to FAIRness evaluation, we believe that further clarification of their limitations and advantages, as well as of their interpretation and interplay, should be considered.
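The shape of such an automated, principle-specific evaluation can be sketched as a table of checks applied to a harvested metadata record. The check names and pass criteria below are our own much-simplified toy choices, loosely inspired by the maturity indicators, not the actual FAIRmetrics implementations.

```python
# Illustrative per-principle checks over a harvested metadata record
# (a plain dict here). Real evaluators resolve identifiers, dereference
# URLs and inspect serializations; these toy predicates only look at keys.
CHECKS = {
    "F: globally unique identifier":  lambda m: bool(m.get("identifier")),
    "F: rich metadata":               lambda m: bool(m.get("title")) and bool(m.get("description")),
    "A: retrievable via standard protocol":
        lambda m: str(m.get("access_url", "")).startswith("https://"),
    "I: formal knowledge representation":
        lambda m: m.get("format") in {"RDF/XML", "Turtle", "JSON-LD"},
    "R: clear usage license":         lambda m: bool(m.get("license")),
}

def evaluate_fairness(metadata):
    """Run every check; return the per-principle pass/fail map and an overall score."""
    results = {name: check(metadata) for name, check in CHECKS.items()}
    return results, sum(results.values()) / len(results)
```

The per-principle breakdown is the useful part: an overall score of 0.8 says little, while "fails only the license check" tells a data steward exactly what to fix.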
Metadata, data about other digital objects, play an important role in FAIR, with a direct relation to all FAIR principles. In this paper we present and discuss the FAIR Data Point (FDP), a software architecture aiming to define a common approach to publishing semantically rich and machine-actionable metadata according to the FAIR principles. We present the core components and features of the FDP, its approach to metadata provision, the criteria for evaluating whether an application adheres to the FDP specifications, and the service to register, index and allow users to search the metadata content of available FDPs.
Research data infrastructures currently face a huge increase in the number of data objects, with an increasing variety of types (data types, formats) and of the workflows by which objects need to be managed across their lifecycle. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
Thousands of community-developed (meta)data guidelines, models, ontologies, schemas and formats have been created and implemented by several thousand data repositories and knowledge bases, across all disciplines. These resources are necessary to meet government, funder and publisher expectations of greater transparency, access to, and preservation of data related to research publications. This obligates researchers to ensure their data are FAIR, share their data using the appropriate standards, store their data in sustainable and community-adopted repositories, and conform to funder and publisher data policies. FAIR data sharing also plays a key role in enabling researchers to evaluate, re-analyse and reproduce each other's work. We can map the landscape of relationships between community-adopted standards and repositories and the journal publisher and funder data policies that recommend their use. In this paper, we show how the work of the GO FAIR FAIR Standards, Repositories and Policies (StRePo) Implementation Network serves as a central integration and cross-fertilisation point for the reuse of FAIR standards, repositories and data policies in general. Pivotal to this effort is FAIRsharing, an endorsed flagship resource of the Research Data Alliance that maps this landscape. Lastly, we highlight a number of activities around FAIR tools, services and educational efforts to raise awareness and encourage participation.
This article assesses the difference between the concepts of ‘open data’ and ‘FAIR data’ in data management. FAIR data is understood as data that complies with the FAIR Guidelines—data that is Findable, Accessible, Interoperable and Reusable—while open data was born out of awareness of the need to democratise data by improving its accessibility, based on the idea that data should not have limitations that prevent people from using it. This study compared FAIR data with open data by analysing relevant documents using a coding analysis with conceptual labels based on Kingdon’s theory of agenda setting. The study found that in relation to FAIR data the problem stream focuses on the complexity of data collected for research, while open data primarily emphasises giving the public access to non-confidential data. In the policy stream, the two concepts share common standpoints in terms of making data available and reusable, although different approaches are adopted in practice to accomplish these goals. In the politics stream, stakeholders who support FAIR data have different objectives from those who support open data.
The FAIR data guiding principles have recently been developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase in data volume and complexity. The FAIR data principles have been formulated at a general level, and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for the FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.
The rapid evolution of Large Language Models (LLMs) highlights the necessity for ethical considerations and data integrity in AI development, particularly emphasizing the role of FAIR (Findable, Accessible, Interoperable, Reusable) data principles. While these principles are crucial for ethical data stewardship, their specific application in the context of LLM training data remains an under-explored area. This research gap is the focus of our study, which begins with an examination of existing literature to underline the importance of FAIR principles in managing data for LLM training. Building upon this, we propose a novel framework designed to integrate FAIR principles into the LLM development lifecycle. A contribution of our work is the development of a comprehensive checklist intended to guide researchers and developers in applying FAIR data principles consistently across the model development process. The utility and effectiveness of our framework are validated through a case study on creating a FAIR-compliant dataset aimed at detecting and mitigating biases in LLMs. We present this framework to the community as a tool to foster the creation of technologically advanced, ethically grounded, and socially responsible AI models.
The last letter of the FAIR acronym stands for Reusability. Data and metadata should be made available with a clear and accessible usage license. But what are the choices? How can researchers share data and allow reusability? Are all the licenses available for sharing content suitable for data? Data can be covered by different layers of copyright protection, making the relationship between data and copyright particularly complex. Some research data can be considered a work and therefore covered by full copyright, while other data can be in the public domain due to their lack of originality. Moreover, a collection of data can be protected by special rights in Europe to acknowledge the investment of time and money in obtaining, presenting, arranging or verifying the data. The need to use a license when sharing data comes from the fact that, under current copyright laws, when rights exist, the absence of any legal notice must be understood as the default "all rights reserved" regime. Unless an exception applies, the authorisation of the right holders is necessary for reuse. Right holders could use any text to state the reusability of data, but it is advisable to use one of the existing licenses, especially those suitable for data and databases. We hope that with this paper we can bring some clarity to the rights involved when sharing research data.
Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements and looked at trends over time. We found expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open data challenge now is to use what we have learned to present researchers with relevant and easy options that help them to share and make an impact with new research data.
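The paper applied unsupervised machine learning over 124,000 statements; as a much smaller illustration of grouping availability statements without labels, here is a toy greedy clustering over word-overlap (Jaccard) similarity. The grouping criterion, threshold and tokenisation are our own assumptions, not the authors' method.

```python
def tokens(statement):
    # Lowercase word set; crude normalisation, good enough for a toy example.
    return {w.strip(".,;()").lower() for w in statement.split()}

def jaccard(a, b):
    # Word-set overlap: |intersection| / |union|.
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_statements(statements, threshold=0.5):
    """Greedy single-pass clustering of availability statements.

    Each statement joins the first cluster whose representative (the token
    set of its first member) is at least `threshold`-similar; otherwise it
    starts a new cluster.
    """
    clusters = []  # each entry: [representative_token_set, statement, ...]
    for s in statements:
        t = tokens(s)
        for c in clusters:
            if jaccard(t, c[0]) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([t, s])
    return [c[1:] for c in clusters]  # drop the token sets, keep the statements
```

Even this crude approach separates boilerplate families such as "available on request" from "in the supplementary materials", which is the kind of category structure one would then track over time and against journal policy changes.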
The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management as implemented in the VODAN architecture.
While the FAIR Principles do not specify a technical solution for 'FAIRness', it was clear from the outset of the FAIR initiative that it would be useful to have commodity software and tooling to simplify the creation of FAIR-compliant resources. The FAIR Data Point is a metadata repository that follows the DCAT 2 schema and utilizes the Linked Data Platform to manage the hierarchical metadata layers as LDP Containers. There has been a recent flurry of development activity around the FAIR Data Point that has significantly improved its power and ease of use. Here we describe five specific tools (an installer, a loader, two Web-based interfaces, and an indexer) aimed at maximizing the uptake and utility of the FAIR Data Point.
This article describes the FAIRification process (which involves making data Findable, Accessible, Interoperable and Reusable—or FAIR—for both machines and humans) for data related to the impact of COVID-19 on migrants, refugees and asylum seekers in Tunisia, Libya and Niger, according to the scheme adopted by GO FAIR. This process was divided into three phases: pre-FAIRification, FAIRification and post-FAIRification. Each phase consisted of seven steps. In the first phase, 118 in-depth interviews and 565 press articles and research reports were collected by students and researchers at the University of Sousse in Tunisia and researchers in Niger. These interviews, articles and reports constitute the dataset for this research. In the second phase, the data were sorted, converted into a machine-actionable format and published on a FAIR Data Point hosted at the University of Sousse. In the third phase, an assessment of the implementation of the FAIR Guidelines was undertaken. Certain barriers and challenges were faced in this process and solutions were found. For FAIR data curation, certain changes need to be made to the technical process, and people need to be convinced that making these changes and implementing FAIR will generate a long-term return on investment. Although the implementation of the FAIR Guidelines is not straightforward, making our resources FAIR is essential to achieving better science together.
Funding (SWIPT two-way relaying paper): supported by the National Natural Science Foundation of China (No. 61602034), the Beijing Natural Science Foundation (No. 4162049), the Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (No. 2014D03), the Fundamental Research Funds for the Central Universities, Beijing Jiaotong University (No. 2016JBM015), and the National High Technology Research and Development Program of China (863 Program) (No. 2015AA015702).
Funding (Meta-Obs paper): funded by the Laboratoire Chrono-environnement and the OSU THETA.
Funding (MMCI hospital data paper): funded by the project SALVAGE (P JAC reg. no. CZ.02.01.01/00/22_008/0004644), funded by the European Union and by the State Budget of the Czech Republic; by MH CZ-DRO (MMCI, 00209805); and by BBMRI.cz (no. LM2023033). Computational resources were provided by the e-INFRA CZ project (no. LM2023054).
Funding (FAIRification workflow paper): the work of A. Jacobsen, R. Kaliyaperumal, M. Roos and M. Thompson is supported by funding from the European Union's Horizon 2020 research and innovation programme under EJP RD COFUND-EJP No. 825575, and by ELIXIR EXCELERATE, H2020 grant agreement number 676559. M. Roos and M. Thompson received funding from NWO (VWData 400.17.605) and H2020-EU 824087. The work of B. Mons and L. O. Bonino da Silva Santos is funded by H2020-EU 824068 and the GO FAIR ISCO grant of the Dutch Ministry of Science and Culture.
Funding (DCAT paper): partially supported by TAILOR, a project funded by the EU Horizon 2020 research and innovation programme under GA No. 952215, and funded by refinitiv.com (previously Thomson Reuters).
Funding: This work was partially supported by Horizon 2020, INFRADEV-4-2014-2015, grant 654248, CORBEL (Coordinated Research Infrastructures Building Enduring Life-science services).
Abstract: One of the key goals of the FAIR guiding principles is defined by its final principle: to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent machine-readable metadata to describe their data sets. This can seem like a daunting task for data providers, whether it is determining what level of detail should be provided in the provenance metadata or figuring out which common shared vocabularies should be used. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata on over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular the associated metadata, provide an important pathway for enabling FAIR data reuse.
Abstract: Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principles of cataloging content for economy and research-speed efficiency. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter the discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.
Funding: M. Dumontier was supported by grants from NWO (400.17.605, 628.011.011), NIH (3OT3TR002027-01S1, 1OT3OD025467-01, 1OT3OD025464-01), H2020-EU EOSC-Life (824087), and ELIXIR, the research infrastructure for life-science data. R. de Miranda Azevedo was supported by grants from H2020-EU EOSC-Life (824087) and ELIXIR, the research infrastructure for life-science data.
Abstract: The FAIR principles were received with broad acceptance in several scientific communities. However, there is still some degree of uncertainty about how they should be implemented. Several self-report questionnaires have been proposed to assess the implementation of the FAIR principles. Moreover, the FAIRmetrics group released 14 general-purpose maturity indicators for representing FAIRness. Initially, these metrics were administered as open-answer questionnaires. Recently, they have been implemented in software that can automatically harvest metadata from metadata providers and generate a principle-specific FAIRness evaluation. With so many different approaches to FAIRness evaluation, we believe that further clarification of their limitations and advantages, as well as of their interpretation and interplay, should be considered.
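An automated, principle-specific FAIRness evaluation of the kind described here can be sketched as a set of check functions run over harvested metadata. This is a hypothetical illustration loosely modelled on that approach: the check logic, field names, and the two principles chosen (F1 and R1.1) are assumptions, not the FAIRmetrics implementation.

```python
def has_persistent_identifier(metadata: dict) -> bool:
    """F1-style check: record carries a resolvable persistent identifier."""
    pid = metadata.get("identifier", "")
    return pid.startswith(("https://doi.org/", "https://hdl.handle.net/"))

def has_license(metadata: dict) -> bool:
    """R1.1-style check: record declares a usage license."""
    return bool(metadata.get("license"))

# Map each FAIR principle to its automated check (illustrative subset).
CHECKS = {"F1": has_persistent_identifier, "R1.1": has_license}

def evaluate(metadata: dict) -> dict:
    """Return a per-principle pass/fail report for one metadata record."""
    return {principle: check(metadata) for principle, check in CHECKS.items()}

record = {"identifier": "https://doi.org/10.1234/example",
          "license": "CC-BY-4.0"}
print(evaluate(record))  # {'F1': True, 'R1.1': True}
```

The contrast the abstract draws is exactly this: an open-answer questionnaire asks a human the same questions, while the software answers them mechanically from harvested metadata.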
Abstract: Metadata, data about other digital objects, play an important role in FAIR, with a direct relation to all FAIR principles. In this paper we present and discuss the FAIR Data Point (FDP), a software architecture aiming to define a common approach to publish semantically rich and machine-actionable metadata according to the FAIR principles. We present the core components and features of the FDP, its approach to metadata provision, the criteria to evaluate whether an application adheres to the FDP specifications, and the service to register, index and allow users to search for the metadata content of available FDPs.
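The FDP organizes metadata in nested layers (a FAIR Data Point describes its catalogs, a catalog its datasets, a dataset its distributions). A minimal sketch of that hierarchy, with invented IRIs and an assumed `MetadataRecord` class that is not part of the FDP software:

```python
from dataclasses import dataclass, field

@dataclass
class MetadataRecord:
    """One layer of FDP-style metadata (illustrative, not the FDP schema)."""
    iri: str
    title: str
    children: list = field(default_factory=list)

    def add(self, child: "MetadataRecord") -> "MetadataRecord":
        self.children.append(child)
        return child

# Build the four-layer hierarchy: FDP -> catalog -> dataset -> distribution.
fdp = MetadataRecord("https://example.org/fdp", "Example FAIR Data Point")
catalog = fdp.add(MetadataRecord("https://example.org/catalog/1", "Catalog"))
dataset = catalog.add(MetadataRecord("https://example.org/dataset/1", "Dataset"))
dataset.add(MetadataRecord("https://example.org/dist/1", "Distribution"))

def depth(record: MetadataRecord) -> int:
    """Number of metadata layers below and including this record."""
    return 1 + max((depth(c) for c in record.children), default=0)

print(depth(fdp))  # 4
```

In the actual architecture each layer is exposed as a dereferenceable resource, which is what makes the metadata machine-actionable rather than a single opaque document.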
Abstract: Research data currently face a huge increase in the number of data objects, with an increasing variety of types (data types, formats) and variety of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
Funding: Some of the discussion points in this article and the call for action were developed as part of the joint RDA and Force11 working group and the GO-FAIR StRePo Implementation Network. We therefore gratefully acknowledge the support provided by the RDA, Force11 and GO-FAIR communities and structures. FAIRsharing is funded by grants awarded to S.-A.S. that include elements of this work; specifically, grants from the UK BBSRC and Research Councils (BB/L024101/1, BB/L005069/1), the European Union (H2020-EU.3.1, 634107; H2020-EU.1.4.1.3, 654241; H2020-EU.1.4.1.1, 676559), IMI (116060) and NIH (U54 AI117925, 1U24AI117966-01, 1OT3OD025459-01, 1OT3OD025467-01, 1OT3OD025462-01), and the new FAIRsharing award from the Wellcome Trust (212930/Z/18/Z), as well as a related award (208381/A/17/Z). S.-A.S. is also funded by the Oxford e-Research Centre, Department of Engineering Science of the University of Oxford.
Abstract: Thousands of community-developed (meta)data guidelines, models, ontologies, schemas and formats have been created and implemented by several thousand data repositories and knowledge bases, across all disciplines. These resources are necessary to meet government, funder and publisher expectations of greater transparency and of access to and preservation of data related to research publications. This obligates researchers to ensure their data are FAIR, share their data using the appropriate standards, store their data in sustainable and community-adopted repositories, and conform to funder and publisher data policies. FAIR data sharing also plays a key role in enabling researchers to evaluate, re-analyse and reproduce each other's work. We can map the landscape of relationships between community-adopted standards and repositories, and the journal publisher and funder data policies that recommend their use. In this paper, we show how the work of the GO-FAIR FAIR Standards, Repositories and Policies (StRePo) Implementation Network serves as a central integration and cross-fertilisation point for the reuse of FAIR standards, repositories and data policies in general. Pivotal to this effort is FAIRsharing, an endorsed flagship resource of the Research Data Alliance that maps this landscape. Lastly, we highlight a number of activities around FAIR tools, services and educational efforts to raise awareness and encourage participation.
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID and the GO FAIR Foundation supported this research.
Abstract: This article assesses the difference between the concepts of 'open data' and 'FAIR data' in data management. FAIR data is understood as data that complies with the FAIR Guidelines, that is, data that is Findable, Accessible, Interoperable and Reusable, while open data was born out of awareness of the need to democratise data by improving its accessibility, based on the idea that data should not have limitations that prevent people from using it. This study compared FAIR data with open data by analysing relevant documents using a coding analysis with conceptual labels based on Kingdon's theory of agenda setting. The study found that in relation to FAIR data the problem stream focuses on the complexity of data collected for research, while open data primarily emphasises giving the public access to non-confidential data. In the policy stream, the two concepts share common standpoints in terms of making data available and reusable, although different approaches are adopted in practice to accomplish these goals. In the politics stream, the stakeholders who support FAIR data have different objectives from those who support open data.
Abstract: The FAIR data guiding principles were recently developed and have been widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase in data volume and complexity. The FAIR data principles were formulated at a general level, and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for the FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.
Abstract: The rapid evolution of Large Language Models (LLMs) highlights the necessity for ethical considerations and data integrity in AI development, particularly emphasizing the role of the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. While these principles are crucial for ethical data stewardship, their specific application in the context of LLM training data remains an under-explored area. This research gap is the focus of our study, which begins with an examination of existing literature to underline the importance of FAIR principles in managing data for LLM training. Building upon this, we propose a novel framework designed to integrate FAIR principles into the LLM development lifecycle. A contribution of our work is the development of a comprehensive checklist intended to guide researchers and developers in applying FAIR data principles consistently across the model development process. The utility and effectiveness of our framework are validated through a case study on creating a FAIR-compliant dataset aimed at detecting and mitigating biases in LLMs. We present this framework to the community as a tool to foster the creation of technologically advanced, ethically grounded, and socially responsible AI models.
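A checklist of the kind this abstract describes could be applied programmatically to a training-dataset description. The sketch below is an assumption-laden illustration, not the paper's actual checklist: the four checks simply paraphrase the four FAIR dimensions, and all field names and values are invented.

```python
# Illustrative FAIR audit for an LLM training-dataset description.
# Each check paraphrases one FAIR dimension (assumed, not the paper's list).
CHECKLIST = {
    "findable": lambda d: bool(d.get("persistent_id")),
    "accessible": lambda d: bool(d.get("access_protocol")),
    "interoperable": lambda d: d.get("format") in {"JSON-LD", "CSV", "Parquet"},
    "reusable": lambda d: bool(d.get("license")) and bool(d.get("provenance")),
}

def audit(dataset: dict) -> list:
    """Return the FAIR dimensions the dataset description fails."""
    return [name for name, check in CHECKLIST.items() if not check(dataset)]

corpus = {
    "persistent_id": "doi:10.1234/example-corpus",  # invented identifier
    "access_protocol": "https",
    "format": "Parquet",
    "license": "CC-BY-4.0",
    # provenance missing -> the 'reusable' check fails
}
print(audit(corpus))  # ['reusable']
```

Running such an audit at each stage of the development lifecycle is one way a checklist can be enforced "consistently across the model development process" rather than applied once at release time.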
Funding: Thomas Margoni co-coordinates the legal task force of OpenAIRE Advance, a project funded under the H2020 programme of the European Commission, project no. 777541. We would like to thank the project for its support.
Abstract: The last letter of the FAIR acronym stands for Reusability. Data and metadata should be made available with a clear and accessible usage license. But what are the choices? How can researchers share data and allow reusability? Are all the licenses available for sharing content suitable for data? Data can be covered by different layers of copyright protection, making the relationship between data and copyright particularly complex. Some research data can be considered a work and therefore covered by full copyright, while other data can be in the public domain due to their lack of originality. Moreover, a collection of data can be protected by special rights in Europe to acknowledge the investment in time and money in obtaining, presenting, arranging or verifying the data. The need to use a license when sharing data comes from the fact that, under current copyright laws, when rights exist, the absence of any legal notice must be understood as the default "all rights reserved" regime. Unless an exception applies, the authorisation of right holders is necessary for reuse. Right holders could use any text to state the reusability of data, but it is advisable to use one of the existing licenses, especially those that are suitable for data and databases. We hope that with this paper we can bring some clarity to the rights involved when sharing research data.
Abstract: Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements and looked at trends over time. We found the expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open data challenge becomes to use what we have learned to present researchers with relevant and easy options that help them to share and make an impact with new research data.
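The paper used unsupervised learning over 124,000 statements; as a toy stand-in, the categorization task itself can be illustrated with simple keyword rules. The categories and keywords below are illustrative assumptions, not the clusters the authors actually found.

```python
# Toy keyword-based categorizer for data availability statements.
# Rules are checked in order; first match wins (all rules are assumptions).
RULES = [
    ("data in repository", ("zenodo", "figshare", "dryad", "repository")),
    ("data on request", ("upon request", "on request")),
    ("data in article", ("within the article", "supplementary")),
]

def categorize(statement: str) -> str:
    """Assign a statement to the first category whose keyword it contains."""
    text = statement.lower()
    for category, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "other"

print(categorize("The data are available from the Dryad repository."))
# data in repository
print(categorize("Data are available from the authors upon request."))
# data on request
```

An unsupervised approach inverts this: instead of hand-writing the rules, clustering over text features discovers the categories, which is what makes it tractable at the scale of 124,000 statements.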
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID and the GO FAIR Foundation supported this research.
Abstract: The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management as implemented in the VODAN architecture.
Funding: Supported by Czech Technical University in Prague grant No. SGS20/209/OHK3/3T/18. LOBSS, RK and KB are partially funded by the Horizon 2020 project FAIRsFAIR, grant No. 831558.
Abstract: While the FAIR Principles do not specify a technical solution for 'FAIRness', it was clear from the outset of the FAIR initiative that it would be useful to have commodity software and tooling that would simplify the creation of FAIR-compliant resources. The FAIR Data Point is a metadata repository that follows the DCAT(2) schema and utilizes the Linked Data Platform to manage the hierarchical metadata layers as LDP Containers. There has been a recent flurry of development activity around the FAIR Data Point that has significantly improved its power and ease of use. Here we describe five specific tools (an installer, a loader, two Web-based interfaces, and an indexer) aimed at maximizing the uptake and utility of the FAIR Data Point.
Funding: Supported by funding from NWO, domain Social Sciences and Humanities, under the 'Corona Fast-track Data' call for proposals, file no. 440.20.012, as well as by VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID and the GO FAIR Foundation.
Abstract: This article describes the FAIRification process (which involves making data Findable, Accessible, Interoperable and Reusable, or FAIR, for both machines and humans) for data related to the impact of COVID-19 on migrants, refugees and asylum seekers in Tunisia, Libya and Niger, according to the scheme adopted by GO FAIR. This process was divided into three phases: pre-FAIRification, FAIRification and post-FAIRification. Each phase consisted of seven steps. In the first phase, 118 in-depth interviews and 565 press articles and research reports were collected by students and researchers at the University of Sousse in Tunisia and researchers in Niger. These interviews, articles and reports constitute the dataset for this research. In the second phase, the data were sorted and converted into a machine-actionable format and published on a FAIR Data Point hosted at the University of Sousse. In the third phase, an assessment of the implementation of the FAIR Guidelines was undertaken. Certain barriers and challenges were faced in this process and solutions were found. For FAIR data curation, certain changes need to be made to the technical process. People need to be convinced to make these changes and that the implementation of FAIR will generate a long-term return on investment. Although the implementation of the FAIR Guidelines is not straightforward, making our resources FAIR is essential to achieving better science together.